Modular Inference Engine

The world’s fastest unified inference engine, supercharging any model from TensorFlow or PyTorch on a wide range of hardware.

The Modular Inference Engine simplifies your workflow and reduces inference latency, so you can scale your AI products.

Built on best-in-class compiler and runtime technologies, the engine supercharges models from TensorFlow and PyTorch and runs on a wide variety of hardware backends.

Below, you can preview our Python and C APIs; a C++ API is coming soon!

Talk to us on Discord

Python API docs

C API docs

Server integration docs