Modular AI Engine

The world’s fastest unified inference engine, supercharging any model from TensorFlow or PyTorch on a wide range of hardware.

The Modular AI Engine can help simplify your workflow and reduce your inference latency so you can scale your AI products.

We’ve combined best-in-class compiler and runtime technologies so the engine works with models from both TensorFlow and PyTorch and runs them on a wide variety of hardware backends.

Below, you can preview our Python and C APIs; a C++ API is coming soon!

See our performance dashboard

Python API

Server integration