MAX Engine intro

The MAX platform is our solution to the world's fragmented and complicated landscape of AI pipeline tools. MAX brings programmability and performance to your entire AI pipeline, and MAX Engine is the heart of it all.

MAX Engine is a next-generation compiler and runtime system for neural network graphs. It supercharges the execution of AI models in any format (including TensorFlow, PyTorch, and ONNX), on a wide variety of hardware. MAX Engine also allows you to extend these models with custom ops that MAX Engine can analyze and optimize with other ops in the graph.

What MAX Engine can do

MAX Engine supercharges your AI inference workloads and gives your developer team superpowers.

  • Framework optionality: Compiles and runs your existing AI models to accelerate inferencing.

  • Hardware portability: Delivers high-speed inference on a wide range of hardware, without any code changes.

  • Model extensibility: Allows you to extend your models with custom ops that natively fuse in the graph.

  • Seamless integration: Deploys into production using the tools and services you already know.

Using our Python or C APIs, you can seamlessly upgrade your existing pipeline to run inference with MAX Engine (the sketch below shows how this looks with Python). From there, you can incrementally adopt other MAX features to optimize your model and improve performance.
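
Here's a minimal sketch of that flow in Python. The class and method names follow our public examples, but the model path ("resnet50.onnx") and input tensor name ("input") are placeholders for your own model, and exact names may differ by release (check the API reference):

```python
import numpy as np
from max import engine

# Create an inference session, then compile and load the model.
# MAX Engine accepts TensorFlow, PyTorch, or ONNX models here.
session = engine.InferenceSession()
model = session.load("resnet50.onnx")  # placeholder path

# Inputs are passed by tensor name as keyword arguments; outputs come
# back keyed by output tensor name.
input_batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = model.execute(input=input_batch)  # "input" is a placeholder name
print(outputs)
```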

You don't need to learn Mojo to use MAX Engine. However, Mojo delivers significant performance improvements for any compute workload, as we've demonstrated in a series of blog posts. You can introduce Mojo into your AI pipeline one step at a time, such as by writing custom ops or building entire graphs in Mojo, as described in the sections below.

Framework optionality

You can build and train your models in TensorFlow or PyTorch and seamlessly load them for accelerated inference in MAX Engine, using our Python, C, or Mojo API libraries.

There's no model conversion step. MAX Engine can compile most models and run them on a wide range of hardware for immediate performance gains.
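For example, the same load call accepts each format directly. This sketch uses placeholder paths; the `TorchInputSpec` usage follows our TorchScript examples, where input shapes must be declared up front because the TorchScript format doesn't store them (consult the API reference for details):

```python
from max import engine
from max.dtype import DType

session = engine.InferenceSession()

# ONNX files and TensorFlow SavedModels load as-is:
onnx_model = session.load("model.onnx")      # placeholder path
tf_model = session.load("saved_model_dir/")  # placeholder path

# TorchScript files omit shape metadata, so declare the input specs:
torch_model = session.load(
    "model.torchscript",                     # placeholder path
    input_specs=[
        engine.TorchInputSpec(shape=[1, 3, 224, 224], dtype=DType.float32)
    ],
)
```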

Hardware portability

We designed MAX Engine from the ground up with cutting-edge compiler technologies that enable our platform to scale in any direction and deliver state-of-the-art performance on any hardware.

You can select the best backend for the job without rewriting or recompiling your models. This allows you to take advantage of the breadth and depth of available cloud instance types, and always get the best inference cost-performance ratio from MAX Engine.

Today, MAX Engine is optimized for maximum performance on x86 and ARM CPUs. Support for GPUs is coming soon, and support for other ASICs will come later.

Model extensibility

You can replace graph ops and implement custom ops in Mojo, which MAX Engine can natively optimize, compile, and fuse with the graph in your loaded model. Your ops are treated just like framework ops, and MAX Engine compiles the whole graph into a single optimized executable, instead of calling out to an external op library.

Going a step further, you can also write your entire graph with our MAX Graph API in Mojo (currently experimental), which gives you complete control of the graph architecture at a lower level of MAX Engine.

Seamless integration

Modular integrates with industry-standard infrastructure and open-source tools to minimize migration cost. We offer simple solutions to deploy into production using tools and services you already know and trust, such as Kubernetes, TF Serving, and NVIDIA Triton. For more information, read about MAX Serving.

How MAX Engine works

MAX Engine is a compiler and runtime for your AI models, available today in the MAX Developer Edition SDK. You can use the MAX Engine libraries to load your models (TensorFlow, PyTorch, or ONNX) and execute them on a wide range of hardware.

The MAX Engine SDK includes the following:

  • MAX Engine model compiler and runtime (optimizes and executes models)

  • MAX Engine API libraries (API bindings to run inference from Python, C, or Mojo)

  • MAX CLI (utilities to benchmark and visualize your model; see the sketch after this list)

  • MAX Graph API library (APIs to build low-level graphs in Mojo)
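
For instance, the CLI utilities look roughly like this. The subcommand names here are assumptions based on the descriptions above, and `model.onnx` is a placeholder path; run `max --help` for the exact interface:

```sh
# Benchmark the compiled model on the local machine.
max benchmark model.onnx

# Generate a visualization of the compiled graph (e.g., to open in Netron).
max visualize model.onnx
```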

Installing MAX Engine also installs Mojo, which is required when writing custom ops or building model graphs.

All of this is available today in the MAX Developer Edition.

The following setup guide takes just a few minutes: install the MAX Developer Edition, then use our code examples to run some publicly available, pre-trained models.

Get started