
What is MAX

The Modular Accelerated Xecution (MAX) platform is a unified set of tools and libraries that unlock performance, programmability, and portability for your AI inference pipeline. Each component in MAX is designed to help you build high-performance AI pipelines and simplify deploying them to any hardware, with the best possible cost-performance ratio.

If that explanation isn't good enough, keep reading and we'll tell you more about what's included in MAX and why you should try it.

What's included

Our goal is to provide everything you need to deploy low-latency, high-throughput, real-time inference pipelines into production. Toward that goal, MAX currently includes the following:

  • MAX Engine
    A state-of-the-art graph compiler and runtime library that executes models from PyTorch, TensorFlow, and ONNX, with incredible inference speed on a wide range of hardware. More about MAX Engine.

  • MAX Serving
    A serving wrapper for MAX Engine that provides full interoperability with existing AI serving systems (such as Triton) and that seamlessly deploys within existing container infrastructure (such as Kubernetes). More about MAX Serving.

  • Mojo
The world's first programming language built from the ground up for AI developers, with cutting-edge compiler technology that delivers unparalleled performance and programmability for any hardware. More about Mojo.

Preview release

There's still a lot to come, but the MAX SDK is available now as a preview. Get started.

For details about what's still in the works, see the roadmap and known issues.

How to use MAX

Using MAX doesn't mean you have to migrate your entire AI pipeline and serving infrastructure—it meets you where you are today and lets you migrate incrementally.

You can use the same models, libraries, and serving infrastructure that you use today, and capture immediate value from MAX with minimal migration. Then, when you're ready, you can migrate other parts of your AI pipeline to MAX for even more performance, programmability, and portability.

Add performance & portability

You can start by using our Python or C API to replace your current PyTorch, TensorFlow, or ONNX inference calls with MAX Engine inference calls. This simple change can execute your models up to 5x faster than stock PyTorch, TensorFlow, or ONNX Runtime (thus reducing your compute costs), and it adds portability because MAX Engine is compatible with a wide range of CPU architectures (Intel, AMD, and ARM). GPU support is also coming soon.

For example, if you execute your models from Python, you can upgrade to MAX Engine with just 3 lines of code.
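To make the shape of that change concrete, here's a minimal, self-contained sketch. The `ToySession` and `ToyModel` classes are stand-ins invented for this example (they are not the MAX Engine API); what matters is the pattern: create a session once, load the model into it, then call `execute()` with named inputs at your old inference call site.

```python
# Illustrative stand-ins (NOT the real MAX Engine API): the point is the
# call-site pattern, not the class names.

class ToySession:
    """Stand-in for an engine's inference session."""

    def load(self, model_path):
        # A real session would compile and optimize the model file here.
        return ToyModel(model_path)


class ToyModel:
    """Stand-in for a compiled model."""

    def __init__(self, model_path):
        self.model_path = model_path

    def execute(self, **inputs):
        # A real model returns a dict of named output tensors; this toy
        # version just doubles each input value.
        return {name: [v * 2 for v in vals] for name, vals in inputs.items()}


# The migration pattern: load once, then execute at the old call site.
session = ToySession()
model = session.load("resnet50.onnx")  # path is illustrative
outputs = model.execute(x=[1.0, 2.0])
print(outputs["x"])  # [2.0, 4.0]
```

Because the session/load/execute pattern mirrors how most frameworks' inference APIs are shaped, swapping the session object is typically the only change your calling code needs.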

Additionally, you can upgrade your production inference performance with MAX Serving as a drop-in replacement for your NVIDIA Triton Inference Server.
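Triton clients speak the KServe V2 inference protocol over HTTP, so "drop-in replacement" implies that request bodies like the one below keep working unchanged. This sketch only constructs the JSON payload (the model name, tensor name, and values are illustrative); it does not contact a server.

```python
import json

# A KServe V2 ("Triton-style") inference request body. A client would POST
# this to http://<host>:8000/v2/models/<model-name>/infer — the tensor name,
# shape, and data here are illustrative.
payload = {
    "inputs": [
        {
            "name": "input0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}
body = json.dumps(payload)
print(body)
```

A serving system that accepts this same wire format can slot in behind existing clients without changes on their side.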

Extend & optimize your models

Once you're executing your models with MAX Engine, you can optimize your performance further with our platform's unrivaled programmability.

MAX Engine is built upon Mojo, which makes it fully extensible via Mojo. That means you can do more than just use MAX Engine to run a model: you can extend the engine's capabilities.

For starters, you can use Mojo to write custom ops that are native to the compiler, which means the compiler can analyze, optimize, and fuse your ops in the graph. Or, you can use the MAX Graph API to build your whole model in Mojo (for inference), allowing you to optimize the low-level graph representation for the MAX Engine compiler.
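To see why compiler-native ops matter, here's a conceptual sketch of operator fusion in plain Python. This is an illustration of the general technique only, not MAX Engine internals: when the compiler can see inside your ops, it can fuse two elementwise operations into one pass and avoid materializing the intermediate result.

```python
# Conceptual sketch of operator fusion (not MAX Engine internals).

def add(a, b):
    # Elementwise add: materializes an intermediate list.
    return [x + y for x, y in zip(a, b)]

def relu(a):
    # Elementwise ReLU: a second full pass over the data.
    return [max(x, 0.0) for x in a]

def add_relu_fused(a, b):
    # Fused version: one pass, no intermediate between the two ops.
    return [max(x + y, 0.0) for x, y in zip(a, b)]

a, b = [1.0, -3.0], [0.5, 1.0]
assert relu(add(a, b)) == add_relu_fused(a, b)  # same result, one fewer pass
```

An op the compiler can't analyze is an opaque call it must execute as-is; an op written natively for the compiler can participate in rewrites like this one automatically.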

Beyond inference performance in MAX Engine, you can also optimize the rest of your AI pipeline by migrating your data pre/post-processing code and application code to Mojo, using the MAX Engine Mojo API. We’ll also add more Mojo libraries in the future to help you build these pipeline components.

How MAX works

MAX Engine is truly the "engine" that drives the MAX platform—it executes your existing AI models with incredible speed on a wide range of hardware. Within that engine, Mojo is the core technology that makes it performant, programmable, and portable.

When we began the effort to unify the world's AI infrastructure, we realized that programming across the entire AI stack—from the graph kernels up to the application layer—was too complicated. We wanted a programming model that could target heterogeneous hardware and also deliver state-of-the-art performance in the application. That's why we created Mojo, and it became the core technology for the rest of the MAX platform.

Again, you don't need to use Mojo. You can bring your existing models and execute them with MAX Engine using our API libraries in Python and C. However, using Mojo with MAX Engine gives you superpowers. Only with Mojo can you extend and optimize your models for execution in MAX Engine.

All of this is available for you to try today in the MAX SDK preview. Many more features are on the way, including tools to deploy your models on MAX in production.

And we've only just begun!

A production AI pipeline requires much more than models and a runtime. It also needs data loading, input transformations, server-client communications, data and system monitoring, and more. Over time, we will add more tools and libraries to MAX that accelerate and simplify development for these other parts of your AI pipeline. For more details about what we’re working on now, check out the MAX roadmap.

Share your ideas

Let us know what you think! What additional libraries do you need to streamline your AI development and deployment? Talk to us on Discord and GitHub.