Get started with MAX

Welcome to the MAX quickstart guide!

This page provides a brief tour of what MAX has to offer you and your AI workloads. There's much more to explore once you finish this walkthrough, so let's get started.

Preview release

We're excited to share this preview version of MAX! For details about what's included, see the MAX changelog, and for details about what's yet to come, see the roadmap and known issues.

1. Install MAX

See the MAX install guide.

2. Run your first model

Let's start with something simple, just to make sure MAX is installed and working.

First, clone the code examples:

git clone
Nightly branch

If you installed the nightly build, make sure to check out the nightly branch:

(cd max && git checkout nightly)

Now let's run a PyTorch model using our Python API:

  1. Starting from where you cloned the repo, install the Python requirements:

    cd max/examples/inference/bert-python-torchscript
    python3 -m pip install -r requirements.txt
  2. Download and run the model:


    This script downloads the BERT model and runs it with some input text.

You should see results like this:

input text: Paris is the [MASK] of France.
filled mask: Paris is the capital of France.

Cool, it works! (If it didn't work, let us know.)
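To make the output above concrete, here is a toy sketch of what "filling a mask" means: choose the highest-scoring candidate word and substitute it for `[MASK]`. This is not the example's actual script. The `fill_mask` helper, the candidate list, and the scores are all made up for illustration, whereas a real BERT model scores every token in its vocabulary.

```python
# Toy illustration of masked-token filling; NOT the script used in the example.
MASK = "[MASK]"


def fill_mask(text, candidates, scores):
    """Replace the first [MASK] in `text` with the highest-scoring candidate."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    # Index of the largest score (an argmax over the candidate list).
    best = max(range(len(scores)), key=scores.__getitem__)
    return text.replace(MASK, candidates[best], 1)


text = "Paris is the [MASK] of France."
candidates = ["capital", "river", "president"]  # hypothetical candidate words
scores = [9.1, 2.3, 4.7]                        # made-up model scores

print("input text:", text)
print("filled mask:", fill_mask(text, candidates, scores))
```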

Compile time

The first time you run the example, it takes some time to compile the model. This might be unfamiliar if you're used to "eager execution" in ML frameworks, but MAX Engine uses next-generation compiler technology to optimize the graph and extract more performance, without any accuracy loss. This happens only when you load the model, and it pays dividends with significant speed-ups at run time.
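If you want to see the one-time load cost separately from the per-call run cost, you can time the two phases independently. The sketch below uses only the Python standard library. `load_model` is a hypothetical stand-in that sleeps to imitate compilation, so swap in your own load and execute calls.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def load_model(path):
    """Hypothetical stand-in for a loader that compiles the model once."""
    time.sleep(0.05)            # pretend this is compile time
    return lambda x: x * 2      # the "compiled model" is just a function here


# The load phase pays the one-time cost...
model, load_seconds = timed(load_model, "model-path-placeholder")
# ...and later calls reuse the already-loaded model.
result, run_seconds = timed(model, 21)

print(f"load: {load_seconds:.3f}s  run: {run_seconds:.6f}s  result: {result}")
```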

The bash script we used takes care of setup, and then runs the model with a Python script, which you can see on GitHub. Our Python API allows you to use MAX Engine as a drop-in replacement for your existing runtime, with just 3 lines of code. You can also use MAX Engine with our C and Mojo APIs.
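The three lines mentioned above can be sketched roughly as follows. This assumes the `max.engine` Python module as it appeared in the preview release (`InferenceSession`, `load`, and `execute`), so the names may differ in your version. The wrapper name `run_with_max`, the model path, and the input names are hypothetical, and a TorchScript model may need extra input-shape options at load time. Check the Python API reference for your release.

```python
def run_with_max(model_path, inputs):
    """Load a model with MAX Engine and run one inference.

    Assumes the preview-era `max.engine` API; treat the names as assumptions.
    `inputs` maps the model's input names to arrays.
    """
    # Imported lazily so this file can be imported without MAX installed.
    from max import engine

    session = engine.InferenceSession()  # 1. create an inference session
    model = session.load(model_path)     # 2. load (and compile) the model
    return model.execute(**inputs)       # 3. run inference


# Hypothetical usage (the path and input names are placeholders):
# outputs = run_with_max("path/to/model", {"input_ids": ids, "attention_mask": mask})
```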

If you're interested in how our performance compares to stock frameworks on different CPU architectures, check out our performance dashboard.

Figure 1. MAX Engine latency speed-up when running Mistral-7B vs PyTorch (MAX Engine is 2.5x faster).

3. Try Llama 3 on MAX

In the previous example, we ran a PyTorch model with Python, but MAX is about much more than that. You can also use MAX to build high-performance, state-of-the-art AI models in Mojo.

Mojo is a systems programming language built from the ground up to deliver maximum performance on any hardware and enable programmability across the entire AI software stack. You don't have to write a single line of Mojo to accelerate your models with MAX Engine. However, MAX Engine and Mojo share essential compiler technologies, which means Mojo has unique abilities that unlock new levels of performance for your models in MAX Engine.

Take a look for yourself. We've built the Llama 3 large language model entirely in Mojo, using the MAX Graph API. It's incredibly fast and you can try it right now:

  1. Navigate back to the path where you cloned our repo. Then navigate to the Llama 3 pipeline:

    cd max/examples/graph-api/pipelines/llama3
  2. Execute the model:

    mojo ../../run_pipeline.🔥 llama3 --prompt "I believe the meaning of life is"

After the weights download and the model compiles, the response prints in real time as the model emits tokens. This all runs locally on your CPU. When it finishes, you'll also see some performance stats.
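The streaming behavior described above, printing each token as it arrives and then reporting throughput, follows a common pattern that can be sketched in Python. This is not the Mojo pipeline's code. `stream_tokens` is a hypothetical stand-in for a model, and the tokens are placeholders.

```python
import time


def stream_tokens(tokens):
    """Hypothetical stand-in for a model: yields one token at a time."""
    for tok in tokens:
        yield tok


def print_stream(token_iter):
    """Print tokens as they arrive; return (token_count, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for tok in token_iter:
        # flush=True pushes each token to the terminal immediately.
        print(tok, end="", flush=True)
        count += 1
    elapsed = time.perf_counter() - start
    print()
    return count, elapsed


count, elapsed = print_stream(stream_tokens([" to", " live", " well", "."]))
if elapsed > 0:
    print(f"{count} tokens in {elapsed:.6f}s ({count / elapsed:.1f} tokens/s)")
```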

Check out the Mojo Llama 3 code on GitHub.

Next steps

MAX is available now as a developer preview, and there's much more to come. To understand how MAX can accelerate your AI workloads and simplify your workflows, try some of the following:

And this is just the beginning!

In the coming months, we'll add support for GPU hardware, more extensibility APIs, and more solutions for production deployment with MAX.

Join the discussion

Get in touch with other MAX developers, ask questions, and share feedback on Discord and GitHub.