Run an ONNX model with Python

Technical Writer

MAX Engine speeds up inference for your existing AI models without requiring any changes to them. Our next-generation graph compiler and runtime can run your models on a wide range of hardware for immediate performance gains, using a simple Python API.

In this tutorial, we'll walk through the process step by step, starting from an empty project:

1. Install MAX and create a virtual environment.
2. Convert a PyTorch model from Hugging Face into ONNX format (the ResNet-50 image classification model).
3. Run inference with MAX Engine.

Most of the code in this project is needed just to download the model, prepare the inputs, and process the outputs. Running inference with MAX Engine takes just three lines of code (not counting the import):

from max import engine

# After you prepare the inputs, load and run the model:
session = engine.InferenceSession()
model = session.load(model_path)
outputs = model.execute_legacy(**inputs)
# Then process the outputs.

Starting in 24.6.0, the model.execute() command no longer accepts keyword arguments. In a future release we will restore this functionality with support for GPUs. For compatibility with existing code that uses keyword arguments, you can use the execute_legacy() function.
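
The `**inputs` syntax in the snippet above is plain Python dictionary unpacking: each key becomes a keyword argument name. Here is a minimal sketch of that equivalence. The `fake_execute()` function is a hypothetical stand-in that we made up for illustration; it is not part of the MAX API.

```python
def fake_execute(**named_inputs):
    """Hypothetical stand-in for a call that takes keyword arguments."""
    # Return the argument names so we can see what the callee received.
    return sorted(named_inputs)


inputs = {"pixel_values": [[0.0, 1.0]]}

# These two calls are equivalent: ** unpacks the dict into keyword arguments.
unpacked = fake_execute(**inputs)
explicit = fake_execute(pixel_values=inputs["pixel_values"])
print(unpacked == explicit)
```

This is why the dictionary keys need to match the model's input names, as you'll see when we export the model with an input named pixel_values.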

Let's get started!

Create a virtual environment

Using a virtual environment ensures that you have the Python version and packages that are compatible with this project. We'll use the Magic CLI to create the environment and install the required packages.

If you don't have the magic CLI yet, you can install it on macOS and Ubuntu Linux with this command:
curl -ssL https://magic.modular.com/ | bash
Then run the source command that's printed in your terminal.

Create a new Python project and install the dependencies:

magic init max-onnx-resnet --format pyproject && cd max-onnx-resnet

Add pytorch to your package channels:
```
magic project channel add pytorch --prepend
```
The --prepend option is necessary to put pytorch before conda-forge, as per channel priority. This ensures that you install the official PyTorch package, instead of the version from conda-forge.

Add MAX and other packages from conda:

magic add "max~=25.1" "pytorch==2.4.0" "numpy<2.0" "onnx==1.16.0" \
  "transformers==4.40.1" "datasets==2.18" "pillow"

Now you can start a shell in the environment and see your MAX version:

magic shell

python3 -c 'from max import engine; print(engine.__version__)'

Download the ONNX model

Now let's download the ResNet-50 model from Hugging Face. We'll use ResNetForImageClassification, which gives us a Hugging Face object that's a subclass of a PyTorch Module. However, MAX Engine currently can't compile a Module object, so we need to export the model into either a TorchScript or ONNX file (learn more about supported file formats).

For this project, we'll save the PyTorch model as an ONNX file, which we can do with the torch.onnx.export() function. Because a PyTorch model (a Module) is just Python code (not a static graph that we can save as a file), the export() function must actually run the model with a forward pass. This pass traces the model, builds a static graph representation, and then saves the graph as a file. (Then, we can pass that file to the MAX compiler.)

To trace the model with a forward pass, the export() function needs some input data. The input data doesn't need to be real, so we'll generate random data that matches the model's input shape.
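
To make the idea of tracing concrete, here is a toy sketch in pure Python. It is only an illustration of the concept and is not how PyTorch implements export(): the `Traced` class, `trace()` helper, and `tiny_model()` function are all names we invented for this example. Running the function once on a dummy value records each operation into a static list, and the recorded graph is the same no matter which dummy value we use.

```python
class Traced:
    """Wraps a number and records each arithmetic op applied to it."""

    def __init__(self, name, value, graph):
        self.name = name
        self.value = value
        self.graph = graph  # shared list of (output, op, left, right) tuples

    def _apply(self, op, other, result):
        right = other.name if isinstance(other, Traced) else repr(other)
        out = Traced(f"t{len(self.graph)}", result, self.graph)
        self.graph.append((out.name, op, self.name, right))
        return out

    def __add__(self, other):
        rhs = other.value if isinstance(other, Traced) else other
        return self._apply("add", other, self.value + rhs)

    def __mul__(self, other):
        rhs = other.value if isinstance(other, Traced) else other
        return self._apply("mul", other, self.value * rhs)


def trace(fn, dummy_value):
    """Run fn once on a dummy input and return the recorded static graph."""
    graph = []
    fn(Traced("x", dummy_value, graph))
    return graph


def tiny_model(x):
    # Ordinary Python code: there is no graph until we run it.
    return x * 2 + 1


graph = trace(tiny_model, 3.0)
print(graph)
# → [('t0', 'mul', 'x', '2'), ('t1', 'add', 't0', '1')]
```

The dummy value only drives the forward pass; `trace(tiny_model, -7.5)` records exactly the same graph. That is why random input data is good enough for the export.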

Here's how to export the model from Hugging Face to an ONNX file:

Create a file named download-model.py inside max-onnx-resnet/ and paste this code:

download-model.py
import torch
from transformers import ResNetForImageClassification
from torch.onnx import export

# The Hugging Face model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    # Load the ResNet model from Hugging Face in evaluation mode
    model = ResNetForImageClassification.from_pretrained(HF_MODEL_NAME)
    model.eval()

    # Create random input for tracing, then export the model to ONNX
    dummy_input = torch.randn(1, 3, 224, 224)
    export(model, dummy_input, MODEL_PATH, opset_version=11,
          input_names=['pixel_values'], output_names=['output'],
          dynamic_axes={'pixel_values': {0: 'batch_size'}, 'output': {0: 'batch_size'}})

    print(f"Model saved as {MODEL_PATH}")

if __name__ == "__main__":
    main()

Because export() simply traces the execution of the model, we also need to specify the input and output tensor names we want applied to the static graph. We use pixel_values as the input name because that's the input name used by the original ResNet model, and it's the name that the Hugging Face AutoImageProcessor uses when it prepares the input (as you'll see below).

Now run the file:

python3 download-model.py

You should now have a file named resnet50.onnx.
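
If you want to confirm that the export did what we asked, you can inspect the file with the onnx package that we installed earlier. The following is a sketch rather than a required step, and `describe_onnx_inputs()` is a helper name we made up. It validates the graph and lists each input with its dimensions.

```python
from pathlib import Path


def describe_onnx_inputs(model_path):
    """Return {input_name: [dims]} for an ONNX file, after validating it."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No ONNX file at {path}")

    # Imported here so the path check above works even without onnx installed.
    import onnx

    model = onnx.load(str(path))
    onnx.checker.check_model(model)  # raises if the graph is malformed

    described = {}
    for graph_input in model.graph.input:
        dims = graph_input.type.tensor_type.shape.dim
        # A dynamic axis has a symbolic name (dim_param); a fixed one has dim_value.
        described[graph_input.name] = [d.dim_param or d.dim_value for d in dims]
    return described
```

Calling `describe_onnx_inputs("resnet50.onnx")` inside the project directory should report an input named pixel_values. Given the dynamic_axes setting in download-model.py, we would expect its first dimension to be the symbolic name batch_size rather than a fixed number.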

Run the model with MAX Engine

Now that you have the model, we can execute it using MAX Engine.

Running inference with the MAX Engine Python API takes just three lines of code, so most of this code prepares the model input and processes the output. We'll use the Hugging Face Transformers APIs to help with that.

Start by creating the executable file called run.py with the required imports and a main() function:

run.py
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
import numpy as np
from max import engine

# The Hugging Face model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    # This is where we'll add our code in the next steps
    pass

if __name__ == "__main__":
    main()

Prepare the input

First, let's load a test image from Hugging Face Datasets, and print the input keys and input shape to be sure they are what we expect.

Add this code inside the main() function:

    dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
    image = dataset["test"]["image"][0]
    # optionally, save the image to see it yourself:
    # image.save("cat.png")

    image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_NAME)
    inputs = image_processor(image, return_tensors="np")

    print("Keys:", inputs.keys())
    print("Shape:", inputs['pixel_values'].shape)

Then run the file:

python3 run.py

You should see output with this information:

Keys: dict_keys(['pixel_values'])
Shape: (1, 3, 224, 224)

Looks good! The name and shape of the test input matches what the model expects.
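
Instead of checking the printed values by eye, you could turn this into a small guard. This is an optional sketch: `check_inputs()` is a hypothetical helper of our own, and it assumes only that each input has a .shape attribute (as NumPy arrays do). A `None` entry in an expected shape stands for a dynamic dimension such as the batch size.

```python
from types import SimpleNamespace


def check_inputs(inputs, expected_shapes):
    """Raise ValueError if names or shapes differ from what the model expects.

    Use None in an expected shape for a dynamic dimension such as batch size.
    """
    missing = set(expected_shapes) - set(inputs)
    extra = set(inputs) - set(expected_shapes)
    if missing or extra:
        raise ValueError(
            f"missing inputs: {sorted(missing)}, unexpected: {sorted(extra)}"
        )

    for name, expected in expected_shapes.items():
        actual = tuple(inputs[name].shape)
        same_rank = len(actual) == len(expected)
        dims_match = all(e is None or e == a for e, a in zip(expected, actual))
        if not (same_rank and dims_match):
            raise ValueError(f"{name}: expected shape {expected}, got {actual}")


# Stand-in for a NumPy array: anything with a .shape attribute works here.
fake_pixels = SimpleNamespace(shape=(1, 3, 224, 224))
check_inputs({"pixel_values": fake_pixels}, {"pixel_values": (None, 3, 224, 224)})
print("inputs look compatible")
```

In run.py, the equivalent call would be `check_inputs(inputs, {"pixel_values": (None, 3, 224, 224)})`.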

Run inference

Now we're ready to run inference with MAX Engine.

First, we instantiate an InferenceSession. Then we pass the ONNX file to load(), and pass the input to execute_legacy() (see the note about keyword arguments at the start of this tutorial).

Just add this code at the end of the main() function:

    session = engine.InferenceSession()
    model = session.load(MODEL_PATH)
    outputs = model.execute_legacy(**inputs)

    print("Output shape:", outputs['output'].shape)

Then run it again:

python3 run.py

Compile time

The first time you load a model, it might take a few minutes to compile. This up-front cost buys lower inference latency from our next-generation graph compiler: load() is a slow, one-time operation, while each call that executes the model afterward is fast. (Calling load() again uses a cached version of the compiled model.)

The printed output shape is (1, 1000): a batch size of 1, and one score for each of the 1,000 ImageNet classes that the ResNet model is trained on.

Process the output

To wrap this up, let's process the model output to see the model's prediction. We'll use the AutoModelForImageClassification API to get the predicted class name:

    predicted_label = np.argmax(outputs["output"], axis=-1)[0]
    hf_model = AutoModelForImageClassification.from_pretrained(HF_MODEL_NAME)
    predicted_class = hf_model.config.id2label[predicted_label]
    print(f"Prediction: {predicted_class}")

Then run it one more time:

python3 run.py

You should now see Prediction: tiger cat.
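
The argmax gives only the single best class. If you also want to see how confident the model is, one common approach is to apply a softmax to the raw scores and list the top few classes. The sketch below is our own addition, not part of the tutorial's run.py; `top_k()` is a hypothetical helper that works on a plain list of scores, and the example scores are made up.

```python
import math


def top_k(logits, k=5):
    """Return the k highest-scoring (class_index, probability) pairs."""
    peak = max(logits)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]


# Made-up scores for a 4-class model, only to show the call.
example = top_k([2.0, 0.5, 3.0, -1.0], k=2)
print(example)
```

In run.py, you could pass `outputs["output"][0].tolist()` as the scores and look up each returned index in `hf_model.config.id2label` to get the class names.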

That's it!

Here is the finished run.py:

run.py
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
import numpy as np
from max import engine

# The Hugging Face model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
    image = dataset["test"]["image"][0]
    # optionally, save the image to see it yourself:
    # image.save("cat.png")

    image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_NAME)
    inputs = image_processor(image, return_tensors="np")

    print("Keys:", inputs.keys())
    print("Shape:", inputs['pixel_values'].shape)

    session = engine.InferenceSession()
    model = session.load(MODEL_PATH)
    outputs = model.execute_legacy(**inputs)

    print("Output shape:", outputs['output'].shape)

    predicted_label = np.argmax(outputs["output"], axis=-1)[0]
    hf_model = AutoModelForImageClassification.from_pretrained(HF_MODEL_NAME)
    predicted_class = hf_model.config.id2label[predicted_label]
    print(f"Prediction: {predicted_class}")

if __name__ == "__main__":
    main()

Next steps

This tutorial covered just the basics for running inference with the MAX Engine Python API. You can use what you learned here to run other ONNX models (whether exported from PyTorch, TensorFlow, or another framework) and compare MAX Engine's performance to the native framework runtimes.
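
For a rough comparison, a small timing harness is usually enough to get started. The following is a minimal sketch under our own assumptions: `benchmark()` is a hypothetical helper, the warm-up and run counts are arbitrary defaults, and the workload shown is a placeholder.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (caches, lazy initialization)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Placeholder workload; swap in the call you want to measure.
latency_ms = benchmark(lambda: sum(range(10_000)))
print(f"median latency: {latency_ms:.3f} ms")
```

To measure this tutorial's model, you could pass `lambda: model.execute_legacy(**inputs)` after loading it, and time the other runtime the same way with the same input. Keeping load() outside the timed function matters, because compilation is a one-time cost.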

But this is just the beginning of what MAX has to offer!

Here are some other tutorials to try next:

Deploy a model with Kubernetes and Helm

Learn how to deploy your model with MAX Engine using AWS and Kubernetes.

Deploy a model with AWS CloudFormation

Learn how to deploy a model using MAX Engine and AWS CloudFormation.
