Run an ONNX model with Python

Scott Main

MAX Engine accelerates the inference speed for your AI models as they are, without any changes. Our next-generation graph compiler and runtime can run your models on a wide range of hardware for immediate performance gains, using a simple Python API.

In this tutorial, we'll walk through the process step by step, starting from an empty project:

  1. Install MAX and create a virtual environment.
  2. Convert a PyTorch model from HuggingFace into ONNX format (the ResNet-50 image classification model).
  3. Run inference with MAX Engine.

Most of the code in this project is needed just to download the model, prepare the inputs, and process the outputs. Running inference with MAX Engine takes just three lines (not counting the import):

from max import engine

# After you prepare the inputs, load and run the model:
session = engine.InferenceSession()
model = session.load(model_path)
outputs = model.execute(**inputs)
# Then process the outputs.

Let's get started!

Trouble?

If you experience any issues in this tutorial, please let us know on GitHub.

Create a virtual environment

Using a virtual environment ensures that you have the Python version and packages that are compatible with this project. We'll use the Magic CLI to create the environment and install the required packages.

If you don't have Magic, you can install it on macOS and Ubuntu Linux with this command:

curl -ssL https://magic.modular.com | bash

Then run the source command that's printed in your terminal.

  1. Create a new project and change into its directory:

    magic init max-onnx-resnet && cd max-onnx-resnet
  2. Add MAX and NumPy from conda:

    magic add max "numpy<2.0"
  3. Add the other Python packages from PyPI:

    magic add --pypi "datasets==2.18" "onnx==1.16.0" \
    "pillow==10.3.0" "torch==2.2.2" "transformers==4.40.1"
  4. Now you can start a shell in the environment and see your MAX version:

    magic shell
    python3 -c 'from max import engine; print(engine.__version__)'

Download the ONNX model

Now let's download the ResNet-50 model from HuggingFace. We'll use ResNetForImageClassification, which gives us a HuggingFace object that's a subclass of a PyTorch Module. However, MAX Engine currently can't compile a Module object, so we need to export the model into either a TorchScript or ONNX file (learn more about supported file formats).

For this project, we'll save the PyTorch model as an ONNX file, which we can do with the torch.onnx.export() function. Because a PyTorch model (a Module) is just Python code (not a static graph that we can save as a file), the export() function must actually run the model with a forward pass. This pass traces the model, builds a static graph representation, and then saves the graph as a file. (Then, we can pass that file to the MAX compiler.)

To trace the model with a forward pass, the export() function needs some input data. The input data doesn't need to be real, so we'll generate random data that matches the model's input shape.
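To make the idea of tracing concrete, here is a toy sketch in plain Python. It is only an illustration of the concept, not how torch.onnx.export() is implemented: the Traced class, toy_model(), and trace() are hypothetical names invented for this example. A placeholder input is pushed through ordinary Python code, and each operation it meets is recorded as a node in a static graph.

```python
class Traced:
    """A placeholder value that records every operation applied to it."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph  # shared list of (output, op, left, right) nodes

    def _record(self, op, other):
        out = Traced(f"t{len(self.graph)}", self.graph)
        other_name = other.name if isinstance(other, Traced) else repr(other)
        self.graph.append((out.name, op, self.name, other_name))
        return out

    def __mul__(self, other):
        return self._record("mul", other)

    def __add__(self, other):
        return self._record("add", other)


def toy_model(x):
    # Ordinary Python code: there is no graph until we actually run it.
    return x * 2 + 1


def trace(fn, input_name="pixel_values"):
    """Run fn once with a placeholder input and return the recorded graph."""
    graph = []
    result = fn(Traced(input_name, graph))
    return graph, result.name


graph, output_name = trace(toy_model)
for node in graph:
    print(node)
print("graph output:", output_name)
```

The placeholder carries no real data, which mirrors why the input passed to export() can be random: only the sequence of operations matters for the saved graph.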

Here's how to export the model from HuggingFace to an ONNX file:

  1. Create a file named download-model.py and paste this code:

    download-model.py
    import torch
    from transformers import ResNetForImageClassification
    from torch.onnx import export

    # The HuggingFace model name and exported file name
    HF_MODEL_NAME = "microsoft/resnet-50"
    MODEL_PATH = "resnet50.onnx"

    def main():
        # Load the ResNet model from HuggingFace in evaluation mode
        model = ResNetForImageClassification.from_pretrained(HF_MODEL_NAME)
        model.eval()

        # Create random input for tracing, then export the model to ONNX
        dummy_input = torch.randn(1, 3, 224, 224)
        export(model, dummy_input, MODEL_PATH, opset_version=11,
               input_names=['pixel_values'], output_names=['output'],
               dynamic_axes={'pixel_values': {0: 'batch_size'}, 'output': {0: 'batch_size'}})

        print(f"Model saved as {MODEL_PATH}")

    if __name__ == "__main__":
        main()

    Because export() simply traces the execution of the model, we also need to specify the input and output tensor names we want applied to the static graph. We use pixel_values as the input name because that's the input name used by the original ResNet model, and it's the name that the HuggingFace AutoImageProcessor uses when it prepares the input (as you'll see below).

  2. Now run the file:

    python3 download-model.py

You should now have a file named resnet50.onnx.
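Before moving on, it can help to confirm that the export really produced a usable file. The sketch below is one possible check, not part of the original project: describe_onnx_file() is a hypothetical helper name. It uses the onnx package installed earlier (onnx.load and onnx.checker.check_model) when it can be imported, and otherwise reports only the file size.

```python
from pathlib import Path


def describe_onnx_file(path):
    """Return basic facts about an exported ONNX file; raise if it's missing."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"{path} not found; run download-model.py first")
    info = {"size_bytes": file.stat().st_size}

    try:
        import onnx  # added to the environment with `magic add --pypi`
    except ImportError:
        return info  # report the size only if onnx isn't importable here

    model = onnx.load(str(file))
    onnx.checker.check_model(model)  # raises if the graph is malformed
    info["inputs"] = [i.name for i in model.graph.input]
    info["outputs"] = [o.name for o in model.graph.output]
    return info


# Only inspect the file if the export step has already been run.
if Path("resnet50.onnx").is_file():
    print(describe_onnx_file("resnet50.onnx"))
```

For the export above, the reported names should be the ones passed as input_names and output_names (pixel_values and output).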

Run the model with MAX Engine

Now that you have the model, we can execute it using MAX Engine.

Running inference with the MAX Engine Python API takes just three lines of code. Most of the remaining code prepares the model input and processes the output, and we'll use the HuggingFace Transformers APIs to help with both.

Start by creating a file named run.py with the required imports and a main() function:

run.py
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
import numpy as np
from max import engine

# The HuggingFace model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    # This is where we'll add our code
    pass  # placeholder so the file runs; remove it when you add code below

if __name__ == "__main__":
    main()

Prepare the input

First, let's load a test image from HuggingFace Datasets, and print the input keys and input shape to be sure they are what we expect.

Add this code inside the main() function:

    dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
    image = dataset["test"]["image"][0]
    # optionally, save the image to see it yourself:
    # image.save("cat.png")

    image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_NAME)
    inputs = image_processor(image, return_tensors="np")

    print("Keys:", inputs.keys())
    print("Shape:", inputs['pixel_values'].shape)

Then run the file:

python3 run.py

You should see output with this information:

Keys: dict_keys(['pixel_values'])
Shape: (1, 3, 224, 224)

Looks good! The name and shape of the test input match what the model expects.
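If you'd rather check this in code than by eye, a small helper can compare each input against the signature used at export time. This is a sketch under assumptions: check_input() and EXPECTED are hypothetical names, and the expected shape simply restates the dummy input and dynamic_axes from download-model.py (batch axis dynamic, the other axes fixed).

```python
def check_input(name, shape, expected):
    """Compare one input's name and shape against the exported signature.

    `expected` maps input names to shapes; None marks a dynamic dimension.
    """
    if name not in expected:
        raise KeyError(f"unexpected input {name!r}; model expects {list(expected)}")
    want = expected[name]
    if len(shape) != len(want):
        raise ValueError(f"{name}: rank {len(shape)} != expected rank {len(want)}")
    for axis, (got, exp) in enumerate(zip(shape, want)):
        if exp is not None and got != exp:
            raise ValueError(f"{name}: axis {axis} is {got}, expected {exp}")
    return True


# Signature from download-model.py: axis 0 is the dynamic batch size.
EXPECTED = {"pixel_values": (None, 3, 224, 224)}

print(check_input("pixel_values", (1, 3, 224, 224), EXPECTED))
```

Inside main(), you could call it as check_input(name, array.shape, EXPECTED) for each name and array in inputs.items().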

Run inference

Now we're ready to run inference with MAX Engine.

First, we instantiate an InferenceSession. Then we pass the ONNX file to load(), and pass the input to execute().

Just add this code at the end of the main() function:

    session = engine.InferenceSession()
    model = session.load(MODEL_PATH)
    outputs = model.execute(**inputs)

    print("Output shape:", outputs['output'].shape)

Then run it again:

python3 run.py

Compile time

The first time you load a model, it might take a few minutes to compile. This is a one-time, up-front cost: load() is slow, but execute() is very fast, and calling load() again uses a cached version of the compiled model.
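To see the difference between the two calls on your own machine, you can time them separately. The helper below is a minimal sketch: timed() is a hypothetical name, and the demo uses sorted() as a stand-in so the snippet runs without MAX installed.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in call so this sketch runs on its own.
result, seconds = timed(sorted, [3, 1, 2])
print(f"took {seconds:.6f}s -> {result}")
```

In run.py, the same helper could wrap the real calls, for example model, load_s = timed(session.load, MODEL_PATH) followed by outputs, exec_s = timed(model.execute, **inputs).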

The printed output shape is (1, 1000), which is the batch size and the number of result classifications (the ResNet model is trained with the ImageNet dataset of 1,000 classes).

Process the output

To wrap this up, let's process the model output to see the model's prediction. We'll use the AutoModelForImageClassification API to get the predicted class name:

    predicted_label = np.argmax(outputs["output"], axis=-1)[0]
    hf_model = AutoModelForImageClassification.from_pretrained(HF_MODEL_NAME)
    predicted_class = hf_model.config.id2label[predicted_label]
    print(f"Prediction: {predicted_class}")

Then run it one more time:

python3 run.py

You should now see Prediction: tiger cat.
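The argmax above reports only the single best class. If you also want to see how confident the model is, you can convert the raw scores into probabilities with a softmax and list the top few. The sketch below is plain Python under stated assumptions: top_k() is a hypothetical helper, and the demo scores are made up rather than real model output.

```python
import math


def top_k(logits, k=5):
    """Return the k highest-scoring (index, probability) pairs."""
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]


# Made-up scores for four classes, just to exercise the function.
demo_logits = [0.5, 2.0, -1.0, 1.0]
for index, probability in top_k(demo_logits, k=2):
    print(index, round(probability, 3))
```

With the real output, you could pass outputs["output"][0].tolist() as the scores and look up each returned index in hf_model.config.id2label.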

That's it!

Here's the finished run.py:
run.py
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
import numpy as np
from max import engine

# The HuggingFace model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
    image = dataset["test"]["image"][0]
    # optionally, save the image to see it yourself:
    # image.save("cat.png")

    image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_NAME)
    inputs = image_processor(image, return_tensors="np")

    print("Keys:", inputs.keys())
    print("Shape:", inputs['pixel_values'].shape)

    session = engine.InferenceSession()
    model = session.load(MODEL_PATH)
    outputs = model.execute(**inputs)

    print("Output shape:", outputs['output'].shape)

    predicted_label = np.argmax(outputs["output"], axis=-1)[0]
    hf_model = AutoModelForImageClassification.from_pretrained(HF_MODEL_NAME)
    predicted_class = hf_model.config.id2label[predicted_label]
    print(f"Prediction: {predicted_class}")

if __name__ == "__main__":
    main()

Next steps

This tutorial covered just the basics of running inference with the MAX Engine Python API. You can use what you learned here to run other ONNX models (exported from PyTorch, TensorFlow, or other frameworks) and compare MAX Engine's performance to the native framework runtimes.
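For a comparison like that, a single timing is usually too noisy, so it helps to discard a few warm-up runs and report a median. The following is one possible harness, offered as a sketch: benchmark() and its warmup and runs parameters are hypothetical, and the demo workload is a stand-in so the snippet runs by itself.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Call fn repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not measured
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# Stand-in workload; replace the lambda with the call you want to measure.
stats = benchmark(lambda: sum(range(10_000)))
print(stats)
```

To measure MAX Engine, you could pass lambda: model.execute(**inputs), and then pass the equivalent call for another runtime using the same inputs.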

But this is just the beginning of what MAX has to offer!
