MAX Engine accelerates inference for your AI models as they are, without any changes. Our next-generation graph compiler and runtime can run your models on a wide range of hardware for immediate performance gains, using a simple Python API.
In this tutorial, we'll walk through the process step by step, starting from an empty project:
- Install MAX and create a virtual environment.
- Convert a PyTorch model from HuggingFace (the ResNet-50 image classification model) into ONNX format.
- Run inference with MAX Engine.
Most of the code in this project is needed just to download the model, prepare the inputs, and process the outputs. The code to run inference with the MAX Engine is just three lines of code (not counting the import):
from max import engine
# After you prepare the inputs, load and run the model:
session = engine.InferenceSession()
model = session.load(model_path)
outputs = model.execute(**inputs)
# Then process the outputs.
Let's get started!
Create a virtual environment
Using a virtual environment ensures that you have the Python version and packages that are compatible with this project. We'll use the Magic CLI to create the environment and install the required packages.
Create a new Python project and install the dependencies:
magic init max-onnx-resnet && cd max-onnx-resnet
Specify the Python version for the virtual environment:
magic add "python>=3.9,<3.13"
Add MAX and other packages from conda:
magic add max "pytorch>=2.4.0,<3" "numpy<2.0" "onnx==1.16.0" \
"transformers==4.40.1" "datasets==2.18"
And because the pillow version we want is not available in conda-forge, we'll install it from PyPI:
magic add --pypi "pillow==10.3.0"
Now you can start a shell in the environment and see your MAX version:
magic shell
python3 -c 'from max import engine; print(engine.__version__)'
Download the model and export it to ONNX
Now let's download the ResNet-50 model from HuggingFace. We'll use ResNetForImageClassification, which gives us a HuggingFace object that's a subclass of a PyTorch Module.
However, MAX Engine currently can't compile a Module object, so we need to export the model to either a TorchScript or ONNX file (learn more about supported file formats).
For this project, we'll save the PyTorch model as an ONNX file, which we can do with the torch.onnx.export() function. Because a PyTorch model (a Module) is just Python code (not a static graph that we can save as a file), the export() function must actually run the model with a forward pass. This pass traces the model, builds a static graph representation, and then saves the graph as a file. (Then, we can pass that file to the MAX compiler.)
To trace the model with a forward pass, the export() function needs some input data. The input data doesn't need to be real, so we'll generate random data that matches the model's input shape.
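To make the idea of tracing more concrete, here's a toy sketch in plain Python. It is not how torch.onnx.export() is implemented; the TracedValue class, toy_model(), and trace() are hypothetical names for illustration only. Each operation applied to a traced value appends a node to a shared graph, so running the function once on a dummy value yields a static list of operations, and the dummy value itself doesn't change what gets recorded.

```python
import random


class TracedValue:
    """Hypothetical wrapper that records every operation applied to it."""

    def __init__(self, value, graph, name):
        self.value = value
        self.graph = graph  # shared list of recorded nodes
        self.name = name

    def _record(self, op, other, result):
        # Name the output after its position in the graph, then log the node.
        out_name = f"t{len(self.graph)}"
        other_name = other.name if isinstance(other, TracedValue) else repr(other)
        self.graph.append((out_name, op, self.name, other_name))
        return TracedValue(result, self.graph, out_name)

    def __mul__(self, other):
        other_val = other.value if isinstance(other, TracedValue) else other
        return self._record("mul", other, self.value * other_val)

    def __add__(self, other):
        other_val = other.value if isinstance(other, TracedValue) else other
        return self._record("add", other, self.value + other_val)


def toy_model(x):
    # Stand-in for a model's forward pass: y = x * 2.0 + 1.0
    return x * 2.0 + 1.0


def trace(fn, example_value):
    """Run fn once on a traced input and return the recorded graph."""
    graph = []
    fn(TracedValue(example_value, graph, "input"))
    return graph


# Two different random inputs produce the same recorded graph.
graph_a = trace(toy_model, random.random())
graph_b = trace(toy_model, random.random())
print(graph_a)
```

One consequence of recording operations this way (true of tracing in general) is that any Python branching on input values gets baked into the graph along whichever path the dummy input happened to take.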
Here's how to export the model from HuggingFace to an ONNX file:
Create a file named download-model.py and paste this code:
import torch
from transformers import ResNetForImageClassification
from torch.onnx import export
# The HuggingFace model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"
def main():
    # Load the ResNet model from HuggingFace in evaluation mode
    model = ResNetForImageClassification.from_pretrained(HF_MODEL_NAME)
    model.eval()
    # Create random input for tracing, then export the model to ONNX
    dummy_input = torch.randn(1, 3, 224, 224)
    export(model, dummy_input, MODEL_PATH, opset_version=11,
           input_names=['pixel_values'], output_names=['output'],
           dynamic_axes={'pixel_values': {0: 'batch_size'}, 'output': {0: 'batch_size'}})
    print(f"Model saved as {MODEL_PATH}")
if __name__ == "__main__":
    main()
Because export() simply traces the execution of the model, we also need to specify the input and output tensor names we want applied to the static graph. We use pixel_values as the input name because that's the input name used by the original ResNet model, and it's the name that the HuggingFace AutoImageProcessor uses when it prepares the input (as you'll see below). The dynamic_axes argument marks dimension 0 of both tensors as a variable batch_size, so the exported graph accepts any batch size rather than only the batch of 1 used for tracing.
Now run the file:
python3 download-model.py
You should now have a file named resnet50.onnx.
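To see why the input name and dynamic_axes matter, here's a plain-Python sketch of matching an input dict against a graph's input signature. GRAPH_INPUTS and check_inputs() are hypothetical (not part of ONNX or MAX); they model an input named pixel_values whose first dimension is the symbolic batch_size, mirroring the export() arguments above. A dict key must match the input name exactly, while a symbolic dimension accepts any size.

```python
# Hypothetical signature mirroring the export() call: dimension 0 is the
# dynamic "batch_size" axis, and the remaining dimensions are fixed.
GRAPH_INPUTS = {"pixel_values": ("batch_size", 3, 224, 224)}


def check_inputs(inputs, signature=GRAPH_INPUTS):
    """Return a list of problems found when matching inputs to a signature.

    `inputs` maps input names to shapes (tuples of ints).
    """
    problems = []
    for name in inputs:
        if name not in signature:
            problems.append(f"unexpected input name: {name!r}")
    for name, expected in signature.items():
        if name not in inputs:
            problems.append(f"missing input: {name!r}")
            continue
        actual = inputs[name]
        if len(actual) != len(expected):
            problems.append(f"{name}: expected rank {len(expected)}, got {len(actual)}")
            continue
        for axis, (want, got) in enumerate(zip(expected, actual)):
            # A string marks a dynamic axis, so any size is accepted there.
            if isinstance(want, int) and want != got:
                problems.append(f"{name}: axis {axis} expected {want}, got {got}")
    return problems


print(check_inputs({"pixel_values": (1, 3, 224, 224)}))  # batch of 1
print(check_inputs({"pixel_values": (8, 3, 224, 224)}))  # batch of 8 also fits
print(check_inputs({"input": (1, 3, 224, 224)}))         # wrong input name
```

Since run.py passes the processor output as model.execute(**inputs), Python turns each dict key into a keyword argument name, so the key the processor produces needs to match the input name chosen at export time.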
Run the model with MAX Engine
Now that you have the model, we can execute it using MAX Engine.
Running inference with the MAX Engine Python API takes just three lines of code, so most of the code below prepares the model input and processes the output. We'll use the HuggingFace Transformers APIs to help with that.
Start by creating a file called run.py with the required imports and a main() function:
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
import numpy as np
from max import engine
# The HuggingFace model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"
def main():
    # This is where we'll add our code
    pass
if __name__ == "__main__":
    main()
Prepare the input
First, let's load a test image from HuggingFace Datasets, preprocess it with the image processor, and print the input keys and shape to make sure they're what we expect.
Add this code inside the main() function, replacing the placeholder:
dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
image = dataset["test"]["image"][0]
# optionally, save the image to see it yourself:
# image.save("cat.png")
image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_NAME)
inputs = image_processor(image, return_tensors="np")
print("Keys:", inputs.keys())
print("Shape:", inputs['pixel_values'].shape)
Then run the file:
python3 run.py
You should see output with this information:
Keys: dict_keys(['pixel_values'])
Shape: (1, 3, 224, 224)
Looks good! The name and shape of the test input match what the model expects.
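The (1, 3, 224, 224) shape is batch, channels, height, width. As a rough illustration of where it comes from, here's a NumPy sketch of the final steps an image processor typically performs for ResNet-style models: rescale pixels to [0, 1], normalize each channel with the commonly used ImageNet mean and standard deviation, move channels first, and add a batch dimension. This is an assumption about typical preprocessing, not a reimplementation of AutoImageProcessor (which also resizes and crops the image first); to_pixel_values() is a hypothetical helper, and the random 224x224 image stands in for the cat photo.

```python
import numpy as np

# Commonly used ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_pixel_values(image_hwc):
    """Convert an HxWx3 uint8 image into a 1x3xHxW float32 batch."""
    x = image_hwc.astype(np.float32) / 255.0  # rescale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD    # per-channel normalization, broadcast over H and W
    x = np.transpose(x, (2, 0, 1))            # HWC -> CHW
    return x[np.newaxis, ...]                 # add the batch dimension


# A random stand-in for an already resized and cropped 224x224 RGB image.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

pixel_values = to_pixel_values(fake_image)
print("Shape:", pixel_values.shape)
```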
Run inference
Now we're ready to run inference with MAX Engine.
First, we instantiate an InferenceSession. Then we pass the ONNX file to load(), and pass the input to execute().
Just add this code at the end of the main() function:
session = engine.InferenceSession()
model = session.load(MODEL_PATH)
outputs = model.execute(**inputs)
print("Output shape:", outputs['output'].shape)
Then run it again:
python3 run.py
The printed output shape is (1, 1000), which is the batch size and the number of classes the model can predict (the ResNet model is trained on the ImageNet dataset of 1,000 classes).
Process the output
To wrap this up, let's process the model output to see the model's prediction.
We'll use np.argmax() to find the index of the highest-scoring class, then look up its name in the id2label mapping from the AutoModelForImageClassification model config:
predicted_label = np.argmax(outputs["output"], axis=-1)[0]
hf_model = AutoModelForImageClassification.from_pretrained(HF_MODEL_NAME)
predicted_class = hf_model.config.id2label[predicted_label]
print(f"Prediction: {predicted_class}")
Then run it one more time:
python3 run.py
You should now see Prediction: tiger cat.
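If you want more than the single best label, a common extension is to convert the logits into probabilities with a softmax and list the top few classes. Here's a NumPy sketch, assuming outputs["output"] is a (batch, 1000) array of logits as printed above. The random logits and the one-entry id2label dict are hypothetical stand-ins for the real model output and hf_model.config.id2label. Unlike the tutorial's np.argmax(...)[0], which keeps only the first image, top_k() returns results for every row in the batch.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def top_k(logits, k=5):
    """Return (indices, probabilities) of the k most likely classes per row."""
    probs = softmax(logits)
    # argsort is ascending, so reverse each row and keep the first k columns.
    idx = np.argsort(probs, axis=-1)[:, ::-1][:, :k]
    return idx, np.take_along_axis(probs, idx, axis=-1)


# Stand-in logits for a batch of 2 images and 1,000 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 1000)).astype(np.float32)
logits[0, 3] += 10.0  # make class 3 the clear winner for image 0

# Hypothetical label map; the real one is hf_model.config.id2label.
id2label = {3: "hypothetical label"}

indices, probs = top_k(logits, k=3)
for i, p in zip(indices[0], probs[0]):
    label = id2label.get(int(i), f"class {int(i)}")
    print(f"{label}: {p:.3f}")
```

Subtracting the row maximum before exponentiating doesn't change the softmax result but avoids overflow for large logits.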
That's it!
Next steps
This tutorial covered just the basics for running inference with the MAX Engine Python API. You can use what you learned here to run other ONNX models (exported from PyTorch, TensorFlow, or other frameworks) and compare MAX Engine's performance to the native framework runtimes.
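To make a comparison like that, you need a consistent way to time each runtime. Here's a minimal, framework-agnostic timing sketch using only the standard library. benchmark() is a hypothetical helper; in practice you might pass it something like lambda: model.execute(**inputs) for MAX Engine and the equivalent call for the native runtime. Warm-up runs are excluded because the first calls often include one-time setup costs.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=20):
    """Time fn() and return summary statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # exclude one-time costs such as caching or lazy initialization
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.mean(samples),
        "min_ms": min(samples),
        "iters": iters,
    }


# Stand-in workload so the sketch runs anywhere; replace it with an
# inference call such as: lambda: model.execute(**inputs)
def workload():
    sum(i * i for i in range(10_000))


stats = benchmark(workload)
print({k: round(v, 3) if isinstance(v, float) else v for k, v in stats.items()})
```

The median is usually a more robust summary than the mean here, since occasional slow runs (from background processes, for example) skew the mean.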
But this is just the beginning of what MAX has to offer!
Here are some other tutorials to try next: