MAX Engine accelerates the inference speed for your AI models as they are, without any changes. Our next-generation graph compiler and runtime can run your models on a wide range of hardware for immediate performance gains, using a simple Python API.
In this tutorial, we'll walk through the process step by step, starting from an empty project:
- Install MAX and create a virtual environment.
- Convert a PyTorch model from HuggingFace into ONNX format (the ResNet-50 image classification model).
- Run inference with MAX Engine.
Most of the code in this project is needed just to download the model, prepare the inputs, and process the outputs. Running inference with MAX Engine takes just three lines of code (not counting the import):
from max import engine
# After you prepare the inputs, load and run the model:
session = engine.InferenceSession()
model = session.load(model_path)
outputs = model.execute(**inputs)
# Then process the outputs.
Let's get started!
If you experience any issues in this tutorial, please let us know on GitHub.
Create a virtual environment
Using a virtual environment ensures that you have the Python version and packages that are compatible with this project. We'll use the Magic CLI to create the environment and install the required packages.
If you don't have Magic, you can install it on macOS and Ubuntu Linux with this command:
curl -ssL https://magic.modular.com | bash
Then run the source command that's printed in your terminal.
- Create a new Python project and install the dependencies:
magic init max-onnx-resnet && cd max-onnx-resnet
- Add MAX and NumPy from conda:
magic add max "numpy<2.0"
- Add the other Python packages from PyPI:
magic add --pypi "datasets==2.18" "onnx==1.16.0" \
"pillow==10.3.0" "torch==2.2.2" "transformers==4.40.1"
- Now you can start a shell in the environment and see your MAX version:
magic shell
python3 -c 'from max import engine; print(engine.__version__)'
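If you want to confirm that the pinned PyPI packages resolved to the versions listed above, here is a minimal sketch that uses only the standard library. The check_pins() helper and the EXPECTED table are illustrative names (not part of MAX or Magic), and the check is a simple prefix match on the version string rather than full version parsing.

```python
from importlib import metadata

# Version prefixes copied from the `magic add --pypi` command above.
EXPECTED = {
    "datasets": "2.18",
    "onnx": "1.16.0",
    "pillow": "10.3.0",
    "torch": "2.2.2",
    "transformers": "4.40.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected, lookup=installed_version):
    """Return {package: found_version} for every package that misses its pin."""
    mismatches = {}
    for package, prefix in expected.items():
        found = lookup(package)
        # Prefix match only: "2.18" accepts "2.18.0" (hypothetical simplification).
        if found is None or not found.startswith(prefix):
            mismatches[package] = found
    return mismatches


if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    if problems:
        print("Packages that don't match the tutorial's pins:", problems)
    else:
        print("All pinned packages match.")
```

Run it inside magic shell so that it inspects the project environment rather than your system Python.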
Download the ONNX model
Now let's download the ResNet-50 model from HuggingFace. We'll use ResNetForImageClassification, which gives us a HuggingFace object that's a subclass of a PyTorch Module.
However, MAX Engine currently can't compile a Module object. So we need to export the model into either a TorchScript or ONNX file (learn more about supported file formats).
For this project, we'll save the PyTorch model as an ONNX file, which we can do with the torch.onnx.export() function. Because a PyTorch model (a Module) is just Python code (not a static graph that we can save as a file), the export() function must actually run the model with a forward pass. This pass traces the model, builds a static graph representation, and then saves the graph as a file. (Then, we can pass that file to the MAX compiler.)
To trace the model with a forward pass, the export() function needs some input data. The input data doesn't need to be real, so we'll generate random data that matches the model's input shape.
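To make the idea of tracing more concrete, here is a toy sketch in plain Python. It is not how torch.onnx.export() is implemented; the Traced class and the trace() helper are invented for illustration. It only shows why a forward pass with some placeholder input is enough to turn ordinary Python code into a static list of operations.

```python
class Traced:
    """A placeholder value that records every operation applied to it."""

    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _record(self, op, other):
        other_name = other.name if isinstance(other, Traced) else repr(other)
        out = Traced(f"t{len(self.graph)}", self.graph)
        # Each graph entry is (output name, operation, left input, right input).
        self.graph.append((out.name, op, self.name, other_name))
        return out

    def __mul__(self, other):
        return self._record("mul", other)

    def __add__(self, other):
        return self._record("add", other)


def model(x):
    # Ordinary Python code: no graph exists until this function runs.
    return x * 2 + 1


def trace(fn, input_name):
    """Run fn once on a placeholder input and return the recorded graph."""
    graph = []
    output = fn(Traced(input_name, graph))
    return graph, output.name


graph, output_name = trace(model, "pixel_values")
for node in graph:
    print(node)
print("graph output:", output_name)
```

The placeholder carries no real data, which mirrors why random input is good enough for the export: only the sequence of operations is recorded, not the values.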
Here's how to export the model from HuggingFace to an ONNX file:
- Create a file named download-model.py and paste this code:
import torch
from transformers import ResNetForImageClassification
from torch.onnx import export

# The HuggingFace model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    # Load the ResNet model from HuggingFace in evaluation mode
    model = ResNetForImageClassification.from_pretrained(HF_MODEL_NAME)
    model.eval()

    # Create random input for tracing, then export the model to ONNX
    dummy_input = torch.randn(1, 3, 224, 224)
    export(model, dummy_input, MODEL_PATH, opset_version=11,
           input_names=['pixel_values'], output_names=['output'],
           dynamic_axes={'pixel_values': {0: 'batch_size'}, 'output': {0: 'batch_size'}})

    print(f"Model saved as {MODEL_PATH}")

if __name__ == "__main__":
    main()
Because export() simply traces the execution of the model, we also need to specify the input and output tensor names we want applied to the static graph. We use pixel_values as the input name because that's the input name used by the original ResNet model, and it's the name that the HuggingFace AutoImageProcessor uses when it prepares the input (as you'll see below).
- Now run the file:
python3 download-model.py
You should now have a file named resnet50.onnx.
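Before moving on, you may want to confirm that the exported file is valid and that the input name and dynamic batch axis came out as intended. The following is a sketch that uses the onnx package installed earlier. The describe_inputs() and dim_to_str() helpers are illustrative names, and the script skips the check if resnet50.onnx isn't in the current directory.

```python
import os

MODEL_PATH = "resnet50.onnx"


def dim_to_str(dim):
    """An ONNX dimension is either symbolic (dim_param) or fixed (dim_value)."""
    return dim.dim_param if dim.dim_param else str(dim.dim_value)


def describe_inputs(model_path):
    """Validate the ONNX file and return {input name: [dimensions]}."""
    import onnx  # imported here so the helper above has no dependencies

    model = onnx.load(model_path)
    onnx.checker.check_model(model)  # raises if the graph is malformed
    return {
        graph_input.name: [
            dim_to_str(d) for d in graph_input.type.tensor_type.shape.dim
        ]
        for graph_input in model.graph.input
    }


if os.path.exists(MODEL_PATH):
    print(describe_inputs(MODEL_PATH))
else:
    print(f"{MODEL_PATH} not found; run download-model.py first.")
```

If the export worked as described above, the pixel_values entry should list a symbolic first dimension named batch_size followed by the fixed sizes 3, 224, and 224.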
Run the model with MAX Engine
Now that you have the model, we can execute it using MAX Engine.
Running inference with the MAX Engine Python API takes just three lines of code, so most of the code in this section prepares the model input and processes the output. We'll use the HuggingFace Transformers APIs to help with that.
Start by creating a file called run.py with the required imports and a main() function:
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
import numpy as np
from max import engine

# The HuggingFace model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    # This is where we'll add our code

if __name__ == "__main__":
    main()
Prepare the input
First, let's load a test image from HuggingFace Datasets, and print the input keys and input shape to be sure they are what we expect.
Add this code inside the main() function:
dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
image = dataset["test"]["image"][0]
# optionally, save the image to see it yourself:
# image.save("cat.png")
image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_NAME)
inputs = image_processor(image, return_tensors="np")
print("Keys:", inputs.keys())
print("Shape:", inputs['pixel_values'].shape)
Then run the file:
python3 run.py
You should see output with this information:
Keys: dict_keys(['pixel_values'])
Shape: (1, 3, 224, 224)
Looks good! The name and shape of the test input match what the model expects.
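Instead of checking the printed keys and shape by eye, you could assert them in code. Here is a minimal sketch; validate_inputs() and shape_matches() are hypothetical helpers (not part of MAX or Transformers), and a None entry in the expected shape stands for a dynamic axis such as the batch size. A stand-in object replaces the real processor output so the sketch runs without downloading anything.

```python
from types import SimpleNamespace


def shape_matches(actual, expected):
    """Compare shapes; None in `expected` matches any size (a dynamic axis)."""
    if len(actual) != len(expected):
        return False
    return all(e is None or a == e for a, e in zip(actual, expected))


def validate_inputs(inputs, expected):
    """Raise ValueError if an expected input is missing or has the wrong shape."""
    for name, shape in expected.items():
        if name not in inputs:
            raise ValueError(f"missing input {name!r}")
        actual = tuple(inputs[name].shape)
        if not shape_matches(actual, shape):
            raise ValueError(f"{name!r} has shape {actual}, expected {shape}")


# Stand-in for the image processor output: anything with a .shape works.
fake_inputs = {"pixel_values": SimpleNamespace(shape=(1, 3, 224, 224))}
validate_inputs(fake_inputs, {"pixel_values": (None, 3, 224, 224)})
print("inputs look valid")
```

In run.py, you would pass the real inputs object from the image processor in place of fake_inputs.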
Run inference
Now we're ready to run inference with MAX Engine.
First, we instantiate an InferenceSession. Then we pass the ONNX file to load(), and pass the input to execute().
Just add this code at the end of the main() function:
session = engine.InferenceSession()
model = session.load(MODEL_PATH)
outputs = model.execute(**inputs)
print("Output shape:", outputs['output'].shape)
Then run it again:
python3 run.py
The first time you load a model, it might take a few minutes to compile, but this up-front cost will pay dividends in latency savings provided by our next-generation graph compiler. That is, load() is a slow one-time operation, but execute() is very fast. (Calling load() again will also use a cached version of the compiled model.)
The printed output shape is (1, 1000), which is the batch size and the number of result classifications (the ResNet model is trained with the ImageNet dataset of 1,000 classes).
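To see the difference between the one-time load() cost and the per-call execute() cost on your own machine, you could time both. This is a rough sketch, not a rigorous benchmark: timed() and time_load_and_execute() are illustrative helpers, and the only MAX calls used are the three shown in this tutorial.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_load_and_execute(model_path, inputs, runs=10):
    """Time a single load() and the average of several execute() calls."""
    from max import engine  # imported lazily so timed() works on its own

    session = engine.InferenceSession()
    model, load_seconds = timed(session.load, model_path)

    total = 0.0
    for _ in range(runs):
        _, elapsed = timed(model.execute, **inputs)
        total += elapsed
    return load_seconds, total / runs


# timed() works with any callable, for example:
result, elapsed = timed(sum, [1, 2, 3])
print(f"sum returned {result} in {elapsed:.6f}s")
```

To use it in run.py, call time_load_and_execute(MODEL_PATH, inputs) after the inputs are prepared and print the two values it returns. Keep in mind that a second run of the script may report a much shorter load time because of the compilation cache mentioned above.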
Process the output
To wrap this up, let's process the model output to see the model's prediction.
We'll use the AutoModelForImageClassification API to get the predicted class name:
predicted_label = np.argmax(outputs["output"], axis=-1)[0]
hf_model = AutoModelForImageClassification.from_pretrained(HF_MODEL_NAME)
predicted_class = hf_model.config.id2label[predicted_label]
print(f"Prediction: {predicted_class}")
Then run it one more time:
python3 run.py
You should now see Prediction: tiger cat.
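The code above reports only the single most likely class. If you'd like to see the runners-up with approximate probabilities, here is a minimal sketch in plain Python. It assumes the values in outputs["output"][0] are raw logits (unnormalized scores), which is an assumption about the exported graph rather than something stated above; softmax() and top_k() are illustrative helpers.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return the k most likely (class index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Tiny made-up logits so the sketch runs without the model.
for index, prob in top_k([1.0, 3.0, 2.0], k=2):
    print(index, round(prob, 3))
```

In run.py, you could call top_k(outputs["output"][0].tolist()) and look up each index in hf_model.config.id2label to print the class names.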
That's it!
Here's the finished run.py:
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
import numpy as np
from max import engine

# The HuggingFace model name and exported file name
HF_MODEL_NAME = "microsoft/resnet-50"
MODEL_PATH = "resnet50.onnx"

def main():
    dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
    image = dataset["test"]["image"][0]
    # optionally, save the image to see it yourself:
    # image.save("cat.png")

    image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_NAME)
    inputs = image_processor(image, return_tensors="np")

    print("Keys:", inputs.keys())
    print("Shape:", inputs['pixel_values'].shape)

    session = engine.InferenceSession()
    model = session.load(MODEL_PATH)
    outputs = model.execute(**inputs)

    print("Output shape:", outputs['output'].shape)

    predicted_label = np.argmax(outputs["output"], axis=-1)[0]
    hf_model = AutoModelForImageClassification.from_pretrained(HF_MODEL_NAME)
    predicted_class = hf_model.config.id2label[predicted_label]
    print(f"Prediction: {predicted_class}")

if __name__ == "__main__":
    main()
Next steps
This tutorial covered just the basics of running inference with the MAX Engine Python API. You can use what you learned here to run other ONNX models (exported from PyTorch, TensorFlow, or other frameworks) and compare MAX Engine's performance to the native framework runtimes.
But this is just the beginning of what MAX has to offer!
Here are some other tutorials to try next:
Deploy a model with Kubernetes and Helm
Learn how to deploy your model with MAX Engine using AWS and Kubernetes.
Deploy a model with AWS CloudFormation
Learn how to deploy a model using MAX Engine and AWS CloudFormation.