Benchmark a model
To better understand your model's performance in MAX Engine, you can benchmark it with MLPerf scenarios using `max benchmark`. This tool runs inference with your model and prints statistics such as the average QPS and min/max latency. It uses input data that the tool generates or that you provide in a NumPy file.
We use MLPerf as the basis for benchmarking because it provides a set of standardized benchmarks that are designed to represent real-world ML workloads. For more information, read about the MLPerf Inference: Datacenter benchmark suite.
PyTorch models must be in TorchScript format.
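For reference, here is a minimal sketch of converting a PyTorch model to TorchScript with tracing. The model, input shape, and output filename are illustrative, not from this page:

```python
import torch

# Illustrative sketch: convert a small PyTorch model to
# TorchScript so it can be passed to `max benchmark`.
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

model = TinyModel().eval()
example_input = torch.rand(1, 3, 224, 224)

# Trace the model with an example input and save it as TorchScript.
traced = torch.jit.trace(model, example_input)
traced.save("tiny_model.torchscript")
```

Tracing records the operations executed for the example input, so models with data-dependent control flow may need `torch.jit.script` instead.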
Run a benchmark
To benchmark a model, pass the model path to the `max benchmark` command. This compiles the model, runs inference several times, and then prints the benchmark results.
However, in order to generate inputs, the benchmark tool needs to know the model's input shapes—this information is included in ONNX models but not in TorchScript models. To benchmark a TorchScript model, you need to specify the input shapes with an input spec file.
For example, here's how you can benchmark a TorchScript model from our GitHub repo:

1. Download the PyTorch model and convert it to TorchScript:

   ```sh
   cd max/examples/tools/common/resnet50-pytorch/
   bash download-model.sh --output resnet50.torchscript
   ```

2. Then call the benchmark command and pass the local `input-spec.yaml` file:

   ```sh
   max benchmark resnet50.torchscript --input-data-schema input-spec.yaml
   ```

   We'll explain this file more in the next section.
Alternatively, you can provide your own input data in a NumPy file, which means you don't need to specify the input spec.
Generate random input data
If your model does not include input shape metadata, the `max benchmark` command reports that it cannot determine the input shapes. In that case, you need to create a YAML file that specifies the shape for each input.
Also, even when your model does include shape metadata, if any shapes include a dynamic size on a dimension other than the first dimension (the batch size), you must specify static shapes with this YAML file. That is, when `max` reads the shape metadata and the first dimension is dynamic, it assumes a batch size of 1, but a dynamic size on any other dimension produces an error and requires that you define a static shape.
For example, you might define a model's inputs like this:
```yaml
inputs:
  - input_name: 1
    shape: 1x3x224x224xf32
  - input_name: 2
    shape: 1x42x204xui32
    data: # Optional data specification
      random:
        uniform:
          min: 0
          max: 6833
```
Then you can pass the file with the `--input-data-schema` option:

```sh
max benchmark my_model --input-data-schema model_input_schema.yaml
```
The example schema above defines two inputs, specified by name (the `input_name` element) and input shape/type (the `shape` element defines both the tensor shape and data type).
Let's look closer at the first input:

- `input_name: 1` declares that the input name is `1`.
- `shape: 1x3x224x224xf32` declares that the input shape is 4-dimensional. However, this model actually accepts a batch of 3-dimensional inputs (3x224x224). So, although the model accepts a dynamic batch size as the first dimension, we must specify a fixed batch size in this schema. We used `1` as the batch size, but you can select whatever size you want. Finally, the data type is declared as `f32` (32-bit float); see the supported data types.
If your model supports dynamic shapes on other dimensions, that’s okay, but you must specify a static size for each dimension with this YAML file.
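To make the shape notation concrete, here is an illustrative helper (not part of the MAX tool) that parses a schema shape string into a NumPy shape and dtype. The dtype suffixes in the mapping are assumptions based only on the examples shown on this page:

```python
import numpy as np

# Illustrative mapping of schema dtype suffixes to NumPy dtypes;
# the suffixes are assumed from the examples on this page.
DTYPES = {"f32": np.float32, "ui32": np.uint32}

def parse_shape(spec):
    """Parse a static shape string like '1x3x224x224xf32'
    into (shape tuple, NumPy dtype)."""
    *dims, suffix = spec.split("x")
    return tuple(int(d) for d in dims), DTYPES[suffix]

shape, dtype = parse_shape("1x3x224x224xf32")
# shape == (1, 3, 224, 224); dtype is np.float32
```

Note that this sketch only handles fully static shapes; a dynamic dimension marker would need a static size substituted first, as described above.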
The second input (`input_name: 2`) also defines the `data` element, which is optional. By default, the benchmark tool generates random data for all inputs using some default strategies, but you can override these strategies with the `data` element (this element also allows you to specify your own input data). For more information about how to modify the randomization strategies, see the input data schema reference.
If your model includes metadata about the input shapes, then any shape information in the YAML file overrides that metadata. For example, the YAML file might be necessary if your model accepts dynamic tensor sizes, because the benchmark tool does not support dynamic shapes (except on the first dimension, which is replaced with a `1`, unless you specify otherwise with the YAML file).
Also beware that, when you specify an input data schema, you must specify all input tensors that you want to use in the benchmark. Even if you want to override the shape for just one tensor, you must name all other inputs too, because any unspecified inputs will not receive generated input. However, it is sufficient to merely name an input, and the benchmark tool will generate the appropriate input data based on the model's shape metadata. For example, with a YAML file like the following, the benchmark tool generates inputs for (only) the `image` and `text` inputs, using shape information provided by the model:
```yaml
inputs:
  - input_name: image
  - input_name: text
```
TorchScript models do not include any such metadata, so you must always specify input information with the `shape` element.
Provide your own input data
Instead of using generated inputs, you can provide your own inputs in a NumPy file (`.npy`) and specify that in your input data schema. You can build each input with NumPy and save it to a file with `np.save()`.
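For example, a minimal sketch of building and saving an input with NumPy (the 1x3x224x224 float32 shape and the filename are assumptions for illustration):

```python
import numpy as np

# Build an input tensor matching the model's expected shape and
# dtype (the shape here is an illustrative assumption), then save
# it as a .npy file that the input data schema can reference.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
np.save("numpy_data.npy", input_data)
```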
Then, set the path to the NumPy file in the input data schema:
```yaml
inputs:
  - input_name: input
    data:
      file:
        path: path/to/my/numpy_data.npy
```
Now just pass the YAML file to the `benchmark` command:

```sh
max benchmark my_model --input-data-schema model_input_schema.yaml
```
You must provide a separate `.npy` file for each input.
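For instance, a schema that maps two inputs to their own files might look like this (the input names and file paths are illustrative):

```yaml
inputs:
  - input_name: image
    data:
      file:
        path: image_input.npy
  - input_name: text
    data:
      file:
        path: text_input.npy
```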
Specify the benchmark scenario
By default, the benchmark tool runs the MLPerf "single stream" scenario, which sends individual inference requests consecutively. You can modify this with the `--mlperf-scenario` option.
For details about the available options, see the benchmark command reference.
Known issues
Out of range crash
If your model includes an input that has a limited range, such as an LLM that uses a token-lookup ID as input, the benchmark might crash because a randomly generated value is out of range for the token lookup.
To resolve this, you simply need to specify a `max` value for the `uniform` random-data strategy. For example:
```yaml
inputs:
  - input_name: input
    shape: ?x42x204xui32
    data:
      random:
        uniform:
          max: 6833
```
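Conceptually, capping the uniform range keeps every generated token ID inside the model's vocabulary. A NumPy sketch of the equivalent generation (illustrative, not the tool's actual implementation; batch size 1 stands in for the dynamic `?` dimension):

```python
import numpy as np

# Draw token IDs uniformly in [0, 6833], matching the schema's
# uniform max value, with shape 1x42x204 and dtype uint32.
rng = np.random.default_rng()
token_ids = rng.integers(0, 6833, size=(1, 42, 204),
                         endpoint=True).astype(np.uint32)
```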