
Benchmark a model

To better understand your model’s performance in MAX Engine, you can benchmark it with MLPerf scenarios using max benchmark. This tool runs inference with your model and prints statistics such as the average QPS and min/max latency. It uses input data that the tool generates or that you provide in a NumPy file.

We use MLPerf as the basis for benchmarking because it provides a set of standardized benchmarks that are designed to represent real-world ML workloads. For more information, read about the MLPerf Inference: Datacenter benchmark suite.

note

PyTorch models must be in TorchScript format.

Run a benchmark

To benchmark a model, pass the model path to the max benchmark command. This compiles the model, runs inference several times, and then prints the benchmark results.

To generate inputs, however, the benchmark tool needs to know the model's input shapes. ONNX models include this information, but TorchScript models do not. To benchmark a TorchScript model, you need to specify the input shapes with an input spec file.

For example, here's how you can benchmark a TorchScript model from our GitHub repo:

  1. Download the PyTorch model and convert it to TorchScript:

    cd max/examples/tools/common/resnet50-pytorch/
    bash download-model.sh --output resnet50.torchscript

  2. Then call the benchmark command and pass the local input-spec.yaml file:

    max benchmark resnet50.torchscript --input-data-schema input-spec.yaml

We'll explain this file more in the next section.

Alternatively, you can provide your own input data in a NumPy file, which means you don't need to specify the input spec.

Generate random input data

If your model does not include input shape metadata, the max benchmark command reports that it cannot determine the input shapes. In that case, you need to create a YAML file that specifies the shape for each input.

Even when your model does include shape metadata, you must specify static shapes in this YAML file if any dimension other than the first (the batch size) is dynamic. When max reads the shape metadata, a dynamic first dimension defaults to a batch size of 1, but a dynamic size on any other dimension produces an error and requires that you define a static shape.
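The defaulting rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the tool's implementation: resolve_metadata_shape is a hypothetical helper, and None is an assumed stand-in for a dynamic dimension.

```python
from typing import Optional, Sequence


def resolve_metadata_shape(dims: Sequence[Optional[int]]) -> list[int]:
    """Apply the documented rule to a shape read from model metadata.

    `None` marks a dynamic dimension (an assumption made for this sketch).
    """
    resolved = []
    for index, dim in enumerate(dims):
        if dim is not None:
            resolved.append(dim)
        elif index == 0:
            # A dynamic first (batch) dimension defaults to 1.
            resolved.append(1)
        else:
            # Any other dynamic dimension needs a static size from the YAML file.
            raise ValueError(
                f"dimension {index} is dynamic; specify a static shape in the schema"
            )
    return resolved


print(resolve_metadata_shape([None, 3, 224, 224]))  # [1, 3, 224, 224]
```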

For example, you might define a model's inputs like this:

model_input_schema.yaml

inputs:
  - input_name: 1
    shape: 1x3x224x224xf32
  - input_name: 2
    shape: 1x42x204xui32
    data: # Optional data specification
      random:
        uniform:
          min: 0
          max: 6833

Then you can pass the file with the --input-data-schema option:

max benchmark my_model --input-data-schema model_input_schema.yaml

The example schema above defines two inputs, specified by name (the input_name element) and input shape/type (the shape element defines both the tensor shape and data type).

Let’s look closer at the first input:

  • input_name: 1 declares the input name is 1.

  • shape: 1x3x224x224xf32 declares the input shape is 4-dimensional. However, this model actually accepts a batch of 3-dimensional inputs (3x224x224). So, although the model accepts a dynamic batch size as the first dimension, we must specify a fixed batch size in this schema. We used 1 as the batch size, but you can select whatever size you want. Finally, the data type is declared as f32 (32-bit float)—see the supported data types.

note

If your model supports dynamic shapes on other dimensions, that’s okay, but you must specify a static size for each dimension with this YAML file.
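To make the shape element's format concrete, here is a minimal Python sketch that splits such a string into its dimensions and data type suffix. parse_shape and KNOWN_DTYPES are hypothetical names introduced for illustration, the table lists only the two suffixes that appear in this guide, and treating ? as a dynamic dimension is inferred from the example in the Known issues section.

```python
from typing import Optional

# Only the two type suffixes that appear in this guide; the tool supports more.
KNOWN_DTYPES = {"f32": "32-bit float", "ui32": "32-bit unsigned integer"}


def parse_shape(spec: str) -> tuple[list[Optional[int]], str]:
    """Split a spec such as '1x3x224x224xf32' into (dimensions, dtype)."""
    # Everything before the last 'x' is a dimension; the last token is the type.
    *dim_tokens, dtype = spec.split("x")
    if dtype not in KNOWN_DTYPES:
        raise ValueError(f"unrecognized data type suffix: {dtype!r}")
    # '?' is treated as a dynamic dimension and returned as None.
    dims = [None if token == "?" else int(token) for token in dim_tokens]
    return dims, dtype


dims, dtype = parse_shape("1x3x224x224xf32")
print(dims, dtype)  # [1, 3, 224, 224] f32
```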

The second input (input_name: 2) also defines the data element, which is optional. By default, the benchmark tool generates random data for all inputs using some default strategies, but you can override these strategies with the data element (this element also allows you to specify your own input data). For more information about how to modify the randomization strategies, see the input data schema reference.

If your model includes metadata about the input shapes, any shape information in the YAML file overrides it. For example, the YAML file might be necessary if your model accepts dynamic tensor sizes, because the benchmark tool does not support dynamic shapes (except on the first dimension, which is replaced with 1 unless you specify otherwise in the YAML file).

Also beware that, when you specify an input data schema, you must specify all input tensors that you want to use in the benchmark. Even if you want to override the shape for just one tensor, you must specify the name for all other inputs—any unspecified inputs will not receive generated input. However, it is sufficient to merely name the input, and the benchmark tool will generate the appropriate input data based on the model’s shape metadata. For example, with a YAML file like the following, the benchmark tool generates inputs for (only) the image and text inputs using shape information provided by the model:

inputs:
  - input_name: image
  - input_name: text

note

TorchScript models do not include any metadata, so you must always specify input information with the shape element.
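Because every input must be named even when only one shape is overridden, it can help to generate the schema from a single mapping. The sketch below assumes a model that carries shape metadata. build_schema is a hypothetical helper, and the indentation it emits follows ordinary YAML nesting, which is an assumption about the layout the tool expects.

```python
from typing import Mapping, Optional


def build_schema(inputs: Mapping[str, Optional[str]]) -> str:
    """Return schema text that names every input; shapes are optional."""
    lines = ["inputs:"]
    for name, shape in inputs.items():
        lines.append(f"  - input_name: {name}")
        if shape is not None:
            # Only inputs with an explicit override get a shape element.
            lines.append(f"    shape: {shape}")
    return "\n".join(lines) + "\n"


# Override the shape of "image" only; "text" is still listed so it gets input.
schema = build_schema({"image": "1x3x224x224xf32", "text": None})
print(schema)
```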

Provide your own input data

Instead of using generated inputs, you can provide your own inputs in a NumPy file (.npy) and specify that in your input data schema.

You can build each input with NumPy and save it to a file with np.save(). Then, set the path to the NumPy file in the input data schema:

model_input_schema.yaml

inputs:
  - input_name: input
    data:
      file:
        path: path/to/my/numpy_data.npy

Now just pass the YAML file to the benchmark command:

max benchmark my_model --input-data-schema model_input_schema.yaml

note

You must provide a separate .npy file for each input.
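The np.save() step mentioned above might look like the following. The tensor shape, the random contents, and the numpy_data.npy file name are placeholders chosen for illustration; substitute data that matches your model's input.

```python
import numpy as np

# Hypothetical input: one 1x3x224x224 float32 tensor of random values.
rng = np.random.default_rng(seed=0)
image = rng.random((1, 3, 224, 224), dtype=np.float32)

# One .npy file per model input; this is the file the schema's path points to.
np.save("numpy_data.npy", image)

# Reload to confirm the shape and dtype that were written.
loaded = np.load("numpy_data.npy")
print(loaded.shape, loaded.dtype)  # (1, 3, 224, 224) float32
```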

Specify the benchmark scenario

By default, the benchmark tool runs the MLPerf “single stream” scenario, which sends individual inference requests consecutively. You can modify this with the --mlperf-scenario option.

For details about the available options, see the benchmark command reference.
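To illustrate what "consecutively" means for the single stream scenario, the Python sketch below sends one query at a time and summarizes the results. It is a simplified stand-in, not the MLPerf load generator or the max implementation: single_stream is a hypothetical function, and computing QPS as queries divided by total latency is a simplifying assumption.

```python
import time
from typing import Callable


def single_stream(infer: Callable[[], object], num_queries: int,
                  clock: Callable[[], float] = time.perf_counter) -> dict:
    """Issue queries one at a time and summarize latency and throughput."""
    latencies = []
    for _ in range(num_queries):
        start = clock()
        infer()  # the next query is not sent until this one returns
        latencies.append(clock() - start)
    total = sum(latencies)
    return {
        "qps": num_queries / total if total > 0 else float("inf"),
        "min_latency": min(latencies),
        "max_latency": max(latencies),
    }


# Stand-in workload; a real run would execute the compiled model here.
stats = single_stream(lambda: sum(range(10_000)), num_queries=20)
print(stats)
```

The clock parameter exists only so the timing logic can be checked with a deterministic clock.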

Known issues

Out of range crash

If your model includes an input that has a limited range, such as an LLM that uses a token-lookup ID as input, the benchmark might crash because a randomly generated value is out of range for the token lookup.

To resolve this, specify a max value for the uniform random distribution in the data element. For example:

inputs:
  - input_name: input
    shape: ?x42x204xui32
    data:
      random:
        uniform:
          max: 6833
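The failure mode can be reproduced outside the benchmark tool with plain NumPy. In this sketch, the table size of 6834 rows and the embedding width of 8 are assumptions made for illustration, and max is treated as an inclusive upper bound, which the source does not confirm.

```python
import numpy as np

VOCAB_SIZE = 6834  # assumed table size, so valid IDs are 0..6833
embedding_table = np.zeros((VOCAB_SIZE, 8), dtype=np.float32)
rng = np.random.default_rng(seed=0)

# Unbounded IDs: drawn from nearly the full ui32 range, so most are invalid.
unbounded_ids = rng.integers(0, 2**32 - 1, size=(1, 42, 204), dtype=np.uint32)
try:
    embedding_table[unbounded_ids]
except IndexError as err:
    print("lookup failed:", err)

# Bounded IDs: capped at 6833 (inclusive here), so every lookup succeeds.
bounded_ids = rng.integers(0, 6833, size=(1, 42, 204), dtype=np.uint32,
                           endpoint=True)
print(embedding_table[bounded_ids].shape)  # (1, 42, 204, 8)
```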
