
Create a custom op for any model

MAX is designed to be fully extensible, so you can get the behavior you want from your model without compromising performance. On this page, we'll show you how to implement a new op for MAX Engine.

When you write a custom op in Mojo, the MAX Engine compiler treats it the same as all the other ops ("kernels") built into MAX Engine: it analyzes and optimizes your op to achieve the best performance.

Limitations

Currently, custom ops are compatible only with ONNX and MAX Graph models. Support for TorchScript models is coming soon. (If you're using MAX Graph, see Create a custom op in MAX Graph instead.)

Also, the example code below currently fails on Ubuntu 20.04.

Overview

When MAX compiles a TorchScript or ONNX model, it translates each op in the graph into "MO" dialect operations. The MO dialect is an intermediate representation used by our MLIR-based graph compiler ("MO" stands for "Modular"). We've defined many graph operations in MO that map to ops in other frameworks, such as add, matmul, and softmax. But there are a lot of ops out there (thousands in PyTorch alone), most of which are rarely used and not currently implemented in MAX.

If you're using a TorchScript model and the MAX compiler encounters one of these unimplemented ops, it falls back to the op implementation from PyTorch. This is a good thing, because it means we can provide compatibility with nearly all PyTorch models without any need for custom ops. (Eventually you'll also be able to override PyTorch fallback ops with your own implementation, but that isn't supported yet.)

On the other hand, if you're using an ONNX model that includes an op we haven't implemented yet, compilation fails because there is no ONNX implementation to fall back to. Fortunately, you can fix this yourself by implementing the op in Mojo, and that's what you'll learn to do on this page.

Setup

In the following sections, we'll implement the Det op, which is not currently supported in MAX Engine: any ONNX model that uses this op fails to compile.

If you want to follow along with the example code below, first make sure you've installed the latest version of MAX. Then navigate to the examples/extensibility directory of the max repo and install the required dependencies:

python3 -m pip install -r requirements.txt

Now, build the ONNX model with onnx-model.py:

python3 onnx-model.py
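
For reference, here's a minimal sketch of how an ONNX model containing a Det node can be built with the onnx Python API. This is not the repo's actual onnx-model.py (that script builds a larger graph with inputs X, A, and B); the names, shapes, and opset choice below are just illustrative:

# Hypothetical sketch: build a tiny ONNX model whose graph uses the Det op.
import onnx
from onnx import TensorProto, helper

# A batch of three 3x3 matrices in; three determinants out.
x = helper.make_tensor_value_info("X", TensorProto.FLOAT, [3, 3, 3])
z = helper.make_tensor_value_info("Z", TensorProto.FLOAT, [3])

det = helper.make_node("Det", inputs=["X"], outputs=["Z"])
graph = helper.make_graph([det], "det_graph", inputs=[x], outputs=[z])

# Det was introduced in ONNX opset 11, which matches the "det_v11" in the
# compiler error shown below.
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 11)])
onnx.checker.check_model(model)
onnx.save(model, "onnx_det.onnx")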

As is, this model fails to compile, which you can verify with the benchmark tool:

max benchmark onnx_det.onnx
loc("onnx_det.onnx":0:0): error: failed to legalize operation 'monnx.det_v11' that was explicitly marked illegal

So let’s make this model work by implementing the missing Det op.

Implement a custom op

To create a custom op, write a Mojo function that operates on Tensor values: it must take a Tensor argument for each op input and return a Tensor as the op output.

You must also register your function as an op by adding the register.op() decorator with the name of your op.

For example, here’s an implementation of the Det op written in Mojo (notice the name we give to register.op() is the op name from the compiler error above):

det.mojo
from python import Python
from .python_utils import tensor_to_numpy, numpy_to_tensor
from max import register
from max.extensibility import Tensor, empty_tensor


@register.op("monnx.det_v11")
fn det[type: DType, rank: Int](x: Tensor[type, rank]) -> Tensor[type, rank - 2]:
    try:
        print("Hello, custom DET!")
        var np = Python.import_module("numpy")
        var np_array = tensor_to_numpy(x, np)
        var np_out = np.linalg.det(np_array)
        return numpy_to_tensor[type, rank - 2](np_out)
    except e:
        print(e)
        return empty_tensor[type, rank - 2](0)
note

Although we could have written the determinant ("det") function entirely in Mojo, we don't have to, because Mojo's Python interop lets us use the NumPy implementation instead. The helper functions in the python_utils module convert Tensor values to and from NumPy arrays.
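
The reason the op's output type is Tensor[type, rank - 2] is that the determinant collapses each trailing square matrix to a scalar. You can see the same shape behavior in plain NumPy (a standalone illustration, independent of MAX):

import numpy as np

x = np.random.rand(3, 4, 4).astype(np.float32)  # a batch of three 4x4 matrices
dets = np.linalg.det(x)                          # one determinant per matrix
print(dets.shape)                                # (3,)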

Package the custom op

To package the custom op, create a directory that includes the above Mojo code, plus an empty __init__.mojo file. Then, pass that directory name to the mojo package command.

For example, here's how it looks with our directory named custom_ops:

custom_ops
├── __init__.mojo
├── det.mojo
└── python_utils.mojo

Then, you can package it with this command:

mojo package custom_ops

This creates a file called custom_ops.mojopkg in the current directory.

That's pretty much it. Next, you simply load your model with this Mojo package.

Benchmark with your custom op

You can verify that the custom op works by passing it along with your model to the max benchmark command (and because we're using Python interop, we need to set the Python library for use in Mojo):

export MOJO_PYTHON_LIBRARY=$(modular config mojo-max.python_lib)
max benchmark onnx_det.onnx --custom-ops-path=custom_ops.mojopkg

Execute the model with your custom op

Finally, here’s how to load the model into MAX Engine with the custom op and run an inference with our Python API. All you need to do is add the custom_ops_path argument when you load the model:

onnx-inference.py
from max import engine
import numpy as np

session = engine.InferenceSession()
model = session.load("onnx_det.onnx", custom_ops_path="custom_ops.mojopkg")

for tensor in model.input_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')

input_x = np.random.rand(3, 3, 5).astype(np.float32)
input_a = np.random.rand(5, 3).astype(np.float32)
input_b = np.random.rand(3).astype(np.float32)

result = model.execute(X=input_x, A=input_a, B=input_b)
print(result)

That’s it!

Now you can run it:

export MOJO_PYTHON_LIBRARY=$(modular config mojo-max.python_lib)
python3 onnx-inference.py
Compiling model...
Done!
name: X, shape: [3, 3, 5], dtype: DType.float32
name: A, shape: [5, 3], dtype: DType.float32
name: B, shape: [3], dtype: DType.float32
Hello, custom DET!
{'Z': array([-0.04415698, -0.00949615, 0.07051321], dtype=float32)}
note

The MOJO_PYTHON_LIBRARY environment variable currently must be set to allow for interop between Python and Mojo, but this is a rough edge that we're working to resolve and shouldn't be required in a future release.

You can get all the example code for this and the previous sections from our GitHub repo.

Add your custom op to Triton (optional)

If you're using NVIDIA's Triton Inference Server to deploy your model, you can make your custom op available by appending the following to your model configuration file:

parameters: [{
  key: "custom-ops-path"
  value: {
    string_value: "./path/to/your/custom_op/custom_ops.mojopkg"
  }
}]

You must include the key-value pair inside the parameters configuration, as shown above. The only thing you need to change is the string_value so it specifies the path to your custom ops Mojo package.