Create a custom op for any model
MAX is designed to be fully extensible, so you can get the behavior you want from your model without compromising performance. On this page, we'll show you how to implement a new op for MAX Engine.
When you write a custom op in Mojo, the MAX Engine compiler treats it the same as all the other ops ("kernels") built into MAX Engine: the compiler analyzes and optimizes the op to achieve the best performance.
Currently, custom ops are compatible with ONNX and MAX Graph models only; support for TorchScript models is coming soon. (If you're using MAX Graph, see Create a custom op in MAX Graph instead.)
Also, the example code below currently fails on Ubuntu 20.04.
Overview
When MAX compiles a TorchScript or ONNX model, it translates each op in the graph into "MO" dialect operations. The MO dialect is an intermediate representation used in our MLIR-based graph compiler ("MO" stands for "Modular"). This means we've defined many graph operations in MO that map to other framework ops, such as add, matmul, and softmax. But there are a lot of ops out there (thousands in PyTorch), most of which are rarely used and not currently implemented in MAX.
If you're using a TorchScript model and the MAX compiler encounters one of these ops that we haven't implemented, it falls back to using the op implementation from PyTorch. This is a good thing because it means we can provide full compatibility with nearly all PyTorch models, without any need for custom ops. (In the future, you'll also be able to override PyTorch fallback ops with your own implementation, but this isn't supported yet.)
On the other hand, if you're using an ONNX model that includes an op we haven't implemented yet, compilation fails because ONNX models have no op implementation to fall back to. Fortunately, you can fix this yourself by implementing the op in Mojo, and that's what you'll learn to do on this page.
Setup
In the following sections, we'll implement the Det op, because it's currently not supported in MAX Engine: any ONNX model that uses this op fails to compile.
If you want to follow along with the example code below, first ensure that you have installed the latest version of MAX. Then navigate to the examples/extensibility path of the max repo and install the required dependencies:
python3 -m pip install -r requirements.txt
Now, build the ONNX model with onnx-model.py:
python3 onnx-model.py
As is, this model fails to compile, which you can verify with the benchmark tool:
max benchmark onnx_det.onnx
loc("onnx_det.onnx":0:0): error: failed to legalize operation 'monnx.det_v11' that was explicitly marked illegal
So let's make this model work by implementing the missing Det op.
Implement a custom op
To create a custom op, write a Mojo function that operates on Tensor values: it must take a Tensor argument for each op input and return a Tensor as the op output. You must also register your function as an op by adding the register.op() decorator with the name of your op.
For example, here's an implementation of the Det op written in Mojo (notice that the name we give to register.op() is the op name from the compiler error above):
from python import Python
from .python_utils import tensor_to_numpy, numpy_to_tensor
from max import register
from max.extensibility import Tensor, empty_tensor


@register.op("monnx.det_v11")
fn det[type: DType, rank: Int](x: Tensor[type, rank]) -> Tensor[type, rank - 2]:
    try:
        print("Hello, custom DET!")
        var np = Python.import_module("numpy")
        var np_array = tensor_to_numpy(x, np)
        var np_out = np.linalg.det(np_array)
        return numpy_to_tensor[type, rank - 2](np_out)
    except e:
        print(e)
        return empty_tensor[type, rank - 2](0)
Although we could have written the determinant ("det") function entirely in Mojo, we don't have to, because Mojo allows us to use the NumPy implementation instead. The helper functions in the python_utils module convert NumPy arrays to and from Tensor values.
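To see why the Mojo function's return type is Tensor[type, rank - 2], it helps to look at how np.linalg.det behaves on batched input: it computes one determinant per matrix over the last two axes, so the result has two fewer dimensions than the input. A quick standalone demonstration in Python:

```python
import numpy as np

# np.linalg.det operates on the last two (square) axes, so a batch of
# square matrices reduces by two ranks: shape (3, 2, 2) -> shape (3,).
batch = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # identity: det = 1
    [[2.0, 0.0], [0.0, 3.0]],  # diagonal: det = 2 * 3 = 6
    [[1.0, 2.0], [2.0, 4.0]],  # singular: det = 1*4 - 2*2 = 0
])
dets = np.linalg.det(batch)
print(dets.shape)  # (3,)
print(dets)        # approximately [1. 6. 0.]
```

This rank-reduction is exactly what the Tensor[type, rank - 2] return type in the Mojo signature expresses.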
Package the custom op
To package the custom op, create a directory that includes the above Mojo code, plus an empty __init__.mojo file. Then pass that directory name to the mojo package command.
For example, here's how it looks with our directory named custom_ops:
custom_ops
├── __init__.mojo
├── det.mojo
└── python_utils.mojo
Then, you can package it with this command:
mojo package custom_ops
This creates a file called custom_ops.mojopkg in the current directory.
That's pretty much it. Next, you simply load your model with this Mojo package.
Benchmark with your custom op
You can verify that the custom op works by passing it along with your model to the max benchmark command (and because we're using Python interop, we need to set the Python library for use in Mojo):
export MOJO_PYTHON_LIBRARY=$(modular config mojo-max.python_lib)
max benchmark onnx_det.onnx --custom-ops-path=custom_ops.mojopkg
Execute the model with your custom op
Finally, here's how to load the model into MAX Engine with the custom op and run an inference with our Python API. All you need to do is add the custom_ops_path argument when you load the model:
from max import engine
import numpy as np
session = engine.InferenceSession()
model = session.load("onnx_det.onnx", custom_ops_path="custom_ops.mojopkg")
for tensor in model.input_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')
input_x = np.random.rand(3, 3, 5).astype(np.float32)
input_a = np.random.rand(5, 3).astype(np.float32)
input_b = np.random.rand(3).astype(np.float32)
result = model.execute(X=input_x, A=input_a, B=input_b)
print(result)
That’s it!
Now you can run it:
export MOJO_PYTHON_LIBRARY=$(modular config mojo-max.python_lib)
python3 onnx-inference.py
Compiling model...
Done!
name: X, shape: [3, 3, 5], dtype: DType.float32
name: A, shape: [5, 3], dtype: DType.float32
name: B, shape: [3], dtype: DType.float32
Hello, custom DET!
{'Z': array([-0.04415698, -0.00949615, 0.07051321], dtype=float32)}
The MOJO_PYTHON_LIBRARY environment variable currently must be set to allow for interop between Python and Mojo, but this is a rough edge that we're working to resolve, and it shouldn't be required in a future release.
You can get all the example code from this and previous sections from our GitHub repo.
Add your custom op to Triton (optional)
If you're using NVIDIA's Triton Inference Server to deploy your model, you can make your custom op available by appending the following to your model configuration file:
parameters: [{
  key: "custom-ops-path"
  value: {
    string_value: "./path/to/your/custom_op/custom_ops.mojopkg"
  }
}]
You must include the key-value pair inside the parameters configuration, as shown above. The only thing you need to change is the string_value so that it specifies the path to your custom ops Mojo package.