Intro to MAX extensibility

The AI model you get from a framework like PyTorch or TensorFlow is built as a graph of connected operations ("ops"). Although most ops are simple math functions, efficiently executing models that include trillions of ops requires highly-performant implementations for each op (sometimes called "kernels"). However, even the most high-performance ops aren't enough to achieve peak performance. It's also necessary to employ graph compilers that can analyze the entire graph and optimize the calculations and memory that span across a sequence of ops.

That's why MAX Engine is designed to be fully extensible with Mojo. Regardless of the model format you have (such as PyTorch, ONNX, or MAX Graph), you can write custom ops in Mojo that the MAX Engine compiler can natively analyze and optimize along with the rest of the model.

New API coming soon

We removed the extensibility API in v24.5 (it was added in v24.3) and we're replacing it with a better version soon. Because MAX is still a preview, we don't want to leave APIs in the platform that we have no intention to support. Stay tuned for an improved extensibility API that works on both CPUs and GPUs.