MAX changelog
The MAX platform is a unified set of tools and libraries that unlock performance, programmability, and portability for your AI inference pipeline. It comprises several products, including MAX Engine, MAX Serving, and the Mojo programming language.
This page describes all the changes in each version of the MAX platform.
To learn more about the platform, read What is MAX.
v24.4 (2024-06-07)
🔥 Legendary
- MAX is now available on macOS! Try it now.
- New quantization APIs for MAX Graph. You can now build high-performance graphs in Mojo that use the latest quantization techniques, enabling even faster performance and more system compatibility for large models.
  Learn more in the guide to quantize your graph weights.
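To make the idea of weight quantization concrete, here is a minimal pure-Python sketch of symmetric 4-bit block quantization. It does not use the MAX Graph API and does not reproduce the exact Q4_0 layout. The block size of 32, the helper names quantize_block() and dequantize_block(), and the rounding scheme are all assumptions chosen for illustration.

```python
# Illustrative only: a simplified symmetric 4-bit block quantizer.
# This is NOT the MAX Graph API and not the exact Q4_0 bit layout.

BLOCK_SIZE = 32  # assumed block size for this sketch


def quantize_block(values):
    """Map a block of floats to (scale, list of ints in [-7, 7])."""
    max_abs = max(abs(v) for v in values)
    # One float scale per block; guard against an all-zero block.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    quants = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the scale and the small integers."""
    return [q * scale for q in quants]


weights = [(i - 16) / 16.0 for i in range(BLOCK_SIZE)]  # sample block
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.4f}, max error={max_error:.4f}")
```

Each weight is stored as a small integer that fits in 4 bits, plus one shared scale per block, so the reconstruction error in this sketch is bounded by half the scale.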
⭐️ New
MAX Mojo APIs
- Added AI pipeline examples in the max repo, with Mojo implementations for common transformer layers, including quantization support.
  - New Llama3 pipeline built with MAX Graph.
  - New Replit Code pipeline built with MAX Graph.
  - New TinyStories pipeline (based on TinyLlama) that offers a simple demo of the MAX Graph quantization API.
- Added Mojo API inference example with the TorchScript BERT model.
- Added max.graph.checkpoint package to save and load model weights.
  All weights are stored in a TensorDict. You can save and load a TensorDict to disk with the save() and load() functions.
- Added MAX Graph quantization APIs:
  - Added quantization encodings BFloat16Encoding, Q4_0Encoding, Q4_KEncoding, and Q6_KEncoding.
  - Added the QuantizationEncoding trait so you can build custom quantization encodings.
  - Added Graph.quantize() to create a quantized tensor node.
  - Added qmatmul() to perform matrix multiplication with a float32 and a quantized matrix.
- Added some MAX Graph ops:
- Added a layer() context manager and current_layer() function to aid in debugging during graph construction. For example:

      with graph.layer("foo"):
          with graph.layer("bar"):
              print(graph.current_layer())  # prints "foo.bar"
              x = graph.constant[DType.int64](1)
              graph.output(x)

  This adds a path foo.bar to the added nodes, which will be reported during errors.
- Added format_system_stack() function to format the stack trace, which we use to print better error messages from error().
- Added TensorMap.keys() to get all the tensor key names.
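The nested naming behavior of layer() described above can be illustrated with a small Python sketch. This is not the MAX Graph implementation. The ToyGraph class and its fields are hypothetical, written only to show how a context manager can build a dotted path such as foo.bar and attach it to nodes created inside it.

```python
from contextlib import contextmanager


class ToyGraph:
    """Hypothetical stand-in for a graph builder (not the MAX Graph API)."""

    def __init__(self):
        self._layers = []  # stack of active layer names
        self.nodes = []    # (layer path, value) pairs

    @contextmanager
    def layer(self, name):
        # Push the name on entry and pop it on exit, even on error.
        self._layers.append(name)
        try:
            yield
        finally:
            self._layers.pop()

    def current_layer(self):
        # Join the active names into a dotted path.
        return ".".join(self._layers)

    def constant(self, value):
        # Record the active layer path with each node for error reports.
        node = (self.current_layer(), value)
        self.nodes.append(node)
        return node


graph = ToyGraph()
with graph.layer("foo"):
    with graph.layer("bar"):
        inner_path = graph.current_layer()
        node = graph.constant(1)
outer_path = graph.current_layer()
print(inner_path, node, repr(outer_path))
```

Keeping the names on a stack means the path is restored automatically when each with block exits, so nodes created afterward are not mislabeled.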
MAX C API
Miscellaneous new APIs:
- M_cloneCompileConfig()
- M_copyAsyncTensorMap()
- M_tensorMapKeys() and M_deleteTensorMapKeys()
- M_setTorchLibraries()
🦋 Changed
MAX Mojo API
- EngineNumpyView.data() and EngineTensorView.data() functions that return a type-erased pointer were renamed to unsafe_ptr().
- TensorMap now conforms to the CollectionElement trait to be copyable and movable.
- custom_nv() was removed, and its functionality moved into custom() as a function overload, so it can now output a list of tensor symbols.
For all the Mojo language and library changes in this release, see the Mojo changelog.
v24.3 (2024-05-02)
🔥 Legendary
- You can now write custom ops for your models with Mojo!
  Learn more about MAX extensibility.
🦋 Changed
- Added support for named dynamic dimensions. This means you can specify when two or more dimensions in your model's input are dynamic but their sizes at run time must match each other. By specifying each of these dimension sizes with a name (instead of using None to indicate a dynamic size), the MAX Engine compiler can perform additional optimizations. See the notes below for the corresponding API changes that support named dimensions.
- Simplified all the APIs to load input specs for models, making them more consistent.
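The constraint expressed by named dynamic dimensions can be sketched in plain Python. This is a hypothetical checker, not the MAX Engine API. The function check_named_dims() and the spec format (a list whose entries are an int for a static size, a string for a named dynamic size, or None for an unnamed dynamic size) are assumptions made for illustration.

```python
def check_named_dims(specs, shapes):
    """Bind each dimension name to a size and verify that shapes agree.

    specs:  one list per input; each entry is an int (static size),
            a str (named dynamic size), or None (unnamed dynamic size).
    shapes: the concrete run-time shape of each input.
    Returns the mapping from dimension name to its bound size.
    """
    bound = {}
    for spec, shape in zip(specs, shapes):
        if len(spec) != len(shape):
            raise ValueError(f"rank mismatch: {spec} vs {shape}")
        for dim, size in zip(spec, shape):
            if dim is None:
                continue  # unnamed dynamic: any size is accepted
            if isinstance(dim, int):
                if dim != size:
                    raise ValueError(f"expected {dim}, got {size}")
            elif bound.setdefault(dim, size) != size:
                # The same name was already bound to a different size.
                raise ValueError(
                    f"dimension '{dim}' was {bound[dim]}, now {size}"
                )
    return bound


# Two inputs that must share the same batch and sequence length.
specs = [["batch", "seq"], ["batch", "seq", 768]]
bound = check_named_dims(specs, [(4, 128), (4, 128, 768)])
print(bound)
```

With None in place of the names, the two batch sizes could differ without any error. Naming them is what tells a compiler that the sizes are tied together.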
MAX Engine performance
- Compared to v24.2, MAX Engine v24.3 shows an average speedup of 10% on PyTorch models, and an average 20% speedup on dynamically quantized ONNX transformers.
MAX Graph API
The max.graph APIs are still changing rapidly, but starting to stabilize.
See the updated guide to build a graph with MAX Graph.
- AnyMoType renamed to Type, MOTensor renamed to TensorType, and MOList renamed to ListType.
- Removed ElementType in favor of using DType.
- Removed TypeTuple in favor of using List[Type].
- Removed the Module type so you can now start building a graph by directly instantiating a Graph.
- Some new ops in max.ops, including support for custom ops.
  See how to create a custom op in MAX Graph.
MAX Engine Python API
- Redesigned InferenceSession.load() to replace the confusing options argument with a custom_ops_path argument for use when loading a custom op, and an input_specs argument for use when loading TorchScript models.
  As a result, CommonLoadOptions, TorchLoadOptions, and TensorFlowLoadOptions have all been removed.
- TorchInputSpec now supports named dynamic dimensions (previously, dynamic dimension sizes could be specified only as None). This lets you tell MAX which dynamic dimensions are required to have the same size, which helps MAX better optimize your model.
MAX Engine Mojo API
- InferenceSession.load_model() was renamed to load().
- Redesigned InferenceSession.load() to replace the confusing config argument with a custom_ops_path argument for use when loading a custom op, and an input_specs argument for use when loading TorchScript models.
  Doing so removed LoadOptions and introduced the new InputSpec type to define the input shape/type of a model (instead of LoadOptions).
- New ShapeElement type to allow for named dynamic dimensions (in InputSpec).
- max.engine.engine module was renamed to max.engine.info.
MAX Engine C API
- M_newTorchInputSpec() now supports named dynamic dimensions (via new dimNames argument).
❌ Removed
- Removed TensorFlow support in the MAX SDK, so you can no longer load a TensorFlow SavedModel for inference. However, TensorFlow is still available for enterprise customers.
  We removed TensorFlow because industry-wide TensorFlow usage has declined significantly, especially for the latest AI innovations. Removing TensorFlow also cuts our package size by over 50% and accelerates the development of other customer-requested features. If you have a production use-case for a TensorFlow model, please contact us.
- Removed the Python CommonLoadOptions, TorchLoadOptions, and TensorFlowLoadOptions classes. See the note above about InferenceSession.load() changes.
- Removed the Mojo LoadOptions type. See the note above about InferenceSession.load() changes.
v24.2.1 (2024-04-11)
- You can now import more MAX Graph functions from max.graph.ops instead of using max.graph.ops.elementwise. For example:

      from max.graph import ops
      var relu = ops.relu(matmul)
v24.2 (2024-03-28)
- MAX Engine now supports TorchScript models with dynamic input shapes.
  No matter what the input shapes are, you still need to specify the input specs for all TorchScript models.
- The Mojo standard library is now open source!
  Read more about it in this blog post.
- And, of course, lots of Mojo updates, including implicit traits, support for keyword arguments in Python calls, a new List type (previously DynamicVector), some refactoring that might break your code, and much more.
  For details, see the Mojo changelog.
v24.1.1 (2024-03-18)
This is a minor release that improves error reports.
v24.1 (2024-02-29)
The first release of the MAX platform is here!
This is a preview version of the MAX platform. That means it is not ready for production deployment and is designed only for local development and evaluation.
Because this is a preview, some API libraries are still in development and subject to change, and some features that we previously announced are not quite ready yet. But there is a lot that you can do in this release!
This release includes our flagship developer tools, currently for Linux only:
- MAX Engine: Our state-of-the-art graph compiler and runtime library that executes models from PyTorch and ONNX, with incredible inference speed on a wide range of hardware.
  - API libraries in Python, C, and Mojo to run inference with your existing models. See the API references.
  - The max benchmark tool, which runs MLPerf benchmarks on any compatible model without writing any code.
  - The max visualize tool, which allows you to visualize your model in Netron after partially lowering in MAX Engine.
  - An early look at the MAX Graph API, our low-level library for building high-performance inference graphs in Mojo.
- MAX Serving: A preview of our serving wrapper for MAX Engine that provides full interoperability with existing AI serving systems (such as Triton) and that seamlessly deploys within existing container infrastructure (such as Kubernetes).
  - A Docker image that runs MAX Engine as a backend for NVIDIA Triton Inference Server. Try it now.
- Mojo: The world's first programming language built from the ground-up for AI developers, with cutting-edge compiler technology that delivers unparalleled performance and programmability for any hardware.
  - The latest version of Mojo, the standard library, and the mojo command line tool. These are always included in MAX, so you don't need to download any separate packages.
  - The Mojo changes in each release are often quite long, so we're going to continue sharing those in the existing Mojo changelog.
Additionally, we've started a new GitHub repo for MAX, where we currently share a bunch of code examples for our API libraries, including some large model pipelines such as Stable Diffusion in Mojo and Llama2 built with MAX Graph. You can also use this repo to report issues with MAX.
To get a peek at what's coming soon, and learn about some of the bugs we're working on right now, see the MAX roadmap & known issues.