
v24.1 (2024-02-29)

The first release of the MAX platform is here! 🚀

This is a preview version of the MAX platform, which means it is not ready for production deployment and is designed only for local development and evaluation.

Because this is a preview, some API libraries are still in development and subject to change, and some features that we previously announced are not quite ready yet. But there is a lot that you can do in this release!

This release includes our flagship developer tools, currently for Linux only:

  • MAX Engine: Our state-of-the-art graph compiler and runtime library that executes models from PyTorch and ONNX, with incredible inference speed on a wide range of hardware.

    • API libraries in Python, C, and Mojo to run inference with your existing models. See the API references, or the Python sketch after this list.

    • The max benchmark tool, which runs MLPerf benchmarks on any compatible model without writing any code.

    • The max visualize tool, which lets you visualize your model in Netron after it has been partially lowered in MAX Engine.

    • An early look at the MAX Graph API, our low-level library for building high-performance inference graphs.

  • MAX Serving: A preview of our serving wrapper for MAX Engine that provides full interoperability with existing AI serving systems (such as Triton) and that seamlessly deploys within existing container infrastructure (such as Kubernetes).

    • A Docker image that runs MAX Engine as a backend for NVIDIA Triton Inference Server. See the Docker sketch after this list.

  • Mojo: The world's first programming language built from the ground up for AI developers, with cutting-edge compiler technology that delivers unparalleled performance and programmability for any hardware.

    • The latest version of Mojo, the standard library, and the mojo command line tool. These are always included in MAX, so you don't need to download any separate packages.

    • The Mojo changes in each release are often quite long, so we're going to continue sharing those in the existing Mojo changelog.
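
To give a flavor of the API libraries, here is a minimal sketch of running an ONNX model through the Python API. It's illustrative only: the model file name (resnet50.onnx), its input tensor name (input), and the input shape are placeholders to replace with your own model's values.

    from max import engine
    import numpy as np

    # Compile and load the model in an inference session.
    session = engine.InferenceSession()
    model = session.load("resnet50.onnx")

    # execute() takes keyword arguments named after the model's input
    # tensors and returns the output tensors.
    input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = model.execute(input=input_tensor)
    print(outputs)

The max benchmark and max visualize tools take a model path directly, along the lines of max benchmark resnet50.onnx (again, the file name is a placeholder).

For MAX Serving, the container follows standard Triton conventions, so launching it looks roughly like the sketch below. The image name is a placeholder for the image we publish, and 8000/8001/8002 are Triton's usual HTTP, gRPC, and metrics ports:

    docker run --rm \
      -p 8000:8000 -p 8001:8001 -p 8002:8002 \
      -v $HOME/model-repository:/models \
      <max-serving-image> \
      tritonserver --model-repository=/models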

Additionally, we've started a new GitHub repo for MAX, where we currently share a bunch of code examples for our API libraries, including some large model pipelines. You can also use this repo to report issues with MAX.

Model Architecture Support

  • Added support for the following model architectures:

    • OlmoForCausalLM (such as allenai/OLMo-1B-0724-hf)
    • GraniteForCausalLM (such as ibm-granite/granite-3.1-8b-instruct)
    • Phi3ForCausalLM (for Microsoft Phi-3 models)
    • Qwen2ForCausalLM (for Qwen2 models)

    Example usage:

    max-pipelines generate \
      --model-path allenai/OLMo-1B-0724-hf \
      --prompt "Write bubble sort in mojo"
  • The max.pipelines.dataprocessing.tokenizer and max.pipelines.dataprocessing.gguf_utils modules have been removed.

  • The previously deprecated PipelineConfig.architecture field and its corresponding --architecture CLI argument have been removed.

max-pipelines CLI

  • The --devices CLI argument now supports a comma-separated list of GPU IDs prefixed with gpu:, for example --devices=gpu:0,1,2,3. The previous --devices=gpu-<N> format is no longer supported.

    max-pipelines generate --model-path=meta-llama/Llama-3.3-70B-Instruct \
      --quantization-encoding bfloat16 \
      --devices gpu:0,1,2,3 \
      --prompt="Design a self-sustaining colony on Neptune's moon Triton with a myth/science fusion name, three quantum tech breakthroughs, one ethical debate, a neon-lit cultural ritual, and a hidden flaw—presented in bullet points."
  • Removed the --huggingface-repo-id PipelineConfig option and CLI argument in favor of --model-path.

  • Consolidated --model-path and --weight-path. If one or more valid --weight-path values are provided, they now override --model-path, which handles both local and remote (Hugging Face) cases. If the weights cannot be derived from the given --weight-path values, we fall back to --model-path, which must be set explicitly by the user. See the examples after this list.

  • Added the --huggingface-revision option to select a non-default branch or a specific commit in a Hugging Face model repository.
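
For example, to pin a specific branch or commit (main here is a placeholder revision):

    max-pipelines generate \
      --model-path allenai/OLMo-1B-0724-hf \
      --huggingface-revision main \
      --prompt "Write bubble sort in mojo"

Or to override the repository lookup with local weights (the .gguf path is a placeholder):

    max-pipelines generate \
      --model-path meta-llama/Llama-3.3-70B-Instruct \
      --weight-path /path/to/weights.gguf \
      --prompt "Write bubble sort in mojo"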
