MAX provides a unified and extensible platform that includes everything you need to deploy low-latency, high-throughput AI inference pipelines into production.
What you can do with MAX
Run an existing model from Python
Learn how to run inference using a model from PyTorch, TensorFlow, or ONNX.
Benchmark any model without writing any code
Use a simple command line tool to execute any model in MAX Engine with MLPerf.
Start an inference service in Triton
Try MAX Serving in a container and respond to inference requests from an HTTP/gRPC client.
Write Mojo code that uses Python
Learn how to write Mojo code that interoperates with Python packages like NumPy and Matplotlib.
Try Llama2 or Stable Diffusion
Check out our code examples that run inference with a variety of models.
Build an inference graph in Mojo
Learn how to build a high-performance inference graph in Mojo with the MAX Graph API.
Start coding with Mojo in your browser
Go to our Mojo coding playground that's built into this website. There's nothing to install.
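For example, because MAX Serving speaks Triton's KServe v2 HTTP/gRPC protocol, an inference request is just a JSON body POSTed to the server's /v2/models/&lt;model&gt;/infer endpoint. Here's a minimal sketch of such a request body, built with only the Python standard library; the model name, tensor name, and input values are hypothetical placeholders, not from a real deployment.

```python
import json

# Hypothetical input tensor for a KServe v2 inference request.
# Each entry in "inputs" names a tensor and gives its shape,
# datatype, and flattened data.
payload = {
    "inputs": [
        {
            "name": "input_ids",          # hypothetical tensor name
            "shape": [1, 4],
            "datatype": "INT64",
            "data": [101, 2023, 2003, 102],
        }
    ]
}

# Serialize to the JSON body you would POST to, e.g.:
#   http://localhost:8000/v2/models/<model>/infer
body = json.dumps(payload)
print(body)
```

The server replies with a matching JSON structure containing an "outputs" list, so the same pattern works for reading results back.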