# MAX guides

> MAX developer guides covering deployment, inference, and model development.

This file contains links to documentation sections following the llmstxt.org standard.

## Table of Contents

- [What's new](https://docs.modular.com/max/changelog.md): Release notes for each version of the Modular Platform.
- [Using AI coding assistants](https://docs.modular.com/max/coding-assistants.md): Use AI coding assistants such as Cursor, Claude Code, Copilot, and Windsurf with Modular.
- [MAX container](https://docs.modular.com/max/container.md): Learn about the Docker container provided for MAX deployment
- [Benchmark MAX on NVIDIA or AMD GPUs](https://docs.modular.com/max/deploy/benchmark.md): Learn how to use our benchmarking script to measure the performance of MAX
- [Cloud deployments with Modular](https://docs.modular.com/max/deploy/cloud.md): Deploy AI models at scale with Modular's fully managed cloud or inside your own VPC.
- [Deploy MAX on GPU with self-hosted endpoints](https://docs.modular.com/max/deploy/local-to-cloud.md): Learn how to deploy MAX pipelines to the cloud
- [Basic operations](https://docs.modular.com/max/develop/basic-ops.md): Perform tensor operations for arithmetic, shape manipulation, reductions, and random generation
- [Broadcasting](https://docs.modular.com/max/develop/broadcasting.md): Learn broadcasting rules, automatic expansion in elementwise ops, and F.broadcast_to for explicit shapes
- [Build custom ops for GPUs](https://docs.modular.com/max/develop/build-custom-ops.md): Write custom GPU and CPU operations in Mojo and load them into a MAX graph.
- [Write hardware-agnostic custom ops for PyTorch](https://docs.modular.com/max/develop/custom-kernels-pytorch.md): Learn to write custom operators in Mojo for PyTorch
- [Optimize custom ops for GPUs with Mojo](https://docs.modular.com/max/develop/custom-ops-matmul.md): Learn to use Mojo's GPU programming abstractions to progressively optimize a matrix multiplication
- [Intro to custom ops](https://docs.modular.com/max/develop/custom-ops.md): Extend MAX Graph with custom Mojo kernels for optimized performance
- [Accuracy issues](https://docs.modular.com/max/develop/debugging-accuracy.md): Catch NaN and Inf values and uninitialized memory reads in your MAX model with accuracy debugging options.
- [Runtime errors](https://docs.modular.com/max/develop/debugging-errors.md): Capture Mojo stack traces and dump compiler IR to investigate unrecoverable failures in your MAX model.
- [GPU errors](https://docs.modular.com/max/develop/debugging-gpu.md): Force synchronous GPU execution and enable kernel-level bounds checking to pinpoint the op that caused a failure.
- [Execution tracing](https://docs.modular.com/max/develop/debugging-tracing.md): See which operations MAX runs, in what order, and on which device, and map each op back to its Python source.
- [Model debugging overview](https://docs.modular.com/max/develop/debugging.md): Review and configure available MAX debugging tools to diagnose issues during model development.
- [Data types (dtype)](https://docs.modular.com/max/develop/dtypes.md): Learn how to use dtypes to control tensor precision and interoperability
- [Eager execution](https://docs.modular.com/max/develop/eager-execution.md): Use the eager API to create tensors, apply operations, inspect intermediate values, and run a forward pass.
- [Graph overview](https://docs.modular.com/max/develop/graph.md): Learn why MAX uses graph compilation and how two programming patterns — explicit graph construction and eager-like execution — both produce graphs.
- [Model development overview](https://docs.modular.com/max/develop.md): A brief overview of the process to bring a pretrained model from Hugging Face to MAX.
- [Indexing and slicing](https://docs.modular.com/max/develop/indexing.md): Use slice syntax, F.gather, F.where, and F.scatter to read and write tensor values
- [Serve a fine-tuned model on a supported architecture](https://docs.modular.com/max/develop/max-pipeline-bring-your-own-model.md): Serve a fine-tuned model checkpoint with MAX when the base architecture is already supported, with no architecture code required.
- [Model bring-up workflow](https://docs.modular.com/max/develop/model-bringup-workflow.md): Understand MAX's model architecture system and how to implement a custom architecture by mapping config fields, translating weight names, connectin...
- [Build a model graph with Module](https://docs.modular.com/max/develop/modules.md): Learn how to compose modules, write custom modules with explicit weights, load checkpoint data, and construct a model graph.
- [Model pipeline](https://docs.modular.com/max/develop/pipelines.md): Learn how to integrate your model into a pipeline for serving with MAX.
- [Serve your custom model](https://docs.modular.com/max/develop/serve-custom-model-architectures.md): Learn how to serve a model with a custom architecture with max serve and send inference requests to it.
- [Tensor realization](https://docs.modular.com/max/develop/tensor-realization.md): Understand how MAX realizes tensors so you can reason about eager performance and write code the graph compiler can optimize.
- [Tensor fundamentals](https://docs.modular.com/max/develop/tensors.md): This page provides an overview of tensors, why they matter, and how to create and work with them in MAX.
- [Environment variables](https://docs.modular.com/max/environment-variables.md): Reference for all configurable environment variables in MAX
- [FAQ](https://docs.modular.com/max/faq.md): Answers to various questions about the Modular Platform.
- [Quickstart](https://docs.modular.com/max/get-started.md): A quickstart guide to run a GenAI model locally with Modular.
- [GPU profiling with Nsight Systems](https://docs.modular.com/max/gpu-system-profiling.md): How to profile MAX models and endpoints with Nsight Systems.
- [Quantization](https://docs.modular.com/max/graph/quantize.md): An introduction to the MAX Graph quantization API
- [Embeddings](https://docs.modular.com/max/inference/embeddings.md): Learn how to use the MAX embeddings endpoint to create embeddings for input text
- [Image generation](https://docs.modular.com/max/inference/image-generation.md): Generate images from text prompts or transform existing images using the MAX v1/responses endpoint
- [Image and video to text](https://docs.modular.com/max/inference/image-to-text.md): Use the MAX chat completions endpoint with image or video input to generate descriptions and answer questions about visual content
- [Text to text](https://docs.modular.com/max/inference/text-to-text.md): Generate text using MAX with OpenAI-compatible chat and completion endpoints
- [Video generation](https://docs.modular.com/max/inference/video-generation.md): Generate videos from text prompts or animate existing images using the MAX v1/responses endpoint
- [What is Modular](https://docs.modular.com/max/intro.md): An overview of the Modular Platform, what it does, and how to use it.
- [Supported models](https://docs.modular.com/max/models.md): See all the model architectures supported by MAX.
- [Packages](https://docs.modular.com/max/packages.md): Learn how to install Modular tools, set up your environment, and choose between nightly and stable versions
- [REST API](https://docs.modular.com/max/rest-api.md): The API reference for the MAX inference endpoint.
- [Function calling and tool use](https://docs.modular.com/max/serve/function-calling.md): Implement OpenAI-compatible function calling and tool use for agentic GenAI workflows
- [Using LoRA adapters](https://docs.modular.com/max/serve/lora-adapters.md): Use LoRA adapters with MAX to serve task-specific, fine-tuned variants of LLMs
- [Offline inference](https://docs.modular.com/max/serve/offline-inference.md): Run LLMs directly in Python for batch processing and high throughput
- [Prefix caching with PagedAttention](https://docs.modular.com/max/serve/prefix-caching.md): Use prefix caching and PagedAttention when serving a model with the MAX CLI
- [Speculative decoding](https://docs.modular.com/max/serve/speculative-decoding.md): Use speculative decoding to accelerate LLM inference
- [Structured output](https://docs.modular.com/max/serve/structured-output.md): Enable structured output with your GenAI deployments for predictable responses
- [Pixi basics](https://docs.modular.com/pixi.md): Pixi is a CLI tool [from Prefix.dev](https://prefix.dev/blog/launching_pixi)