Mojo package

gpu

GPU multi-head attention (MHA), cross-attention, and multi-head latent attention (MLA) kernels. Vendor-specific implementations live under amd/ and nvidia/.

Packages

amd_rdna: TileTensor-native attention kernels for AMD RDNA3+ (gfx11xx/gfx12xx).
amd_structured: TileTensor-native attention kernels for AMD gfx950 (MI355X).
apple: Apple (Metal) GPU attention kernels.
nvidia: NVIDIA GPU attention kernels and tile-scheduling utilities.

Modules

mha:
mha_cross:
mha_decode_partition_heuristic:
mla:
mla_decode_dispatch_scalars: Device-dispatched MLA decode dispatch-metadata scalars.
mla_graph:
mla_index_fp8: MLA FP8 index kernel for computing attention scores with paged KV cache.
sparse_indexer_common: Shared device functions for the MiniMax-M3 sparse-attention (MSA) indexer.
