For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).
Mojo package
gpu
GPU multi-head attention (MHA), cross-attention, and multi-head latent attention (MLA) kernels. Vendor-specific implementations live in the amd_rdna, amd_structured, apple, and nvidia subpackages.
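For orientation, here is a plain reference for the math these kernels accelerate. It is a minimal pure-Python sketch of scaled dot-product multi-head attention. It assumes a `[heads][seq][dim]` nested-list layout, and the function names (`softmax`, `attention_head`, `multi_head_attention`) are hypothetical. It is not this package's Mojo API. It also omits everything that makes the GPU kernels fast or model-specific: tiling, masking, KV caching, mixed precision, and the MLA latent projection.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(q, k, v):
    """Scaled dot-product attention for a single head (hypothetical helper).

    q: [seq_q][d], k: [seq_k][d], v: [seq_k][d_v]; returns [seq_q][d_v].
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # One score per key: scaled dot product of this query with each key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Output row is the attention-weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

def multi_head_attention(q, k, v):
    """Run attention independently per head; inputs are [heads][seq][dim]."""
    return [attention_head(qh, kh, vh) for qh, kh, vh in zip(q, k, v)]
```

Cross-attention uses the same computation. The only difference is that K and V come from a different sequence than Q, so `seq_k` may differ from `seq_q`.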
Packages
- amd_rdna: TileTensor-native attention kernels for AMD RDNA3+ (gfx11xx/gfx12xx).
- amd_structured: TileTensor-native attention kernels for AMD gfx950 (MI355X).
- apple: Apple (Metal) GPU attention kernels.
- nvidia: NVIDIA GPU attention kernels and tile-scheduling utilities.
Modules
- mha
- mha_cross
- mha_decode_partition_heuristic
- mla
- mla_decode_dispatch_scalars: Device-dispatched MLA decode dispatch-metadata scalars.
- mla_graph
- mla_index_fp8: MLA FP8 index kernel for computing attention scores with paged KV cache.
- sparse_indexer_common: Shared device functions for the MiniMax-M3 sparse-attention (MSA) indexer.
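The mla_index_fp8 description mentions a paged KV cache. The sketch below shows the general addressing idea behind such a cache: keys are stored in fixed-size physical pages, and a per-sequence block table maps logical blocks to those pages. It is written in pure Python, and every name in it (`gather_paged_keys`, `paged_attention_scores`, `block_table`, `page_size`) is hypothetical. The module's actual data layout, FP8 quantization, and MLA-specific scoring are not documented here and are not modeled.

```python
import math

def gather_paged_keys(key_pages, block_table, seq_len, page_size):
    """Collect the first seq_len cached keys for one sequence (hypothetical).

    key_pages: physical pages, each a list of page_size key vectors.
    block_table: logical block index -> physical page index for this sequence.
    """
    keys = []
    for t in range(seq_len):
        page = block_table[t // page_size]  # physical page holding token t
        offset = t % page_size              # slot of token t within that page
        keys.append(key_pages[page][offset])
    return keys

def paged_attention_scores(query, key_pages, block_table, seq_len, page_size):
    """Scaled dot-product scores of one query against a paged key cache."""
    scale = 1.0 / math.sqrt(len(query))
    keys = gather_paged_keys(key_pages, block_table, seq_len, page_size)
    return [scale * sum(a * b for a, b in zip(query, k)) for k in keys]
```

Because the sequence's logical blocks can map to any physical pages, a cache can grow one page at a time without copying earlier tokens. The cost is one extra indirection through the block table per lookup.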