Mojo package

amd_structured

TileTensor-native attention kernels for AMD gfx950 (MI355X).

This module provides a gfx950-only attention implementation that uses TileTensor throughout. It supports MHA prefill (depth=64, 128, 256, 512), MHA decode (token generation), MLA prefill, and MLA decode.

Modules

attention: Attention struct for gfx950 MHA/MLA kernels (prefill + decode).
buffers: Q, P, and Output register buffers for gfx950 attention kernels.
config: GFX950 attention config.
iglp: IGroupLP sched_group_barrier aggregate-pair helpers for AMD MHA.
kv_buffer: KV cache buffers for MHA/MLA prefill and decode kernels.
mask_op: Named TileOp struct for QK score masking on gfx950.
mha_decode: Gfx950 MHA decode kernel, built on KVBuffer.
mha_decode_streaming: MHA streaming decode kernel for gfx950.
mha_mask: Mask functor application for the MhaPrefillV2 att_block.
mha_mma_op: MHA MMA operator: shape constants, SMEM→register loaders, and MFMA dispatch used by MhaPrefillV2.
mha_prefill: Unified gfx950 MHA prefill kernel.
mha_prefill_v2: MhaPrefillV2 — long-context BF16 MHA prefill for AMD MI355X (gfx950).
mha_softmax: Online softmax row-state bundle for MhaPrefillV2.
mla_components: MLA-prefill math components for AMD MI355X (gfx950).
mla_decode: Gfx950 MLA (Multi-head Latent Attention) decode kernel, built on KVBuffer.
mla_prefill: MLA (Multi-head Latent Attention) prefill kernel for gfx950.
mla_prefill_v2: MlaPrefillV2 — fresh, from-scratch port of the reference MLA-prefill INTEGRATED inner-loop architecture for AMD MI355X (gfx950).
mma:
ps_metadata:
softmax: Online softmax for gfx950 attention kernels.
utils: Shared helpers for gfx950 attention kernels.
