Mojo package
amd_structured
TileTensor-native attention kernels for AMD gfx950 (MI355X).
This module provides a gfx950-only attention implementation that uses TileTensor throughout. It supports MHA prefill (depth=64, 128, 256, 512), MHA decode (token generation), MLA prefill, and MLA decode.
Modules
- attention: Attention struct for gfx950 MHA/MLA kernels (prefill + decode).
- buffers: Q, P, and Output register buffers for gfx950 attention kernels.
- config: GFX950 attention config.
- iglp: IGroupLP sched_group_barrier aggregate-pair helpers for AMD MHA.
- kv_buffer: KV cache buffers for MHA/MLA prefill and decode kernels.
- mask_op: Named TileOp struct for QK score masking on gfx950.
- mha_decode: Gfx950 MHA decode kernel, built on KVBuffer.
- mha_decode_streaming: MHA streaming decode kernel for gfx950.
- mha_mask: Mask functor application for the MhaPrefillV2 att_block.
- mha_mma_op: MHA MMA operator: shape constants, SMEM-to-register loaders, and MFMA dispatch used by MhaPrefillV2.
- mha_prefill: Unified gfx950 MHA prefill kernel.
- mha_prefill_v2: MhaPrefillV2, long-context BF16 MHA prefill for AMD MI355X (gfx950).
- mha_softmax: Online softmax row-state bundle for MhaPrefillV2.
- mla_components: MLA-prefill math components for AMD MI355X (gfx950).
- mla_decode: Gfx950 MLA (Multi-Latent Attention) decode kernel, built on KVBuffer.
- mla_prefill: MLA (Multi-Latent Attention) prefill kernel for gfx950.
- mla_prefill_v2: MlaPrefillV2, a fresh, from-scratch port of the reference MLA-prefill INTEGRATED inner-loop architecture for AMD MI355X (gfx950).
- mma:
- ps_metadata:
- softmax: Online softmax for gfx950 attention kernels.
- utils: Shared helpers for gfx950 attention kernels.
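The softmax and mha_softmax entries above refer to online softmax and a per-row state. As a rough illustration of that general technique only (not the code in this package), the sketch below processes one query row block by block in plain Python, keeping a running maximum, a running sum, and an unnormalized accumulator. The function name, the scalar values, and the handling of fully masked blocks are assumptions made for this example.

```python
import math


def online_softmax_weighted_sum(score_blocks, value_blocks):
    """Compute sum_i softmax(scores)_i * values_i for one row, one block at a time.

    Hypothetical helper for illustration; it is not part of amd_structured.
    Values are scalars here to keep the sketch short.
    """
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # sum of exp(score - running_max) so far
    acc = 0.0                    # unnormalized weighted sum of values

    for scores, values in zip(score_blocks, value_blocks):
        new_max = max(running_max, max(scores))
        if new_max == float("-inf"):
            # Assumption: a fully masked block (all scores -inf) adds nothing.
            continue

        # Rescale the previous state so it is expressed relative to new_max.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc *= scale

        # Fold in the current block.
        for s, v in zip(scores, values):
            p = math.exp(s - new_max)
            running_sum += p
            acc += p * v

        running_max = new_max

    # Assumption: return 0.0 when every score was masked out.
    return acc / running_sum if running_sum > 0.0 else 0.0
```

Because each block only needs the three running quantities, the full row of scores never has to be held at once, which is the property that makes this formulation a common fit for blocked attention kernels.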