
Mojo module

mla_decode_sparse_kv_bf16

SM100 (B200) sparse MLA decode kernel with BF16 KV cache.

K is loaded by a single BF16 + SWIZZLE_128B gather4 TMA that covers the full 576-element row (tile_width=576, box_w=64). OffsetPosition[sparse=True] overrides num_keys with the sparse topk. Double-buffered idx_bars and idx_smem pass per-tile row indices from warp 11 (producer) to warp 8 (consumer), with the producer running one tile ahead.
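The index hand-off described above can be modelled on the host. The following is a minimal Python sketch, not the kernel itself: it only mimics the ordering in which a producer fills one of two index slots while a consumer gathers rows using the other. The function names, the `rows_per_tile` parameter, and the trace format are hypothetical and chosen for illustration; the real kernel uses TMA, shared memory, and barriers rather than Python lists. The constants also show that a 576-element row splits into nine 64-element boxes, each 128 bytes wide in BF16, which is consistent with the SWIZZLE_128B layout.

```python
# Host-side model only. Names below are illustrative assumptions,
# not part of the Mojo module's API.
ROW_WIDTH = 576   # elements per K row (tile_width in the text)
BOX_WIDTH = 64    # elements per TMA box (box_w in the text)
BF16_BYTES = 2    # size of one BF16 element

BOXES_PER_ROW = ROW_WIDTH // BOX_WIDTH   # 576 / 64 = 9 boxes per row
BOX_BYTES = BOX_WIDTH * BF16_BYTES       # 64 * 2 = 128 bytes per box


def split_into_tiles(topk_indices, rows_per_tile):
    """Chunk the sparse top-k row indices into per-tile index lists."""
    return [
        topk_indices[i:i + rows_per_tile]
        for i in range(0, len(topk_indices), rows_per_tile)
    ]


def gather_with_index_prefetch(kv_cache, topk_indices, rows_per_tile):
    """Gather K rows tile by tile, with indices staged one tile ahead.

    idx_slots plays the role of the double-buffered idx_smem: the
    "producer" writes tile t+1 into slot (t+1) % 2 before the
    "consumer" reads tile t from slot t % 2.
    """
    tiles = split_into_tiles(topk_indices, rows_per_tile)
    idx_slots = [None, None]
    gathered = []
    trace = []  # (role, tile, slot) events, in issue order
    if not tiles:
        return gathered, trace

    # Prologue: the producer stages the first tile's indices.
    idx_slots[0] = tiles[0]
    trace.append(("produce", 0, 0))

    for t in range(len(tiles)):
        nxt = t + 1
        if nxt < len(tiles):
            # Producer runs one tile ahead, writing the other slot.
            idx_slots[nxt % 2] = tiles[nxt]
            trace.append(("produce", nxt, nxt % 2))
        # Consumer reads the current slot and gathers those rows.
        rows = idx_slots[t % 2]
        trace.append(("consume", t, t % 2))
        gathered.extend(kv_cache[r] for r in rows)
    return gathered, trace


# Usage: 16 cached rows, 7 sparse indices, 4 rows per tile.
kv_cache = [[float(r)] * ROW_WIDTH for r in range(16)]
topk = [5, 1, 9, 3, 12, 7, 0]
gathered, trace = gather_with_index_prefetch(kv_cache, topk, rows_per_tile=4)
```

Two slots are enough here because the producer is never more than one tile ahead of the consumer, so a slot is only overwritten after the tile that used it has been consumed.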

Supports NullMask and CausalMask. Sliding-window attention is FP8-only.
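As a rough illustration of how the two supported masks differ, the sketch below applies each to a list of sparse row indices. It assumes, for illustration only, that each sparse index is a token position and that causal masking hides keys positioned after the query; the function names are hypothetical and do not correspond to the Mojo mask types' actual interfaces.

```python
# Illustrative model only; function names are assumptions, not the Mojo API.
def null_mask(key_pos, query_pos):
    """No masking: every selected key is visible."""
    return True


def causal_mask(key_pos, query_pos):
    """Causal masking: a key is visible only at or before the query position."""
    return key_pos <= query_pos


def visible_rows(topk_indices, query_pos, mask):
    """Keep the sparse indices that the given mask leaves visible."""
    return [k for k in topk_indices if mask(k, query_pos)]


topk = [5, 1, 9, 3]
all_rows = visible_rows(topk, query_pos=4, mask=null_mask)
causal_rows = visible_rows(topk, query_pos=4, mask=causal_mask)
```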

Structs

MLA_SM100_Decode_Sparse_KV_BF16:
