Mojo module

mla_decode_sparse_kv_bf16

SM100 (B200) sparse MLA decode kernel with BF16 KV cache.

K is loaded by a single gather4 TMA (BF16, SWIZZLE_128B) that covers the full 576-element row (tile_width=576, box_w=64). OffsetPosition[sparse=True] overrides num_keys with the sparse topk. Double-buffered idx_bars and idx_smem pass per-tile row indices from warp 11 (producer) to warp 8 (consumer), one tile ahead.
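The index pipelining described above can be illustrated with a small host-side model. This is a minimal sketch in plain Python, not the kernel's code: the function name `gather_with_index_prefetch` and the `tile_rows` parameter are hypothetical, the two-slot list stands in for the double-buffered idx_smem, and the barrier synchronization that idx_bars would provide is replaced by sequential execution. The constants only restate the arithmetic implied by tile_width=576 and box_w=64, assuming 2-byte BF16 elements.

```python
from math import ceil

ROW_ELEMS = 576   # tile_width from the module description
BOX_ELEMS = 64    # box_w from the module description
BF16_BYTES = 2    # size of one BF16 element

# A 576-element row splits into 9 boxes of 64 elements; each box row is
# 128 bytes, which matches the span of a 128-byte swizzle pattern.
BOXES_PER_ROW = ROW_ELEMS // BOX_ELEMS
BOX_BYTES = BOX_ELEMS * BF16_BYTES


def gather_with_index_prefetch(kv_rows, topk_indices, tile_rows):
    """Gather KV rows tile by tile while staging indices one tile ahead.

    Hypothetical model: idx_slots plays the role of a two-slot index
    buffer. The "producer" step fills the slot for tile t + 1 while the
    "consumer" step reads the slot for tile t.
    """
    num_tiles = ceil(len(topk_indices) / tile_rows)
    idx_slots = [[], []]
    gathered = []

    # Prologue: the producer stages the indices for tile 0.
    if num_tiles:
        idx_slots[0] = topk_indices[:tile_rows]

    for t in range(num_tiles):
        # Producer: stage indices for the next tile into the other slot.
        nxt = t + 1
        if nxt < num_tiles:
            start = nxt * tile_rows
            idx_slots[nxt % 2] = topk_indices[start:start + tile_rows]

        # Consumer: gather the rows named by the current tile's indices.
        for row in idx_slots[t % 2]:
            gathered.append(kv_rows[row])

    return gathered
```

Calling `gather_with_index_prefetch(kv_rows, [7, 2, 9, 0, 5], tile_rows=2)` visits the selected rows in top-k order across three tiles, with the last tile only partially filled. On the GPU the two steps run concurrently on different warps, so each slot would need a barrier to signal when it is full and when it is free again.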

Supports NullMask and CausalMask. Sliding-window attention is FP8-only.
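For readers unfamiliar with the two mask types, the sketch below shows the textbook meaning of a null mask and a causal mask applied to a sparse set of selected key positions. It assumes the conventional definitions (a null mask hides nothing; a causal mask hides keys positioned after the query) and does not reflect how the kernel implements NullMask or CausalMask. All function names here are hypothetical.

```python
def null_mask(q_pos, k_pos):
    """Null mask: every key is visible to every query."""
    return True


def causal_mask(q_pos, k_pos):
    """Causal mask: a query sees only keys at or before its own position."""
    return k_pos <= q_pos


def visible_keys(selected_keys, q_pos, mask):
    """Filter the sparse top-k key positions through the chosen mask."""
    return [k for k in selected_keys if mask(q_pos, k)]
```

For a query at position 8 and selected keys `[3, 10, 7, 12]`, the causal mask keeps positions 3 and 7, while the null mask keeps all four.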

Structs