Mojo module
mla_decode_sparse_kv_bf16
SM100 (B200) sparse MLA decode kernel with BF16 KV cache.
K is loaded by a single BF16 + SWIZZLE_128B gather4 TMA that covers the full
576-element row (tile_width=576, box_w=64). OffsetPosition[sparse=True]
overrides num_keys with the sparse topk. Double-buffered idx_bars and
idx_smem pass per-tile row indices from warp 11 (producer) to warp 8
(consumer), with the producer running one tile ahead.
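The tile geometry and the one-tile-ahead index handoff described above can be modeled on the CPU. The following is a minimal Python sketch, not the kernel's Mojo code: `row_geometry`, `simulate_index_pipeline`, and the `tile_rows` parameter are hypothetical names introduced for illustration, and the sequential loop only stands in for the barrier-based synchronization that idx_bars would provide on the GPU. It assumes BF16 elements are 2 bytes, so a 64-element box spans 128 bytes, which is consistent with the SWIZZLE_128B mode.

```python
BF16_BYTES = 2      # size of one BF16 element
TILE_WIDTH = 576    # tile_width from the module description
BOX_W = 64          # box_w from the module description


def row_geometry(tile_width=TILE_WIDTH, box_w=BOX_W, elem_bytes=BF16_BYTES):
    """Derive per-row sizes implied by tile_width and box_w."""
    assert tile_width % box_w == 0, "row must split into whole boxes"
    return {
        "boxes_per_row": tile_width // box_w,
        "box_bytes": box_w * elem_bytes,
        "row_bytes": tile_width * elem_bytes,
    }


def simulate_index_pipeline(topk_indices, tile_rows):
    """Model two index slots with the producer one tile ahead of the consumer.

    tile_rows is a hypothetical tile height (rows gathered per tile).
    Returns the indices in the order the consumer sees them, plus a trace
    of (event, tile, slot) tuples.
    """
    tiles = [topk_indices[i:i + tile_rows]
             for i in range(0, len(topk_indices), tile_rows)]
    idx_smem = [None, None]   # stand-in for the double-buffered idx_smem
    consumed, trace = [], []

    if tiles:
        # Prologue: the producer fills slot 0 before the consumer starts.
        idx_smem[0] = tiles[0]
        trace.append(("produce", 0, 0))

    for t in range(len(tiles)):
        nxt = t + 1
        if nxt < len(tiles):
            # Producer writes the next tile's indices into the other slot.
            idx_smem[nxt % 2] = tiles[nxt]
            trace.append(("produce", nxt, nxt % 2))
        # Consumer reads the current tile's indices.
        consumed.extend(idx_smem[t % 2])
        trace.append(("consume", t, t % 2))

    return consumed, trace


geometry = row_geometry()
order, events = simulate_index_pipeline([7, 3, 42, 19, 5], tile_rows=2)
```

With two slots, tile `t` always lives in slot `t % 2`, so the producer can write tile `t + 1` while the consumer still reads tile `t` without the two touching the same buffer.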
Supports NullMask and CausalMask. Sliding-window attention is FP8-only.
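To illustrate what the two supported masks mean for a sparse decode step, here is a hedged pure-Python reference for a single query. It follows the generic textbook definitions (no masking versus skipping keys positioned after the query) and is not taken from the kernel. The function name, the `q_pos` parameter, the 1/sqrt(d) scale, and the separate `values` array are all assumptions made for illustration; the page does not state how the kernel applies CausalMask to the top-k indices.

```python
import math


def sparse_decode_attention(q, keys, values, topk_idx, q_pos=None):
    """Reference attention over a sparse set of key rows for one query.

    q_pos=None models no masking. Otherwise rows whose index exceeds q_pos
    are skipped, which is the usual causal rule (an assumption here).
    """
    scale = 1.0 / math.sqrt(len(q))  # assumed softmax scale
    selected = [i for i in topk_idx if q_pos is None or i <= q_pos]
    out = [0.0] * len(values[0])
    if not selected:
        return out

    # Scores only for the selected rows, with a numerically stable softmax.
    scores = [scale * sum(a * b for a, b in zip(q, keys[i])) for i in selected]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)

    for w, i in zip(weights, selected):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return out


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
query = [0.0, 0.0]  # zero query gives uniform weights over selected rows

unmasked = sparse_decode_attention(query, keys, values, topk_idx=[0, 3])
causal = sparse_decode_attention(query, keys, values, topk_idx=[0, 3], q_pos=1)
```

In the unmasked call both selected rows contribute equally; in the causal call row 3 lies after position 1 and is dropped, so only row 0 contributes.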
Structs