Mojo module
sparse_indexer_decode
Decode-path MiniMax-M3 sparse-attention (MSA) indexer (selection only).
Per decode query (one token per batch element) and index head, selects the top-k key blocks via two launches:
- `_decode_block_score_kernel`: block-max `q . k * sm_scale` with init/local forcing, split-K over the KV-block dimension; writes a caller-owned scores buffer.
- `_decode_topk_kernel`: `block_select_topk` over each score row.
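The two launches above can be pictured with a small CPU reference for a single query and index head. This is a minimal sketch in plain Python, not the Mojo kernels: the helper names `block_scores` and `select_topk`, the parameters `num_init` and `num_local`, the forcing rule (the first `num_init` and last `num_local` blocks get a score of `+inf` so they are always selected), and the tie-break toward the lower block index are all assumptions made for illustration.

```python
import math

def block_scores(q, keys, block_size, sm_scale, num_init=1, num_local=1):
    """Score each key block as the max over its keys of (q . k) * sm_scale."""
    num_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(num_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        scores.append(
            max(sum(qi * ki for qi, ki in zip(q, k)) * sm_scale for k in block)
        )
    # Assumed forcing rule: initial and most recent blocks always win selection.
    for b in range(num_blocks):
        if b < num_init or b >= num_blocks - num_local:
            scores[b] = math.inf
    return scores

def select_topk(scores, k):
    """Indices of the k highest scores; ties go to the lower block index."""
    order = sorted(range(len(scores)), key=lambda b: (-scores[b], b))
    return order[:k]

# Eight keys in four blocks of two; only the first component matters here.
q = [1.0, 0.0]
keys = [[0.1, 0.0], [0.2, 0.0], [0.9, 0.0], [0.3, 0.0],
        [0.5, 0.0], [0.4, 0.0], [0.0, 0.0], [0.1, 0.0]]
scores = block_scores(q, keys, block_size=2, sm_scale=1.0)
print(select_topk(scores, k=3))
```

In this example the forced blocks 0 and 3 are selected first, and the remaining slot goes to block 1, whose best key scores 0.9.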
Both grids depend only on graph-constant shapes, never on sequence length, and
nothing is allocated inside the op, so the decode path is safe inside a
CUDA-graph capture region. M3 disables the index value/output, so
this emits block indices only (score type max).
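One way to read the shape-independence property is that the work layout is derived from the maximum number of KV blocks (a fixed shape), while the live sequence length only masks entries. The sketch below illustrates that idea on the CPU under stated assumptions: `score_fixed_grid` is a hypothetical helper, the `-inf` masking of blocks past the live sequence is an assumption, and the loop over `num_splits` merely stands in for a grid dimension. It writes into a buffer the caller allocated up front and allocates no output of its own.

```python
import math

def score_fixed_grid(q, key_cache, seq_len, block_size, sm_scale, num_splits, score):
    """Fill `score` in place; the loop bounds depend on len(score), not seq_len."""
    max_blocks = len(score)
    blocks_per_split = math.ceil(max_blocks / num_splits)
    for split in range(num_splits):  # stands in for one grid dimension
        start = split * blocks_per_split
        for b in range(start, min(start + blocks_per_split, max_blocks)):
            lo, hi = b * block_size, min((b + 1) * block_size, seq_len)
            if lo >= hi:
                score[b] = -math.inf  # assumed: blocks past the sequence are masked
                continue
            score[b] = max(
                sum(qi * ki for qi, ki in zip(q, key_cache[t])) * sm_scale
                for t in range(lo, hi)
            )

# Cache sized for 4 blocks of 2 keys; only the first 3 keys are live.
key_cache = [[0.1], [0.4], [0.3], [9.0], [9.0], [9.0], [9.0], [9.0]]
score = [0.0] * 4  # caller-owned buffer, allocated once and reused every step
score_fixed_grid([1.0], key_cache, seq_len=3, block_size=2,
                 sm_scale=1.0, num_splits=2, score=score)
print(score)
```

Stale cache entries beyond `seq_len` (the 9.0 values) never influence the result, and the same call shape serves every sequence length up to the cache capacity.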
Functions
- `sparse_indexer_decode`: Compute MSA top-k block indices for a decode step (selection only).
- `sparse_indexer_decode_score`: Launch the decode block-scoring kernel into `score`.
- `sparse_indexer_decode_topk`: Launch the decode top-k selection kernel from `score` into `out_idxs`.