Mojo module

sparse_indexer_decode

Decode-path MiniMax-M3 sparse-attention (MSA) indexer (selection only).

For each decode query (one token per batch element) and each index head, the indexer selects the top-k key blocks in two kernel launches:

_decode_block_score_kernel -- computes the per-block maximum of q . k * sm_scale, with init/local forcing, using split-K over the KV-block dimension; writes to a caller-owned scores buffer.
_decode_topk_kernel -- runs block_select_topk over each score row.
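
The two stages above can be illustrated with a small CPU reference in plain Python. This is only a sketch under stated assumptions, not the kernels' implementation: the function names, argument layout, and the num_init / num_local parameters are hypothetical, and "init/local forcing" is assumed here to mean that the first few and the most recent few key blocks are always selected. Split-K and the GPU launch structure are not modeled.

```python
import math

def block_max_scores(q, keys, block_size, sm_scale, num_init=0, num_local=0):
    """Score each key block as the max over its keys of (q . k) * sm_scale.

    Hypothetical reference: q is one query vector, keys is a list of key
    vectors for one batch element and one index head.
    """
    num_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(num_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        # Block-max: keep only the best-scoring key in the block.
        scores.append(
            max(sum(qi * ki for qi, ki in zip(q, k)) * sm_scale for k in block)
        )
    # ASSUMPTION: forcing is modeled by giving the first num_init blocks and
    # the last num_local blocks a score of +inf so top-k always keeps them.
    for b in range(num_blocks):
        if b < num_init or b >= num_blocks - num_local:
            scores[b] = math.inf
    return scores

def topk_blocks(scores, k):
    """Return indices of the k highest-scoring blocks (ties: lower index first)."""
    order = sorted(range(len(scores)), key=lambda b: (-scores[b], b))
    return order[:k]

# Eight 2-d keys grouped into four blocks of two keys each.
q = [1.0, 0.0]
keys = [
    [0.1, 0.0], [0.2, 0.0],   # block 0
    [0.9, 0.0], [0.3, 0.0],   # block 1
    [0.5, 0.0], [0.4, 0.0],   # block 2
    [0.0, 0.0], [0.05, 0.0],  # block 3
]
plain = topk_blocks(block_max_scores(q, keys, 2, 1.0), 3)
forced = topk_blocks(block_max_scores(q, keys, 2, 1.0, num_init=1, num_local=1), 3)
```

Without forcing, the selection follows the block maxima alone; with one forced initial block and one forced local block, those two blocks take precedence and only the remaining slot is filled by score.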

Both grids depend only on graph-constant shapes, never on sequence length, and nothing is allocated inside the op, so the decode path is safe inside a CUDA-graph capture region. M3 disables the index value/output, so this emits block indices only (score type max).

Functions

sparse_indexer_decode: Compute MSA top-k block indices for a decode step (selection only).
sparse_indexer_decode_score: Launch the decode block-scoring kernel into score.
sparse_indexer_decode_topk: Launch the decode top-k selection kernel from score into out_idxs.
