
Mojo module

sparse_indexer_prefill

Prefill-path MiniMax-M3 sparse-attention (MSA) indexer.

For each (ragged) query token and index head, the indexer selects the top-k key blocks to attend to. It runs as two kernel launches:

_prefill_block_score_kernel -- one CTA per (query token, index head). A query token's causal key count is prefix_len + local_index + 1. Each thread takes the max of (q · k) * sm_scale over the keys of a block that fall inside that causal range (bf16 inputs, f32 accumulation), applies init/local forcing, and writes one f32 per block into a caller-owned score buffer. Because the key range is clamped to the causal count, the diagonal (final) block is scored exactly without a separate mask.
_prefill_topk_kernel -- one CTA per (query token, index head). Selects the top-k blocks from that token's score row via block_select_topk.
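
The two launches can be mirrored by a small CPU reference, which may be useful for checking kernel output on tiny inputs. The following is a minimal Python sketch, not the Mojo implementation. The function names are hypothetical, and block_size, the -inf score given to blocks with no in-range keys, and tie-breaking toward the lower block index are assumptions of the sketch. It omits init/local forcing and bf16 input rounding, since neither is specified on this page.

```python
import math

NEG_INF = float("-inf")


def causal_key_count(prefix_len, local_index):
    """Number of keys a query token may see: its prefix plus itself."""
    return prefix_len + local_index + 1


def block_scores(q, keys, prefix_len, local_index, block_size, sm_scale):
    """Score every key block for one (query token, index head) pair.

    q    -- list of floats, one query vector
    keys -- list of key vectors for the same sequence and head
    Returns one float per block: the max of (q . k) * sm_scale over the
    block's keys that lie inside the causal range.
    """
    n_causal = min(causal_key_count(prefix_len, local_index), len(keys))
    n_blocks = math.ceil(len(keys) / block_size)
    scores = [NEG_INF] * n_blocks  # assumption: empty blocks score -inf
    for b in range(n_blocks):
        start = b * block_size
        # Clamp the block's end to the causal count, so the diagonal
        # block only sees keys at or before the query token.
        end = min(start + block_size, n_causal)
        for j in range(start, end):
            dot = sum(qi * ki for qi, ki in zip(q, keys[j]))
            scores[b] = max(scores[b], dot * sm_scale)
    return scores


def select_topk(scores, k):
    """Pick up to k block indices by descending score.

    Assumption: ties go to the lower block index, and blocks with no
    in-range keys (-inf) are never selected.
    """
    order = sorted(range(len(scores)), key=lambda b: (-scores[b], b))
    return [b for b in order[:k] if scores[b] != NEG_INF]
```

With block_size = 2, prefix_len = 2 and local_index = 1, the causal count is 4, so a third block starting at key 4 has no in-range keys and is never selected, however large its dot products are.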

Queries are ragged: input_row_offsets[b] gives the start of batch b's tokens in the flattened query tensor. The indexer is selection-only (M3 disables the index value/output on every sparse layer), and the block score type is max.
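
To illustrate the ragged layout, the sketch below maps a flat token index back to its batch and local index, then derives each token's causal key count. It is a Python illustration under stated assumptions, not part of the module. It assumes input_row_offsets has batch_size + 1 entries, with the last equal to the total token count. The per-batch prefix_lens list and both function names are hypothetical.

```python
from bisect import bisect_right


def locate_token(input_row_offsets, token):
    """Return (batch, local_index) for a flat token index.

    Assumes input_row_offsets is non-decreasing with batch_size + 1
    entries; bisect_right skips over empty batches correctly.
    """
    batch = bisect_right(input_row_offsets, token) - 1
    return batch, token - input_row_offsets[batch]


def causal_counts(input_row_offsets, prefix_lens):
    """Causal key count for every flat token, in order.

    prefix_lens[b] is a hypothetical per-batch count of keys already
    cached before this prefill step.
    """
    counts = []
    for b in range(len(input_row_offsets) - 1):
        n_tokens = input_row_offsets[b + 1] - input_row_offsets[b]
        for local_index in range(n_tokens):
            counts.append(prefix_lens[b] + local_index + 1)
    return counts
```

For offsets [0, 3, 5] and prefix lengths [4, 0], flat token 3 is the first token of batch 1 and sees exactly one key: itself.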

Functions

sparse_indexer_prefill: Compute MSA top-k block indices for a prefill step (selection only).
sparse_indexer_prefill_score: Launch the prefill block-scoring kernel into score.
sparse_indexer_prefill_topk: Launch the prefill top-k selection kernel from score into out_idxs.
