Mojo module
sparse_indexer_prefill
Prefill-path MiniMax-M3 sparse-attention (MSA) indexer.
For each (ragged) query token and index head, this selects the top-k key blocks to attend to. It runs as two launches:
- _prefill_block_score_kernel -- one CTA per (query token, index head). The query token's causal key count is prefix_len + local_index + 1, so each thread takes the max over a block's in-range causal keys of q . k * sm_scale (bf16 inputs, f32 accumulation), applies init/local forcing, and writes one f32 per block into a caller-owned score buffer. Clamping the key range to the causal count makes the diagonal (final) block exact without a separate mask.
- _prefill_topk_kernel -- one CTA per (query token, index head). Selects the top-k blocks from the score row via block_select_topk.
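The block-scoring step can be modeled on the host with a few lines of plain Python. This is a minimal reference sketch of the semantics described above, not the Mojo kernel: the function name `block_scores`, the `block_size` parameter, and the list-of-lists key layout are assumptions made for illustration. The init/local forcing step is left out because this page does not define it, and bf16 rounding is not modeled.

```python
import math

def block_scores(q, keys, prefix_len, local_index, block_size, sm_scale):
    """One score per key block for a single (query token, index head)."""
    # Number of keys this query may see under causality.
    causal_count = min(prefix_len + local_index + 1, len(keys))
    num_blocks = math.ceil(len(keys) / block_size)
    # Blocks with no in-range key keep -inf (an assumption of this sketch).
    scores = [float("-inf")] * num_blocks
    for blk in range(num_blocks):
        start = blk * block_size
        # Clamp the block's key range to the causal count; this handles the
        # partially visible diagonal block without a separate mask.
        end = min(start + block_size, causal_count)
        for j in range(start, end):
            dot = sum(a * b for a, b in zip(q, keys[j]))
            scores[blk] = max(scores[blk], dot * sm_scale)
    return scores

keys = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
# prefix_len=2, local_index=0 -> causal count 3: block 1 sees only key 2.
print(block_scores([1.0, 0.0], keys, 2, 0, block_size=2, sm_scale=0.5))
# -> [1.0, 1.5, -inf]
```

In the example, the second block is the diagonal block: only its first key is in range, so its score is 3.0 * 0.5 rather than 4.0 * 0.5.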
Queries are ragged: input_row_offsets[b] gives the start of batch b's tokens.
The indexer is selection-only (M3 disables the index value/output on every sparse layer), and the score type is max.
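To illustrate the ragged layout, the sketch below maps a flat token index to its batch and its position within that batch. It is a host-side illustration only. The helper name `locate_token` is hypothetical, and it assumes `input_row_offsets` is non-decreasing with one trailing entry equal to the total token count, which this page does not state.

```python
from bisect import bisect_right

def locate_token(input_row_offsets, token):
    """Map a flat token index to (batch, local_index)."""
    # The last offset that is <= token marks the batch that owns it.
    batch = bisect_right(input_row_offsets, token) - 1
    return batch, token - input_row_offsets[batch]

# Three sequences of lengths 3, 2 and 4, packed back to back.
offsets = [0, 3, 5, 9]
print(locate_token(offsets, 4))  # -> (1, 1)
```

The resulting `local_index` is the value that enters the causal key count prefix_len + local_index + 1 for that token.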
Functions
- sparse_indexer_prefill: Compute MSA top-k block indices for a prefill step (selection only).
- sparse_indexer_prefill_score: Launch the prefill block-scoring kernel into score.
- sparse_indexer_prefill_topk: Launch the prefill top-k selection kernel from score into out_idxs.
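The selection step that follows scoring can be sketched as a top-k over one score row. This is a plain-Python model under stated assumptions, not the behavior of block_select_topk: the name `select_topk_blocks` is hypothetical, ties are broken toward the lower block index, and blocks whose score is -inf (no in-range keys) are skipped rather than padded. The real kernel's tie-breaking and padding rules are not documented on this page.

```python
def select_topk_blocks(score_row, k):
    """Indices of the k highest-scoring blocks in one score row."""
    neg_inf = float("-inf")
    # Assumption: blocks with no causal keys carry -inf and are never chosen.
    valid = [i for i, s in enumerate(score_row) if s != neg_inf]
    # Sort by descending score, then ascending block index for ties.
    valid.sort(key=lambda i: (-score_row[i], i))
    return valid[:k]

row = [1.0, 1.5, float("-inf"), 0.25]
print(select_topk_blocks(row, 2))  # -> [1, 0]
```

Keeping scoring and selection as separate steps that communicate through a score buffer mirrors the split between sparse_indexer_prefill_score and sparse_indexer_prefill_topk listed above.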