
Mojo module

sparse_indexer_prefill

Prefill-path MiniMax-M3 sparse-attention (MSA) indexer.

For each (ragged) query token and index head, the indexer selects the top-k key blocks to attend to. It runs as two kernel launches:

  1. _prefill_block_score_kernel -- one CTA per (query token, index head). A query token's causal key count is prefix_len + local_index + 1. For each key block, a thread takes the max of (q . k) * sm_scale over the block's keys that lie within that count (bf16 inputs, f32 accumulation), applies init/local forcing, and writes one f32 score per block into a caller-owned score buffer. Because the key range is clamped to the causal count, the diagonal (final) block is scored exactly without a separate mask.
  2. _prefill_topk_kernel -- one CTA per (query token, index head). Selects the top-k blocks from that token's score row via block_select_topk.
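
The two launches above can be modeled on the host for a single (query token, index head) pair. The following is a minimal Python sketch for reasoning about the scoring rule, not the module's implementation. The names block_scores, topk_blocks, and block_size are hypothetical. The sketch also rests on three assumptions this page does not confirm: blocks with no causal keys score negative infinity, such blocks are never selected, and ties go to the lower block index. The init/local forcing step is omitted because its rule is not described here.

```python
import math


def block_scores(q, keys, prefix_len, local_index, block_size, sm_scale):
    """Max-type block scores for one query token and one index head."""
    # Number of keys this query token may see under causality.
    causal_count = min(prefix_len + local_index + 1, len(keys))
    num_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(num_blocks):
        start = b * block_size
        # Clamping the end to the causal count handles the diagonal block
        # without a separate mask.
        end = min(start + block_size, causal_count)
        best = -math.inf  # assumption: a block with no causal keys scores -inf
        for j in range(start, end):
            dot = sum(qi * ki for qi, ki in zip(q, keys[j]))
            best = max(best, dot * sm_scale)
        scores.append(best)
    return scores


def topk_blocks(scores, k):
    """Indices of the k best blocks, highest score first."""
    # Assumption: blocks scored -inf are skipped; ties go to the lower index.
    candidates = [b for b in range(len(scores)) if scores[b] > -math.inf]
    candidates.sort(key=lambda b: (-scores[b], b))
    return candidates[:k]


q = [1.0, 0.0]
keys = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
# prefix_len=2, local_index=0 gives a causal count of 3, so block 1 is the
# diagonal block and only its first key contributes.
scores = block_scores(q, keys, prefix_len=2, local_index=0, block_size=2, sm_scale=0.5)
selected = topk_blocks(scores, k=2)
```

In this example the third block lies entirely past the causal count, so it gets no finite score and is never a top-k candidate.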

Queries are ragged: input_row_offsets[b] gives the offset of batch b's first token. The indexer is selection-only (M3 disables the index value/output on every sparse layer), and the block score type is max.
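
The ragged layout can be illustrated by mapping a global token index back to its batch and computing that token's causal key count. This is a minimal Python sketch with several assumptions: input_row_offsets is taken to hold batch count + 1 non-decreasing entries ending at the total token count, local_index is taken to be the token's position within its batch's new tokens, and prefix_lens (one prefix length per batch) along with both function names are hypothetical.

```python
from bisect import bisect_right


def locate_token(input_row_offsets, token):
    """Map a global token index to (batch, local_index)."""
    # The last offset that is <= token marks the owning batch. bisect_right
    # also skips empty batches that share the same start offset.
    batch = bisect_right(input_row_offsets, token) - 1
    local_index = token - input_row_offsets[batch]
    return batch, local_index


def causal_key_count(input_row_offsets, prefix_lens, token):
    """Causal key count prefix_len + local_index + 1 for one query token."""
    batch, local_index = locate_token(input_row_offsets, token)
    return prefix_lens[batch] + local_index + 1


# Two batches: tokens 0..2 belong to batch 0, tokens 3..4 to batch 1.
offsets = [0, 3, 5]
prefix_lens = [4, 0]  # hypothetical per-batch prefix lengths
counts = [causal_key_count(offsets, prefix_lens, t) for t in range(5)]
```

Here batch 0 starts with a prefix of 4 keys, so its three new tokens see 5, 6, and 7 keys, while batch 1 has no prefix and its tokens see 1 and 2.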

Functions