
Mojo module

sparse_indexer_decode

Decode-path MiniMax-M3 sparse-attention (MSA) indexer (selection only).

Per decode query (one token per batch element) and index head, selects the top-k key blocks via two launches:

  1. _decode_block_score_kernel -- computes, for each KV block, the block-max of q . k * sm_scale with init/local forcing, using split-K over the KV-block dimension; writes to a caller-owned scores buffer.
  2. _decode_topk_kernel -- runs block_select_topk over each score row.
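
The two steps above can be sketched in plain Python for a single query vector and a single index head. This is only an illustrative reference and not the module's implementation. The function names `block_scores` and `select_topk`, the parameters `n_init` and `n_local`, and the reading of "init/local forcing" as "the first and last few blocks always win the top-k" are all assumptions made for this example. Split-K and GPU launch details are omitted.

```python
import math


def block_scores(q, keys, block_size, sm_scale, n_init=1, n_local=1):
    """Return one score per KV block: max over the block of (q . k) * sm_scale.

    n_init and n_local are hypothetical knobs. Under the assumed meaning of
    "init/local forcing", the first n_init and last n_local blocks are set to
    +inf so that a later top-k always keeps them.
    """
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        # Block-max: the best scaled dot product among the keys in this block.
        scores.append(
            max(sum(qi * ki for qi, ki in zip(q, k)) * sm_scale for k in block)
        )
    for b in range(n_blocks):
        if b < n_init or b >= n_blocks - n_local:
            scores[b] = math.inf  # assumed forcing behaviour
    return scores


def select_topk(scores, k):
    """Return the indices of the k highest-scoring blocks, in ascending order."""
    # Sort by descending score; ties are broken by the lower block index.
    order = sorted(range(len(scores)), key=lambda b: (-scores[b], b))
    return sorted(order[:k])


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [
        [0.1, 0.0], [0.2, 0.0],  # block 0 (forced as the initial block)
        [0.9, 0.0], [0.3, 0.0],  # block 1
        [0.5, 0.0], [0.4, 0.0],  # block 2
        [0.0, 0.0], [0.1, 0.0],  # block 3 (forced as the local block)
    ]
    scores = block_scores(q, keys, block_size=2, sm_scale=1.0)
    print(select_topk(scores, k=3))
```

In this sketch the score list plays the role of the caller-owned scores buffer: the first function fills it and the second only reads it, which mirrors the two-launch structure described above.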

Both grids depend only on graph-constant shapes, never on sequence length, and nothing is allocated inside the op, so the decode path is safe inside a CUDA-graph capture region. M3 disables the index value/output, so this emits block indices only (score type max).

Functions