Mojo module

sparse_indexer_common

Shared device functions for the MiniMax-M3 sparse-attention (MSA) indexer.

The MSA indexer selects, per query and per index head, the top-k key blocks to attend to. This module holds the block-selection primitive shared by the prefill and decode indexer kernels; the per-block scoring (QK -> block-max -> local-block forcing) lives in the phase-specific kernels that produce the score buffer this primitive consumes.

block_select_topk is a cooperative top-k that uses a single CTA per row: one thread block selects the top-k block indices from one row of block scores. It reuses the TopK_2 / _block_reduce_topk primitives from nn.topk and applies the same iterative max-extract as stage 2 of topk_gpu. The selection is a straight-line re-scan with uniform control flow, so the block barriers are safe, and it allocates no global scratch, which keeps it usable inside a CUDA-graph capture region. A register-heap fast path is a possible future optimization.
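The iterative max-extract described above can be illustrated with a small CPU reference model. This is only a sketch in Python, not the module's Mojo kernel: the function name `block_select_topk_reference`, the lowest-index tie-break, and the `-1` padding when fewer than k blocks exist are all assumptions made for illustration. On the GPU, each inner scan would be a block-wide reduction across the threads of one CTA rather than a serial loop.

```python
import math


def block_select_topk_reference(scores, k):
    """Hypothetical CPU model of iterative max-extract top-k over one row.

    Runs exactly k passes regardless of the data, mirroring the fixed trip
    count and uniform control flow that keep block barriers safe on a GPU.
    """
    work = list(scores)  # private copy; stands in for per-thread registers
    selected = []
    for _ in range(k):
        best_idx = -1
        best_val = -math.inf
        # Full re-scan of the row. Strict '>' means ties resolve to the
        # lowest index (an assumption, not documented kernel behavior).
        for idx, val in enumerate(work):
            if val > best_val:
                best_val, best_idx = val, idx
        selected.append(best_idx)  # -1 once no candidates remain (assumed padding)
        if best_idx >= 0:
            work[best_idx] = -math.inf  # mask the winner for the next pass
    return selected


print(block_select_topk_reference([0.1, 0.9, 0.3, 0.9, 0.5], 3))  # [1, 3, 4]
```

The sketch keeps all state in a local copy of the row, which mirrors the property that the primitive needs no global scratch. The trade-off is k full scans of the row, which is the cost a register-heap fast path would aim to reduce.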

Functions