Mojo module
sparse_indexer_common
Shared device functions for the MiniMax-M3 sparse-attention (MSA) indexer.
The MSA indexer selects, per query and per index head, the top-k key blocks to attend to. This module holds the block-selection primitive shared by the prefill and decode indexer kernels; the per-block scoring (QK -> block-max -> local-block forcing) lives in the phase-specific kernels that produce the score buffer this primitive consumes.
block_select_topk is a cooperative, single-CTA-per-row top-k: one thread block
selects the top-k block indices from one row of block scores, reusing the
TopK_2 / _block_reduce_topk primitives from nn.topk via the same iterative
max-extract as topk_gpu's stage 2. It is a straight-line re-scan with uniform
control flow (so the block barriers are safe) and allocates no global scratch,
which keeps it usable inside a CUDA-graph capture region. A register-heap fast
path is a possible future optimization.
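
The iterative max-extract described above can be modeled on the CPU as follows. This is a minimal reference sketch in plain Python, not the Mojo kernel: the name block_select_topk_reference is hypothetical, and both the tie-breaking rule (lowest index wins) and the clamping of k to the row length are assumptions made for illustration. The real primitive performs each scan cooperatively across one thread block through the nn.topk reduction helpers. What the sketch preserves is the shape of the algorithm: a fixed number of rounds, each a full re-scan of the row, with no scratch storage beyond the row-sized bookkeeping.

```python
def block_select_topk_reference(scores, k):
    """Return the indices of the k largest entries of one row of block scores.

    Reference model only: each round re-scans the whole row, extracts the
    current maximum, and masks it out so the next round finds the runner-up.
    """
    n = len(scores)
    rounds = min(k, n)          # assumption: clamp k to the row length
    taken = [False] * n         # stands in for masking an extracted entry
    selected = []

    for _ in range(rounds):     # fixed trip count: no data-dependent exit
        best_idx = -1
        best_val = float("-inf")
        for i, s in enumerate(scores):
            if taken[i]:
                continue
            # Strict '>' keeps the lowest index on ties (assumed rule).
            if best_idx == -1 or s > best_val:
                best_idx, best_val = i, s
        taken[best_idx] = True
        selected.append(best_idx)

    return selected


if __name__ == "__main__":
    row = [0.1, 0.9, 0.4, 0.9, 0.2]
    print(block_select_topk_reference(row, 3))
```

Because every round runs the same number of scan steps regardless of the data, a GPU version of this loop has every thread reach each barrier the same number of times, which is the property the description relies on when it calls the barriers safe.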
Functions
- block_select_topk: Select the top-k block indices from one row of block scores.