Mojo module

sparse_indexer_common

Shared device functions for the MiniMax-M3 sparse-attention (MSA) indexer.

The MSA indexer selects, per query and per index head, the top-k key blocks to attend to. This module holds the block-selection primitive shared by the prefill and decode indexer kernels; the per-block scoring (QK -> block-max -> local-block forcing) lives in the phase-specific kernels that produce the score buffer this primitive consumes.

block_select_topk is a cooperative top-k that uses a single CTA per row: one thread block selects the top-k block indices from one row of block scores. It reuses the TopK_2 / _block_reduce_topk primitives from nn.topk and applies the same iterative max-extract as stage 2 of topk_gpu. The selection is a straight-line re-scan with uniform control flow, so every thread in the block reaches each barrier and the block barriers are safe. It allocates no global scratch, which keeps it usable inside a CUDA-graph capture region. A register-heap fast path is a possible future optimization.
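The iterative max-extract idea can be illustrated with a small CPU reference model. This is only a sketch in Python, not the module's Mojo implementation: the function name `block_select_topk_reference`, the `num_threads` parameter, the `taken` mask, and the lowest-index tie-break are all assumptions made for illustration. Each of the k passes re-scans the whole row, every simulated thread proposes its local best, and a block-wide reduction picks one winner.

```python
import math


def block_select_topk_reference(scores, k, num_threads=4):
    """CPU model of iterative max-extract top-k over one row of block scores.

    Hypothetical helper for illustration; the real kernel runs one thread
    block per row and reduces with TopK_2 / _block_reduce_topk.
    """
    n = len(scores)
    taken = [False] * n  # assumption: stands in for on-chip "already selected" state
    selected = []

    # Fixed trip count: every simulated thread runs the same number of passes,
    # mirroring the uniform control flow that keeps block barriers safe.
    for _ in range(min(k, n)):
        # Phase 1: each thread scans a strided slice and keeps its local best.
        partials = []
        for tid in range(num_threads):
            best_val, best_idx = -math.inf, -1
            for i in range(tid, n, num_threads):
                if not taken[i] and (best_idx == -1 or scores[i] > best_val):
                    best_val, best_idx = scores[i], i
            partials.append((best_val, best_idx))

        # Phase 2: block-wide reduction over the per-thread candidates.
        # Assumed tie-break: the lowest index wins on equal scores.
        win_val, win_idx = -math.inf, -1
        for val, idx in partials:
            if idx == -1:
                continue
            better = win_idx == -1 or val > win_val
            tie = val == win_val and idx < win_idx
            if better or tie:
                win_val, win_idx = val, idx

        # Phase 3: record the winner and exclude it from later passes.
        taken[win_idx] = True
        selected.append(win_idx)

    return selected
```

Calling `block_select_topk_reference([0.1, 0.9, 0.3, 0.9, 0.5], k=3)` returns the indices in descending score order under the assumed tie-break. The cost is k full scans of the row, which is the trade-off the re-scan design accepts in exchange for needing no global scratch.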

Functions

block_select_topk: Select the top-k block indices from one row of block scores.
