
Mojo function

build_k2q_csr

def build_k2q_csr(q2k_indices: List[Int32], cu_seqlens_q: List[Int32], cu_seqlens_k: List[Int32], head_kv: Int, total_q: Int, topk: Int, blk_kv: Int, max_seqlen_q: Int, max_seqlen_k: Int, q_per_cta: Int = Int(128)) -> K2qCsr

Builds the reverse (key-to-query) CSR and the work schedule from the query-major block selection.

q2k_indices[(h * total_q + g) * topk + t] is the batch-local KV-block id selected by slot t of global query token g for kv-head h, or a negative value if the slot is unused. Queries are packed by batch via cu_seqlens_q; the batch-local query index is g - cu_seqlens_q[b], where b is the batch that contains g.
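The indexing above can be illustrated with a small Python sketch. It is not the Mojo implementation: the helper names `flat_index` and `batch_local_q` are hypothetical, and it assumes `cu_seqlens_q` is the usual prefix-sum array of length `batch + 1` that starts at 0.

```python
from bisect import bisect_right


def flat_index(h, g, t, total_q, topk):
    """Flat offset into q2k_indices for kv-head h, global query g, slot t."""
    return (h * total_q + g) * topk + t


def batch_local_q(g, cu_seqlens_q):
    """Return (b, local_q) for global query token g.

    Assumes cu_seqlens_q = [0, len_0, len_0 + len_1, ...] (an assumption,
    not stated on this page). bisect_right skips over empty batches.
    """
    b = bisect_right(cu_seqlens_q, g) - 1
    return b, g - cu_seqlens_q[b]


# Two batches with 3 and 4 query tokens, so total_q = 7.
cu_seqlens_q = [0, 3, 7]
print(flat_index(h=1, g=4, t=2, total_q=7, topk=3))  # 35
print(batch_local_q(4, cu_seqlens_q))                # (1, 1)
```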

The inversion runs sequentially. Rows are numbered level-major, round-robin across batches (block 0 of every active batch, then block 1, and so on).
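One possible reading of the level-major round-robin order is sketched below in Python. This is an interpretation, not the documented behavior: it assumes a batch is "active" at a level while it still has a KV block at that level, that a batch has ceil(seqlen_k / blk_kv) blocks, and it ignores the kv-head dimension. The name `level_major_rows` is hypothetical.

```python
def level_major_rows(cu_seqlens_k, blk_kv):
    """List (batch, block) pairs in row order: block 0 of each batch, then block 1, ..."""
    num_batches = len(cu_seqlens_k) - 1
    # Assumed block count per batch: ceil(seqlen_k / blk_kv).
    blocks = [
        (cu_seqlens_k[b + 1] - cu_seqlens_k[b] + blk_kv - 1) // blk_kv
        for b in range(num_batches)
    ]
    rows = []
    for level in range(max(blocks, default=0)):
        for b in range(num_batches):
            if level < blocks[b]:  # batch b still has a block at this level
                rows.append((b, level))
    return rows


# Batch 0 has 5 keys (3 blocks of 2), batch 1 has 2 keys (1 block).
print(level_major_rows([0, 5, 7], blk_kv=2))
# [(0, 0), (1, 0), (0, 1), (0, 2)]
```

The position of a pair in the returned list plays the role of the row number under these assumptions.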

Each non-empty row is split along the query dimension into ceil(row_count / q_per_cta) work items, where q_per_cta <= BM (the forward CTA's query cap). Rows selected by more than q_per_cta queries are therefore served by multiple CTAs rather than truncated.
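The chunk count is plain ceiling division. A minimal Python sketch follows; the helper name `chunk_row` and the (start, length) representation of a work item are assumptions made for illustration.

```python
def chunk_row(row_count, q_per_cta=128):
    """Split a row with row_count selecting queries into (start, length) work items."""
    num_items = (row_count + q_per_cta - 1) // q_per_cta  # ceil(row_count / q_per_cta)
    return [
        (i * q_per_cta, min(q_per_cta, row_count - i * q_per_cta))
        for i in range(num_items)
    ]


print(chunk_row(300))  # [(0, 128), (128, 128), (256, 44)]
print(chunk_row(0))    # [] (an empty row produces no work items)
```

A row selected by 300 queries thus yields three work items, the last of which is partially filled.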

Constraints: topk must be <= 255 (qsplit packs split_slot in the high byte) and max_seqlen_q must be < 2^24 (qsplit packs q in the low 24 bits).
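The two limits follow from fitting both fields into one word. The Python sketch below assumes a 32-bit word with split_slot in bits 24 to 31 and q in bits 0 to 23; the exact layout and the helper names `pack_qsplit` and `unpack_qsplit` are assumptions, not taken from the library.

```python
Q_BITS = 24
Q_MASK = (1 << Q_BITS) - 1  # 0xFFFFFF


def pack_qsplit(split_slot, q):
    """Pack split_slot (high byte) and q (low 24 bits) into one integer."""
    assert 0 <= split_slot <= 255, "split_slot must fit in one byte"
    assert 0 <= q < (1 << Q_BITS), "q must fit in 24 bits"
    return (split_slot << Q_BITS) | q


def unpack_qsplit(word):
    """Recover (split_slot, q) from a packed value."""
    return word >> Q_BITS, word & Q_MASK


packed = pack_qsplit(3, 10)
print(packed)                 # 50331658
print(unpack_qsplit(packed))  # (3, 10)
```

Python integers are unbounded, so this sketch does not model how the high bit behaves in a signed 32-bit type.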

Returns:

K2qCsr