Mojo function

build_k2q_csr

def build_k2q_csr(q2k_indices: List[Int32], cu_seqlens_q: List[Int32], cu_seqlens_k: List[Int32], head_kv: Int, total_q: Int, topk: Int, blk_kv: Int, max_seqlen_q: Int, max_seqlen_k: Int, q_per_cta: Int = Int(128)) -> K2qCsr

Builds the reverse (KV-block-to-query) CSR and its work schedule from the query-major selection.

q2k_indices[(h * total_q + g) * topk + t] is the batch-local KV-block id that slot t of global query token g selected for kv-head h, or a negative value if the slot is unused. Queries are packed by batch via cu_seqlens_q; for a token g in batch b, the batch-local query index is g - cu_seqlens_q[b].
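The indexing rule above can be modeled in a few lines of Python. This is only an illustrative sketch, not the Mojo implementation: the helper names `q2k_flat_index` and `batch_local_q` are hypothetical, and it assumes `cu_seqlens_q` is the usual cumulative-offset array (leading 0, one entry per batch boundary).

```python
from bisect import bisect_right


def q2k_flat_index(h, g, t, total_q, topk):
    """Flat offset into q2k_indices for kv-head h, global query g, slot t."""
    return (h * total_q + g) * topk + t


def batch_local_q(g, cu_seqlens_q):
    """Return (b, g - cu_seqlens_q[b]) for the batch b that contains token g.

    Assumes cu_seqlens_q = [0, len_0, len_0 + len_1, ...] (an assumption,
    not stated on this page). bisect_right skips over empty batches.
    """
    b = bisect_right(cu_seqlens_q, g) - 1
    return b, g - cu_seqlens_q[b]


# Two batches with 3 and 2 query tokens: global token 4 is token 1 of batch 1.
cu_seqlens_q = [0, 3, 5]
b, local_q = batch_local_q(4, cu_seqlens_q)
offset = q2k_flat_index(h=1, g=4, t=0, total_q=5, topk=2)
```

A lookup is then `q2k_indices[offset]`, with a negative value meaning that slot is unused.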

The inversion runs sequentially. Rows are numbered level-major, round-robin across batches (block 0 of every active batch, then block 1, ...).
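To make the level-major round-robin order concrete, here is a small Python sketch. It rests on assumptions this page does not confirm: that a batch has ceil(seqlen_k / blk_kv) KV blocks, that a batch is "active" at level j when it has a block j, and that the kv-head dimension can be ignored for the purpose of illustration. The function name `level_major_rows` is hypothetical.

```python
def level_major_rows(cu_seqlens_k, blk_kv):
    """List (batch, block) pairs in level-major round-robin order.

    Assumption: batch b owns ceil(seqlen_k[b] / blk_kv) KV blocks, and
    takes part in level j only if it has a block j.
    """
    n_batches = len(cu_seqlens_k) - 1
    n_blocks = [
        -(-(cu_seqlens_k[b + 1] - cu_seqlens_k[b]) // blk_kv)  # ceil division
        for b in range(n_batches)
    ]
    rows = []
    for level in range(max(n_blocks, default=0)):
        for b, n in enumerate(n_blocks):
            if level < n:
                rows.append((b, level))
    return rows


# Batch 0 has 5 keys (3 blocks of 2), batch 1 has 1 key (1 block).
rows = level_major_rows([0, 5, 6], blk_kv=2)
```

In this model the position of a pair in `rows` plays the role of its row number, so block 0 of every batch is numbered before any block 1.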

A non-empty row is q-chunked into ceil(row_count / q_per_cta) work items (q_per_cta <= BM, the fwd CTA's query cap) so rows selected by more than q_per_cta queries are served by multiple CTAs rather than truncated.
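The q-chunking arithmetic can be sketched in Python as follows. The helper name `chunk_row` and the half-open `(start, end)` representation of a work item are illustrative assumptions; only the ceil(row_count / q_per_cta) count comes from the description above.

```python
def chunk_row(row_count, q_per_cta=128):
    """Split a row of row_count queries into ceil(row_count / q_per_cta) items.

    Each item is a half-open (start, end) range into the row's query list.
    An empty row produces no work items.
    """
    if row_count == 0:
        return []
    n_items = -(-row_count // q_per_cta)  # ceil division
    return [
        (i * q_per_cta, min((i + 1) * q_per_cta, row_count))
        for i in range(n_items)
    ]


# A row selected by 300 queries needs three CTAs at the default cap of 128.
items = chunk_row(300)
```

Every query in the row lands in exactly one item, which is the property that avoids truncating rows selected by more than q_per_cta queries.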

Requires topk <= 255, because qsplit packs split_slot in the high byte, and max_seqlen_q < 2^24, because qsplit packs q in the low 24 bits.
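The two limits follow from the bit layout. A minimal Python sketch of such a packing is shown below; it assumes a 32-bit word (high byte plus low 24 bits), and the helper names `pack_qsplit` and `unpack_qsplit` are hypothetical rather than part of the Mojo API.

```python
def pack_qsplit(split_slot, q):
    """Pack split_slot into the high byte and q into the low 24 bits."""
    assert 0 <= split_slot <= 0xFF, "split_slot must fit in one byte"
    assert 0 <= q < (1 << 24), "q must fit in 24 bits"
    return (split_slot << 24) | q


def unpack_qsplit(word):
    """Inverse of pack_qsplit: return (split_slot, q)."""
    return (word >> 24) & 0xFF, word & 0xFFFFFF


word = pack_qsplit(3, 5)
```

Values outside either range would overflow into the neighbouring field, which is why both bounds have to hold before the CSR is built.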

Returns:

K2qCsr