Mojo function
build_k2q_csr
def build_k2q_csr(q2k_indices: List[Int32], cu_seqlens_q: List[Int32], cu_seqlens_k: List[Int32], head_kv: Int, total_q: Int, topk: Int, blk_kv: Int, max_seqlen_q: Int, max_seqlen_k: Int, q_per_cta: Int = Int(128)) -> K2qCsr
Builds the reverse CSR (KV block to queries) and its work schedule from the query-major selection.
q2k_indices[(h * total_q + g) * topk + t] is the batch-local KV-block id
that slot t of global query token g selected for kv-head h, or a negative
value if the slot is unused. Queries are packed by batch via cu_seqlens_q;
for the batch b that contains g, the batch-local query index is
g - cu_seqlens_q[b].
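The indexing above can be modelled in a few lines. This is a plain-Python sketch of the layout, not the Mojo implementation; the helper names q2k_flat_index and batch_local_q are hypothetical, and the binary search over cu_seqlens_q is just one way to find the batch that contains g.

```python
from bisect import bisect_right


def q2k_flat_index(h, g, t, total_q, topk):
    """Flat offset into q2k_indices for kv-head h, global query g, slot t."""
    return (h * total_q + g) * topk + t


def batch_local_q(g, cu_seqlens_q):
    """Return (b, local_q) for global query token g.

    cu_seqlens_q is the cumulative-length array, so batch b owns the
    half-open range [cu_seqlens_q[b], cu_seqlens_q[b + 1]).
    """
    # bisect_right skips over empty batches that share the same offset.
    b = bisect_right(cu_seqlens_q, g) - 1
    return b, g - cu_seqlens_q[b]
```

For example, with cu_seqlens_q = [0, 3, 3, 5] the second batch is empty, so global token 3 maps to batch 2 with local index 0.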
The inversion runs sequentially. Rows are numbered level-major, round-robin across batches: block 0 of every active batch, then block 1 of every active batch, and so on.
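A small Python sketch of one reading of this numbering follows. It makes assumptions that the description does not confirm: a batch counts as "active" at a level while it still has a KV block at that level, each batch has ceil(seqlen_k / blk_kv) blocks, every such block receives a row, and only a single kv-head is considered. The function name level_major_rows is hypothetical.

```python
def level_major_rows(cu_seqlens_k, blk_kv):
    """List (batch, block) pairs in level-major round-robin order.

    The position of a pair in the returned list is its row number
    under the assumptions stated above.
    """
    num_batches = len(cu_seqlens_k) - 1
    # Ceil-divide each batch's KV length by the block size.
    num_blocks = [
        -(-(cu_seqlens_k[b + 1] - cu_seqlens_k[b]) // blk_kv)
        for b in range(num_batches)
    ]
    rows = []
    for level in range(max(num_blocks, default=0)):
        for b, n in enumerate(num_blocks):
            if level < n:  # batch b still has a block at this level
                rows.append((b, level))
    return rows
```

With KV lengths of 300, 100, and 600 tokens and blk_kv = 128, the batches have 3, 1, and 5 blocks, so the order starts with block 0 of batches 0, 1, 2, then block 1 of batches 0 and 2.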
A non-empty row is q-chunked into ceil(row_count / q_per_cta) work items
(q_per_cta <= BM, the fwd CTA's query cap) so rows selected by more than
q_per_cta queries are served by multiple CTAs rather than truncated.
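The q-chunking rule can be illustrated with a short Python sketch. It is a model of the arithmetic only; the function name q_chunks and the (start, count) output shape are assumptions, not the layout of the actual schedule.

```python
def q_chunks(row_count, q_per_cta=128):
    """Split a row of row_count queries into (start, count) work items.

    Produces ceil(row_count / q_per_cta) items; an empty row yields none.
    """
    if row_count <= 0:
        return []
    num_items = (row_count + q_per_cta - 1) // q_per_cta  # integer ceil
    return [
        (i * q_per_cta, min(q_per_cta, row_count - i * q_per_cta))
        for i in range(num_items)
    ]
```

A row selected by 300 queries with the default q_per_cta of 128 becomes three work items covering 128, 128, and 44 queries, so no query is dropped.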
Constraints: topk must be <= 255 (qsplit packs split_slot in the high byte)
and max_seqlen_q must be < 2^24 (qsplit packs q in the low 24 bits).
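These two limits follow from the bit budget of a packed value. The Python sketch below assumes qsplit is a 32-bit word laid out as (split_slot << 24) | q; the description states only which field sits in the high byte and which in the low 24 bits, so the exact encoding and the helper names pack_qsplit and unpack_qsplit are assumptions.

```python
Q_BITS = 24
Q_MASK = (1 << Q_BITS) - 1  # 0xFFFFFF, low 24 bits


def pack_qsplit(split_slot, q):
    """Pack split_slot into the high byte and q into the low 24 bits."""
    if not 0 <= split_slot <= 0xFF:
        raise ValueError("split_slot must fit in one byte")
    if not 0 <= q <= Q_MASK:
        raise ValueError("q must be < 2^24")
    return (split_slot << Q_BITS) | q


def unpack_qsplit(word):
    """Inverse of pack_qsplit: return (split_slot, q)."""
    return (word >> Q_BITS) & 0xFF, word & Q_MASK
```

Under this layout, a larger topk or a longer query sequence would overflow its field and corrupt the neighbouring one, which is why the bounds are required.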
Returns:
K2qCsr