
Mojo function

k2q_csr_sizes

def k2q_csr_sizes(cu_seqlens_k: List[Int32], head_kv: Int, blk_kv: Int, max_seqlen_k: Int, total_q: Int, topk: Int, num_sms: Int, q_per_cta_chunk: Int = Int(128)) -> K2qCsrDeviceSizes

Returns the device-CSR sizing, computed with the same formulas as the host builder.

num_sms is the number of streaming multiprocessors on the device (e.g. ctx.get_attribute(DeviceAttribute.MULTIPROCESSOR_COUNT)); it sizes the multi-CTA hist/scatter grid. q_per_cta_chunk is the scheduler's q-chunk cap, equal to the forward CTA BM. It is distinct from the q_per_cta used by the hist/scatter kernels.
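The page does not spell out the layout of cu_seqlens_k. The sketch below is plain Python, not the Mojo API. It assumes the common variable-length attention convention, in which cu_seqlens_k holds batch + 1 cumulative key offsets starting at 0. The helper names build_cu_seqlens and describe_inputs are hypothetical. The per-sequence block count is only a ceil-division illustration of how blk_kv partitions a sequence; it is not the formula k2q_csr_sizes uses.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Prefix-sum per-sequence key lengths into cumulative offsets.

    Assumed convention (not confirmed by this page): the result has
    len(seq_lens) + 1 entries and starts at 0.
    """
    return [0] + list(accumulate(seq_lens))


def describe_inputs(cu_seqlens_k, blk_kv):
    """Derive quantities related to the k2q_csr_sizes arguments.

    Hypothetical helper for illustration only.
    """
    # Adjacent offsets differ by the length of each sequence.
    lens = [b - a for a, b in zip(cu_seqlens_k, cu_seqlens_k[1:])]
    # Longest key sequence in the batch (what max_seqlen_k would hold).
    max_seqlen_k = max(lens, default=0)
    # Ceil-division: blk_kv-sized blocks needed to cover each sequence.
    kv_blocks = [(n + blk_kv - 1) // blk_kv for n in lens]
    return lens, max_seqlen_k, kv_blocks


cu = build_cu_seqlens([5, 130, 64])
lens, max_k, blocks = describe_inputs(cu, blk_kv=64)
print(cu, lens, max_k, blocks)
```

Under these assumptions, the final offset in cu_seqlens_k is the total number of key tokens in the batch, and max_seqlen_k is the largest difference between adjacent offsets.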

Returns:

K2qCsrDeviceSizes