Mojo function
k2q_csr_sizes
def k2q_csr_sizes(cu_seqlens_k: List[Int32], head_kv: Int, blk_kv: Int, max_seqlen_k: Int, total_q: Int, topk: Int, num_sms: Int, q_per_cta_chunk: Int = Int(128)) -> K2qCsrDeviceSizes
Returns the device-CSR sizing, computed with the same formulas the host builder uses.
num_sms (for example, ctx.get_attribute(DeviceAttribute.MULTIPROCESSOR_COUNT)) sizes the multi-CTA hist/scatter grid. q_per_cta_chunk is the scheduler's q-chunk cap (equal to the fwd CTA BM) and is distinct from the hist/scatter q_per_cta.
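The sizing formulas themselves are not given on this page. As a rough illustration of the kind of arithmetic the cu_seqlens_k and blk_kv arguments suggest, the Python sketch below assumes that cu_seqlens_k holds cumulative key sequence lengths (a leading 0 followed by running totals) and that blk_kv is a KV block size in tokens. The helper names kv_block_counts and csr_row_offsets are hypothetical and are not part of the Mojo API. This is not the library's formula.

```python
from typing import List


def kv_block_counts(cu_seqlens_k: List[int], blk_kv: int) -> List[int]:
    """Per-sequence KV block counts from cumulative sequence lengths.

    Assumption: cu_seqlens_k[i + 1] - cu_seqlens_k[i] is the key length
    of sequence i, and each sequence is tiled into blocks of blk_kv tokens.
    """
    if blk_kv <= 0:
        raise ValueError("blk_kv must be positive")
    counts = []
    for start, end in zip(cu_seqlens_k, cu_seqlens_k[1:]):
        seqlen = end - start
        # Ceiling division: a partially filled block still counts as one.
        counts.append((seqlen + blk_kv - 1) // blk_kv)
    return counts


def csr_row_offsets(counts: List[int]) -> List[int]:
    """Exclusive prefix sum, in the style of a CSR row-pointer array."""
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    return offsets


# Three sequences with key lengths 100, 256, and 44 tokens.
counts = kv_block_counts([0, 100, 356, 400], blk_kv=128)
offsets = csr_row_offsets(counts)
```

The last entry of the offsets array is the total block count, which is the sort of quantity a CSR sizing routine needs in order to allocate its buffers.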
Returns:
K2qCsrDeviceSizes