IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.
Skip to main content
For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Mojo function

build_k2q_csr_device

def build_k2q_csr_device[topk: Int](q2k: UnsafePointer[Int32, MutAnyOrigin], cu_seqlens_q: UnsafePointer[Int32, MutAnyOrigin], cu_seqlens_k: UnsafePointer[Int32, MutAnyOrigin], row_ptr: UnsafePointer[Int32, MutAnyOrigin], qsplit_indices: UnsafePointer[Int32, MutAnyOrigin], scheduler_metadata: UnsafePointer[Int32, MutAnyOrigin], work_count: UnsafePointer[Int32, MutAnyOrigin], split_counts: UnsafePointer[Int32, MutAnyOrigin], row_map: UnsafePointer[Int32, MutAnyOrigin], row_coords: UnsafePointer[Int32, MutAnyOrigin], row_counts: UnsafePointer[Int32, MutAnyOrigin], tile_counts: UnsafePointer[Int32, MutAnyOrigin], head_kv: Int, total_q: Int, blk_kv: Int, max_seqlen_q: Int, sizes: K2qCsrDeviceSizes, ctx: DeviceContext, q_per_cta: Int = Int(128))

Builds the reverse-CSR + schedule on the device into caller buffers.

Emits the identical contract tensors as host build_k2q_csr for the same q2k: row_ptr byte-identical, qsplit_indices q-ascending within each row, scheduler_metadata/work_count, split_counts. The outputs and scratch must be pre-zeroed where the host pre-zeroes (row_counts, work_count); qsplit_indices must be -1 filled (unused tail); split_counts zero-filled. q_per_cta here is the fwd-CTA q-chunk cap (the scheduler chunking), distinct from the hist/scatter sizes.q_per_cta that tiles the q-range across CTAs.