
Mojo function

build_ps_metadata

def build_ps_metadata(seqlens_qo_indptr: List[Int32], pages_kv_indptr: List[Int32], context_lens: List[Int32], num_heads_k: Int32, gqa_ratio: Int32, tile_q: Int32, tile_kv: Int32, block_size: Int32, is_causal: Bool, available_tgs: Int32) -> PsMetadata

Port of get_ps_metadata_v1_2_host (v1_2_host.cuh:265-314), the host wrapper that GCD-clusters heads across TGs, calls kn_generate_ps_metadata once per cluster, and concatenates the per-cluster results.

For MLA prefill this is MHA (gqa_ratio == 1, one head per work item). Each work item's Q tile covers qlen_granularity = tile_q // gqa_ratio tokens of a single head. The layout is token-major: the 256 MMA rows are 256 tokens, not 16 tokens x 16 heads. The low 16 bits of q_head_range hold the head index, which equals cluster_id.
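The arithmetic in the paragraph above can be sketched in a few lines. This is a plain Python illustration, not the Mojo implementation. The helper names `qlen_granularity` and `head_index` are hypothetical, `tile_q = 256` is inferred from the 256 MMA rows mentioned above, and the upper bits of the packed value are arbitrary because the text only describes the low 16 bits.

```python
def qlen_granularity(tile_q: int, gqa_ratio: int) -> int:
    """Tokens of one head covered by a single work item's Q tile."""
    return tile_q // gqa_ratio


def head_index(q_head_range: int) -> int:
    """Low 16 bits of q_head_range, described above as the head index."""
    return q_head_range & 0xFFFF


# MHA case used for MLA prefill: gqa_ratio == 1, so every row of a
# 256-row tile is a token of the same head (assumed tile_q = 256).
tokens_per_tile = qlen_granularity(tile_q=256, gqa_ratio=1)

# Hypothetical packed value: the upper bits are arbitrary filler,
# the low 16 bits carry head index 5.
packed = (0x1234 << 16) | 5
head = head_index(packed)

print(tokens_per_tile, head)
```

With a larger gqa_ratio the same formula yields fewer tokens per tile, for example `qlen_granularity(256, 16)` gives 16. That is the 16 tokens x 16 heads layout the paragraph says does not apply when gqa_ratio == 1.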

Returns:

PsMetadata