
Mojo function

build_ps_metadata

def build_ps_metadata(seqlens_qo_indptr: List[Int32], pages_kv_indptr: List[Int32], context_lens: List[Int32], num_heads_k: Int32, gqa_ratio: Int32, tile_q: Int32, tile_kv: Int32, block_size: Int32, is_causal: Bool, available_tgs: Int32) -> PsMetadata

Port of get_ps_metadata_v1_2_host (v1_2_host.cuh:265-314), the host wrapper that clusters heads across TGs by GCD, calls the per-cluster kn_generate_ps_metadata for each cluster, and concatenates the results.
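The clustering step can be pictured with a short Python sketch (Python rather than Mojo, purely for illustration). It assumes the cluster count is gcd(num_heads_k, available_tgs), so that heads and TGs both divide evenly across clusters; the description above implies this but does not spell it out. The helper cluster_heads and its field names are hypothetical and are not part of the Mojo API.

```python
from math import gcd


def cluster_heads(num_heads_k: int, available_tgs: int) -> dict:
    """Hypothetical helper: split KV heads and TGs into equal clusters.

    Assumption (not confirmed by the source): the number of clusters is
    gcd(num_heads_k, available_tgs), so both counts divide evenly.
    """
    num_clusters = gcd(num_heads_k, available_tgs)
    return {
        "num_clusters": num_clusters,
        "heads_per_cluster": num_heads_k // num_clusters,
        "tgs_per_cluster": available_tgs // num_clusters,
    }


# Illustrative values only: 16 KV heads shared among 304 TGs.
plan = cluster_heads(num_heads_k=16, available_tgs=304)
print(plan)
```

With these illustrative values each cluster holds exactly one head, which matches the case where the head index coincides with cluster_id.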

For MLA prefill this is the MHA case (gqa_ratio == 1, one head per work item). Each work item's Q tile covers qlen_granularity = tile_q // gqa_ratio tokens of a single head. The layout is token-major: the 256 MMA rows are 256 tokens, not 16 tokens x 16 heads. The low 16 bits of q_head_range hold the head index, which equals cluster_id.
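The two quantities in this paragraph reduce to simple integer arithmetic, sketched here in Python for illustration. The function names are hypothetical. The value tile_q = 256 is taken from the 256 MMA rows mentioned above. The upper 16 bits of the sample q_head_range are arbitrary, since their meaning is not described here.

```python
def qlen_granularity(tile_q: int, gqa_ratio: int) -> int:
    """Number of query tokens covered by one work item's Q tile."""
    return tile_q // gqa_ratio


def head_index(q_head_range: int) -> int:
    """Extract the head index stored in the low 16 bits of q_head_range."""
    return q_head_range & 0xFFFF


# MHA case: gqa_ratio == 1, so the full tile is tokens of a single head.
tokens_per_tile = qlen_granularity(tile_q=256, gqa_ratio=1)

# Sample packed value: low 16 bits are 5, upper bits are arbitrary filler.
head = head_index(0x00060005)
print(tokens_per_tile, head)
```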

Returns:

PsMetadata