Mojo function
build_ps_metadata
def build_ps_metadata(seqlens_qo_indptr: List[Int32], pages_kv_indptr: List[Int32], context_lens: List[Int32], num_heads_k: Int32, gqa_ratio: Int32, tile_q: Int32, tile_kv: Int32, block_size: Int32, is_causal: Bool, available_tgs: Int32) -> PsMetadata
Port of get_ps_metadata_v1_2_host (v1_2_host.cuh:265-314), the host wrapper that GCD-clusters heads across TGs, calls the per-cluster kn_generate_ps_metadata, and concatenates the per-cluster results.
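The wrapper's control flow can be pictured with a minimal Python sketch. It assumes, without confirmation from this page, that "GCD-clusters" means splitting both the heads and the available TGs into gcd(num_heads, available_tgs) equal groups. `cluster_heads`, `build_all_clusters`, and `stub` are hypothetical stand-ins for the real wrapper and for kn_generate_ps_metadata, and the string work items are placeholders for real metadata.

```python
from math import gcd
from typing import Callable, List, Tuple

# Hypothetical cluster description: (range of head indices, number of TGs assigned).
Cluster = Tuple[range, int]


def cluster_heads(num_heads: int, available_tgs: int) -> List[Cluster]:
    """Assumed GCD clustering: heads and TGs split evenly into the same number of groups."""
    if num_heads <= 0 or available_tgs <= 0:
        raise ValueError("num_heads and available_tgs must be positive")
    num_clusters = gcd(num_heads, available_tgs)
    heads_per_cluster = num_heads // num_clusters
    tgs_per_cluster = available_tgs // num_clusters
    return [
        (range(c * heads_per_cluster, (c + 1) * heads_per_cluster), tgs_per_cluster)
        for c in range(num_clusters)
    ]


def build_all_clusters(
    num_heads: int,
    available_tgs: int,
    generate_for_cluster: Callable[[int, range, int], List[str]],
) -> List[str]:
    """Run a per-cluster generator (stand-in for kn_generate_ps_metadata) and concatenate."""
    work_items: List[str] = []
    for cluster_id, (heads, tgs) in enumerate(cluster_heads(num_heads, available_tgs)):
        work_items.extend(generate_for_cluster(cluster_id, heads, tgs))
    return work_items


def stub(cluster_id: int, heads: range, tgs: int) -> List[str]:
    """Hypothetical generator: one placeholder work item per TG in the cluster."""
    return [f"c{cluster_id}:tg{t}" for t in range(tgs)]


print(len(cluster_heads(16, 80)))  # 16 clusters, each with 1 head and 5 TGs
```

Under this assumption, whenever the head count divides the TG count every cluster holds exactly one head, so cluster_id coincides with the head index.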
For MLA prefill this is MHA (gqa_ratio == 1, one head per work-item): the work-item Q tile is qlen_granularity = tile_q // gqa_ratio tokens of ONE head. The layout is token-major, so the 256 MMA rows are 256 tokens, NOT 16 tokens x 16 heads. The low 16 bits of q_head_range hold the head index (equal to cluster_id).
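The tile arithmetic and head decoding described above reduce to a few one-liners. This is a minimal Python sketch: `qlen_granularity`, `head_index`, `mma_row_to_token`, and the `tile_start_token` parameter are hypothetical names, and the sketch reads only the low 16 bits of q_head_range because this page does not say what the upper bits hold.

```python
def qlen_granularity(tile_q: int, gqa_ratio: int) -> int:
    """Tokens of a single head covered by one work-item's Q tile."""
    if gqa_ratio <= 0:
        raise ValueError("gqa_ratio must be positive")
    return tile_q // gqa_ratio


def head_index(q_head_range: int) -> int:
    """Head index stored in the low 16 bits of q_head_range (upper bits ignored here)."""
    return q_head_range & 0xFFFF


def mma_row_to_token(row: int, tile_start_token: int) -> int:
    """MHA (gqa_ratio == 1), token-major: MMA row r is token tile_start_token + r of one head."""
    return tile_start_token + row


print(qlen_granularity(256, 1))  # 256 tokens of one head per work-item
print(head_index(0x00030007))    # 7
```

With gqa_ratio == 1 the whole 256-row tile belongs to one head, which is why a single head index per work-item is enough.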
Returns:
PsMetadata