Mojo function

build_uniform

def build_uniform(batch: Int, seq: Int32, num_q_heads: Int32 = Int32(16), available_tgs: Int32 = Int32(256)) -> PsMetadata

Builds the metadata for FP8 causal MLA prefill with uniform-sequence-length self-attention.

This is the benchmark/test shape: every sequence has seq tokens and uses causal self-attention with one latent KV head and num_q_heads query heads (MHA, gqa=1). One work-item covers tile_q=256 tokens of one head (token-major, BM=256); the KV split unit is tile_kv=128 blocks (= KV_BLOCK).

available_tgs is the number of persistent thread-groups (TGs). It equals grid_dim.x, which is the device CU count (256 on MI355X). Setting it to num_q_heads gives a split-free partition at any seq, with one TG per head, which is useful for correctness gating.
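To make the tile sizes above concrete, the following is a minimal sketch of the tile arithmetic only. It does not call build_uniform and does not reproduce how PsMetadata is laid out. The helper ceil_div and the sample values are hypothetical. The work-item count assumes one work-item per (sequence, head, query tile), and the KV count assumes tile_kv=128 means 128 tokens per KV split unit; neither is stated on this page.

```mojo
# Hypothetical helper for illustration; not part of the documented API.
def ceil_div(a: Int, b: Int) -> Int:
    return (a + b - 1) // b


def main():
    # Illustrative problem size (assumed values).
    var batch = 4
    var seq = 1000
    var num_q_heads = 16

    # Tile sizes quoted in the description above.
    var tile_q = 256   # tokens of one head per work-item
    var tile_kv = 128  # KV split unit, assumed here to be 128 tokens

    # Query tiles needed to cover one sequence for one head.
    var q_tiles = ceil_div(seq, tile_q)

    # KV split units needed to cover the full sequence (assumption).
    var kv_units = ceil_div(seq, tile_kv)

    # Assumption: one work-item per (sequence, head, query tile).
    var work_items = batch * num_q_heads * q_tiles

    print("query tiles per (sequence, head):", q_tiles)
    print("KV split units per sequence:", kv_units)
    print("total work-items:", work_items)
```

With these sample values the sketch gives 4 query tiles, 8 KV split units, and 256 work-items. Comparing the work-item total against available_tgs gives a rough sense of how much work each persistent thread-group would receive under these assumptions.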

Returns:

PsMetadata