
Mojo function

build_uniform

def build_uniform(batch: Int, seq: Int32, num_q_heads: Int32 = Int32(16), available_tgs: Int32 = Int32(256)) -> PsMetadata

Builds metadata for uniform-sequence-length causal self-attention in FP8 MLA prefill.

This is the benchmark and test shape: every sequence has seq tokens and uses causal self-attention with one latent KV head and num_q_heads query heads (MHA, gqa=1). One work-item covers tile_q=256 tokens of a single head (token-major, BM=256); the KV split unit is tile_kv=128 blocks (= KV_BLOCK).
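The tiling arithmetic above can be sketched in a few lines. This is a Python model for illustration only, not the Mojo implementation: the helper names (ceildiv, work_items, kv_blocks_for_tile) are hypothetical, and it assumes that tile_kv counts tokens per KV block and that a causal query tile reads KV from token 0 up to its own last token.

```python
TILE_Q = 256   # query tokens per work-item (tile_q, from the description)
TILE_KV = 128  # KV split unit (tile_kv = KV_BLOCK); assumed to be tokens per block


def ceildiv(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def work_items(batch: int, seq: int, num_q_heads: int = 16) -> int:
    """Count work-items: one per (sequence, head, 256-token query tile)."""
    return batch * num_q_heads * ceildiv(seq, TILE_Q)


def kv_blocks_for_tile(q_tile: int, seq: int) -> int:
    """KV blocks visible to query tile q_tile under the assumed causal mask."""
    last_token = min((q_tile + 1) * TILE_Q, seq)  # clamp the final, partial tile
    return ceildiv(last_token, TILE_KV)


if __name__ == "__main__":
    print(work_items(batch=1, seq=1024))                       # work-items in total
    print([kv_blocks_for_tile(t, 1024) for t in range(4)])     # per-tile KV blocks
```

Under these assumptions, later query tiles see more KV blocks than earlier ones, which is why causal work-items have unequal cost even when every sequence has the same length.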

available_tgs is the number of persistent thread-groups (= grid_dim.x = device CU count; 256 on MI355X). Setting it to num_q_heads gives a split-free partition at any seq (one TG per head), which is useful for correctness gating.
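One way to see why available_tgs = num_q_heads avoids splitting is a toy cost model. The sketch below is not the partitioner behind build_uniform; it is a Python illustration with hypothetical names (head_cost, balance_requires_split), and it assumes that cost is measured in KV-block visits with tile_kv counting tokens. It only checks a necessary condition: if an even share per thread-group is smaller than the largest single work-item, a balanced partition has to split some work-item along KV.

```python
TILE_Q, TILE_KV = 256, 128  # tile_q and tile_kv from the description


def ceildiv(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def head_cost(seq: int) -> int:
    """KV-block visits for one head of one sequence (assumed causal cost model)."""
    tiles = ceildiv(seq, TILE_Q)
    return sum(ceildiv(min((t + 1) * TILE_Q, seq), TILE_KV) for t in range(tiles))


def balance_requires_split(batch: int, seq: int,
                           num_q_heads: int = 16,
                           available_tgs: int = 256) -> bool:
    """True when an even per-TG share is smaller than the largest work-item."""
    total = batch * num_q_heads * head_cost(seq)
    share = total / available_tgs
    largest = ceildiv(seq, TILE_KV)  # the last query tile sees every KV block
    return share < largest


if __name__ == "__main__":
    print(balance_requires_split(1, 1024, available_tgs=256))
    print(balance_requires_split(1, 1024, available_tgs=16))
```

In this model every head has the same cost, so with one thread-group per head each share equals a whole head's work and is never smaller than that head's largest work-item.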

Returns:

PsMetadata