Mojo function
build_uniform
def build_uniform(batch: Int, seq: Int32, num_q_heads: Int32 = Int32(16), available_tgs: Int32 = Int32(256)) -> PsMetadata
Build metadata for uniform-sequence-length causal self-attention (FP8 MLA prefill).
This is the bench/test shape: every sequence is seq tokens long and uses
causal self-attention with one latent KV head and num_q_heads query heads
(MHA, gqa=1). One work-item covers tile_q=256 tokens of a single head
(token-major, BM=256); the KV split unit is tile_kv=128 blocks (= KV_BLOCK).
available_tgs is the number of persistent thread-groups (= grid_dim.x =
device CU count; 256 on MI355X). Set it to num_q_heads for a split-free
partition at any seq (one thread-group per head), which is useful for
correctness gating.
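The tile sizes above imply some simple counts for a given shape. The following is a minimal sketch of that arithmetic only, not the implementation of build_uniform. The helper ceil_div and the example shape are hypothetical, and it assumes that tile_kv=128 means 128 tokens per KV block and that every (sequence, head, query tile) triple is one work-item.

```mojo
# Hypothetical helper for illustration; not part of the documented API.
def ceil_div(a: Int, b: Int) -> Int:
    return (a + b - 1) // b


def main():
    # Example shape; these values are illustrative only.
    var batch = 2
    var seq = 4096
    var num_q_heads = 16
    var available_tgs = 256

    # Tile sizes quoted in the description above.
    var tile_q = 256
    var tile_kv = 128  # assumption: 128 tokens per KV block

    # One work-item is tile_q tokens of one head (assumed per sequence).
    var q_tiles_per_head = ceil_div(seq, tile_q)
    var work_items = batch * num_q_heads * q_tiles_per_head
    var kv_blocks = ceil_div(seq, tile_kv)

    print("q tiles per head:", q_tiles_per_head)  # 16
    print("work items:", work_items)  # 512
    print("KV blocks per sequence:", kv_blocks)  # 32
    # Average load if work-items were spread evenly over thread-groups.
    print("work items per thread-group:", ceil_div(work_items, available_tgs))  # 2
```

How the work-items are actually assigned to thread-groups, and where KV splits fall, is decided by build_uniform and recorded in the returned PsMetadata; the sketch only estimates totals.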
Returns:
PsMetadata