Mojo module

mla_prefill

MLA (Multi-head Latent Attention) prefill kernel for AMD gfx950 GPUs.

Double-buffered MLA prefill with K_rope support. Uses TileTensor throughout; neither the public nor the internal API uses LayoutTensor.

Two-phase QK matmul per tile:

Phase 1 (nope): Q[:, :depth] @ K^T

Phase 2 (rope): Q[:, depth:q_depth] @ K_rope^T
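The two phases can be illustrated with a small reference sketch in plain Python. This is not the kernel's API: every function name here is hypothetical. The sketch assumes the two partial products are accumulated into the same score tile, which makes the result equal to a single matmul of Q against keys whose nope and rope parts are concatenated along the depth axis. Scaling, masking, and softmax are left out.

```python
# Reference sketch only: hypothetical helper names, plain Python lists,
# no tiling or double buffering.

def matmul_nt(a, b):
    """Return a @ b^T, where a is [m][d] and b is [n][d]."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def two_phase_scores(q, k, k_rope, depth):
    """QK scores for one tile, computed in two phases and summed.

    q:      [num_q][q_depth]          nope columns first, then rope columns
    k:      [num_k][depth]            nope part of the keys
    k_rope: [num_k][q_depth - depth]  rope part of the keys
    """
    q_nope = [row[:depth] for row in q]   # Q[:, :depth]
    q_rope = [row[depth:] for row in q]   # Q[:, depth:q_depth]
    s_nope = matmul_nt(q_nope, k)         # Phase 1 (nope)
    s_rope = matmul_nt(q_rope, k_rope)    # Phase 2 (rope)
    # Assumption: both phases accumulate into one score tile.
    return [[x + y for x, y in zip(r1, r2)]
            for r1, r2 in zip(s_nope, s_rope)]


def single_pass_scores(q, k, k_rope):
    """Same scores via one matmul against concatenated keys."""
    k_full = [k_n + k_r for k_n, k_r in zip(k, k_rope)]
    return matmul_nt(q, k_full)


q = [[1, 2, 3, 4, 5], [0, 1, 0, 1, 0]]   # q_depth = 5
k = [[1, 0, 1], [2, 1, 0]]               # depth = 3
k_rope = [[1, 1], [0, 2]]                # q_depth - depth = 2

scores = two_phase_scores(q, k, k_rope, depth=3)
print(scores)  # [[13, 14], [1, 1]]
```

Splitting the product this way lets the nope and rope key parts live in separate buffers, so neither has to be copied into a concatenated key tensor before the matmul.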