
Mojo module

mla_prefill

MLA (Multi-head Latent Attention) prefill kernel for gfx950.

Double-buffered MLA prefill with K_rope support. The module uses TileTensor throughout; LayoutTensor does not appear in either the public or the internal API.
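Double buffering can be pictured as a small ping-pong loop. The following is a minimal Python sketch, not the kernel's code: `process_tiles` and `compute` are hypothetical names, tiles are plain lists, and "loading" is just a copy, so nothing actually overlaps here. It only illustrates the buffer alternation a double-buffered kernel typically follows, where tile i+1 is fetched into one buffer while tile i is consumed from the other.

```python
def process_tiles(tiles, compute):
    """Hypothetical ping-pong loop over tiles using two buffers.

    Returns (results, schedule), where schedule records each
    ("load" | "compute", tile_index, buffer_index) event in issue order.
    """
    buffers = [None, None]
    schedule = []
    results = []
    if not tiles:
        return results, schedule

    # Prologue: fill buffer 0 with the first tile before the main loop.
    buffers[0] = list(tiles[0])
    schedule.append(("load", 0, 0))

    for i in range(len(tiles)):
        cur, nxt = i % 2, (i + 1) % 2
        # Issue the load of the next tile into the *other* buffer first;
        # on real hardware this copy could overlap with the compute below.
        if i + 1 < len(tiles):
            buffers[nxt] = list(tiles[i + 1])
            schedule.append(("load", i + 1, nxt))
        # Consume the tile that was loaded in the previous iteration.
        results.append(compute(buffers[cur]))
        schedule.append(("compute", i, cur))

    return results, schedule


results, schedule = process_tiles([[1, 2], [3, 4], [5, 6]], sum)
print(results)
print(schedule)
```

With two buffers, the buffer being written is always the one whose tile was consumed in the previous iteration, so a load never overwrites data that is still needed.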

Two-phase QK matmul per tile:

Phase 1 (nope): Q[:,:depth] @ K^T
Phase 2 (rope): Q[:,depth:q_depth] @ K_rope^T
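The split works because a dot product over concatenated head dimensions equals the sum of the dot products over each part. If each key row is viewed as [K | K_rope], then Q @ [K | K_rope]^T = Q[:,:depth] @ K^T + Q[:,depth:q_depth] @ K_rope^T. Below is a minimal pure-Python sketch of that identity. It assumes q_depth is the full width of Q and that the two phase results are summed into the same score tile; `matmul_bt` and `two_phase_scores` are hypothetical helpers, not functions from this module.

```python
def matmul_bt(a, b):
    """Return a @ b^T for row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def two_phase_scores(q, k, k_rope, depth, q_depth):
    """Hypothetical two-phase QK scores for one tile."""
    # Phase 1 (nope): the first `depth` columns of Q against K.
    s_nope = matmul_bt([row[:depth] for row in q], k)
    # Phase 2 (rope): columns depth..q_depth of Q against K_rope.
    s_rope = matmul_bt([row[depth:q_depth] for row in q], k_rope)
    # Accumulate both phases into a single score tile.
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s_nope, s_rope)]


q = [[1, 2, 3], [4, 5, 6]]   # 2 queries, q_depth = 3
k = [[1, 0], [0, 1]]         # 2 keys, depth = 2
k_rope = [[2], [3]]          # rope part of each key, width 1

two_phase = two_phase_scores(q, k, k_rope, depth=2, q_depth=3)
# Reference: one matmul against the concatenated keys [K | K_rope].
single = matmul_bt(q, [kn + kr for kn, kr in zip(k, k_rope)])
print(two_phase)
print(single)
```

Both computations give [[7, 11], [16, 23]] for these inputs. Splitting the matmul this way lets the nope and rope parts of K live in separate tensors without first being concatenated.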
