Mojo module
mla_prefill
MLA (Multi-head Latent Attention) prefill kernel for gfx950.
Double-buffered MLA prefill with K_rope support. Uses TileTensor throughout; neither the public nor the internal API uses LayoutTensor.
Two-phase QK matmul per tile:
  Phase 1 (nope): Q[:,:depth] @ K^T
  Phase 2 (rope): Q[:,depth:q_depth] @ K_rope^T
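The two-phase split can be illustrated with a small reference sketch in plain Python. This is not the kernel's code, and it does not model tiling, double buffering, or TileTensor. It assumes that the two phase results are summed into the same score tile, which the description above implies but does not state. The helper names `matmul_nt` and `two_phase_scores` are hypothetical and exist only for this illustration.

```python
def matmul_nt(a, b):
    """Return a @ b^T for row-major lists of lists (hypothetical helper)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def two_phase_scores(q, k, k_rope, depth):
    """Compute QK scores in two phases (illustrative, not the kernel API).

    Assumption: the nope and rope partial products are accumulated
    (added) into one score matrix.
    """
    # Phase 1 (nope): Q[:, :depth] @ K^T
    q_nope = [row[:depth] for row in q]
    s_nope = matmul_nt(q_nope, k)

    # Phase 2 (rope): Q[:, depth:q_depth] @ K_rope^T
    q_rope = [row[depth:] for row in q]
    s_rope = matmul_nt(q_rope, k_rope)

    # Accumulate both phases element-wise.
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(s_nope, s_rope)]


# Tiny example: depth = 2 nope columns, q_depth = 3 total columns.
depth = 2
q = [[1, 2, 3], [4, 5, 6]]   # shape (2, q_depth)
k = [[1, 0], [0, 1]]         # shape (2, depth)
k_rope = [[1], [2]]          # shape (2, q_depth - depth)

scores = two_phase_scores(q, k, k_rope, depth)

# Reference: a single matmul against K and K_rope concatenated per row.
k_full = [kn + kr for kn, kr in zip(k, k_rope)]
reference = matmul_nt(q, k_full)
```

Under that assumption, `scores` equals `reference`: splitting the contraction dimension at `depth` and adding the partial products gives the same result as one matmul over the full `q_depth` columns.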