Mojo module

mla_prefill

MLA (Multi-head Latent Attention) prefill kernel for AMD gfx950 GPUs.

Double-buffered MLA prefill with K_rope support. Uses TileTensor throughout: LayoutTensor appears in neither the public nor the internal API.
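To illustrate what "double-buffered" typically means in a tiled kernel, here is a minimal sketch of a ping-pong loop. It assumes a generic pattern where the load for tile i+1 is issued into one staging buffer while tile i is computed from the other. The names `load_tile` and `double_buffered_process` are hypothetical and do not come from the mla_prefill source. Python runs everything sequentially, so the sketch shows only the buffer-index bookkeeping and not the real load/compute overlap a GPU achieves with asynchronous copies.

```python
# Hypothetical sketch of a double-buffered (ping-pong) tile loop.
# In a real GPU kernel, load_tile would be an asynchronous copy into shared
# memory that overlaps with compute; here it runs sequentially, so this only
# illustrates how the two buffers alternate.

def load_tile(src, tile_idx, tile_size):
    """Copy one tile out of the source (stand-in for a global -> shared copy)."""
    start = tile_idx * tile_size
    return src[start:start + tile_size]

def double_buffered_process(src, tile_size, compute):
    num_tiles = (len(src) + tile_size - 1) // tile_size
    buffers = [None, None]                      # two staging buffers
    buffers[0] = load_tile(src, 0, tile_size)   # prologue: prefetch tile 0
    results = []
    for i in range(num_tiles):
        cur = i % 2                             # buffer holding tile i
        nxt = (i + 1) % 2                       # buffer that receives tile i+1
        if i + 1 < num_tiles:
            # Issue the next load before computing on the current tile.
            buffers[nxt] = load_tile(src, i + 1, tile_size)
        results.append(compute(buffers[cur]))
    return results
```

For example, `double_buffered_process(list(range(10)), 4, sum)` processes the tiles `[0..3]`, `[4..7]`, and `[8, 9]` and returns their sums. Because only two buffers are needed regardless of the number of tiles, shared-memory usage stays fixed while loads for the next tile can hide behind the current tile's compute.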

Two-phase QK matmul per tile:

Phase 1 (nope): Q[:,:depth] @ K^T
Phase 2 (rope): Q[:,depth:q_depth] @ K_rope^T
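The two phases work because a dot product over concatenated columns splits into a sum of dot products over each part. If each Q row has q_depth columns (the first depth without positional encoding, "nope", and the remainder carrying RoPE), the two partial score matrices add up to the scores Q would produce against keys formed by concatenating K and K_rope. The sketch below checks that identity in plain Python. It is an illustration of the math under that assumption, not the kernel's implementation, and `matmul_bt` and `two_phase_qk` are hypothetical helper names.

```python
# Hypothetical sketch: QK^T computed in two phases (nope + rope) and summed.

def matmul_bt(a, b):
    """Return a @ b^T for row-major lists of lists (a: M x D, b: N x D)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def two_phase_qk(q, k, k_rope, depth):
    """Scores = Q[:, :depth] @ K^T + Q[:, depth:q_depth] @ K_rope^T.

    Each row of q is assumed to have q_depth columns, so row[depth:]
    is the same slice as row[depth:q_depth].
    """
    q_nope = [row[:depth] for row in q]
    q_rope = [row[depth:] for row in q]
    s_nope = matmul_bt(q_nope, k)        # phase 1: non-RoPE part
    s_rope = matmul_bt(q_rope, k_rope)   # phase 2: RoPE part
    # Accumulate both phases into one score tile.
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s_nope, s_rope)]

q = [[1, 2, 3], [4, 5, 6]]   # 2 queries, depth = 2, q_depth = 3
k = [[1, 0], [0, 1]]          # 2 keys, nope part
k_rope = [[1], [2]]           # 2 keys, rope part
scores = two_phase_qk(q, k, k_rope, depth=2)
print(scores)  # [[4, 8], [10, 17]]
```

Splitting the matmul this way lets K and K_rope be kept as separate tensors rather than materialized as one concatenated key tensor.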

