Mojo module

mha_prefill

RDNA Wave32 MHA prefill kernel.

Recipe (per KV tile):

  • K is loaded to LDS strip by strip; QK MMA fragments are emitted per strip.
  • V is prefetched as a side DMA during the second-to-last K strip so it overlaps the QK compute.
  • Masking, online softmax, and barriers sit between the QK and PV stages.
  • P (post-softmax scores) is cast and staged in SMEM; the PV MMA then reads P from SMEM as the A operand and V from LDS as the B operand.
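
The arithmetic behind the recipe above can be sketched on the CPU in plain Python. This is only an illustrative reference, not the kernel: it does not model LDS or SMEM staging, the V prefetch, barriers, Wave32 MMA fragments, or the cast of P. The function names and the `kv_tile` and `k_strip` parameters are hypothetical. A "strip" is assumed here to be a slice along the head-depth (reduction) dimension, since the recipe does not say which axis a strip spans, and a causal mask is assumed as the mask.

```python
import math


def naive_attention(q, k, v, causal=True):
    """Untiled reference: softmax(scale * Q K^T) V, one query row at a time."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if causal:
            scores = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        row_max = max(scores)
        p = [math.exp(s - row_max) for s in scores]
        denom = sum(p)
        out.append([sum(p[j] * v[j][d] for j in range(len(k))) / denom
                    for d in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, kv_tile=4, k_strip=2, causal=True):
    """Same result, computed per KV tile with an online softmax.

    kv_tile and k_strip are hypothetical sizes chosen for illustration.
    """
    depth = len(q[0])
    scale = 1.0 / math.sqrt(depth)
    out = []
    for i, qi in enumerate(q):
        running_max = -math.inf          # running row maximum
        running_sum = 0.0                # running softmax denominator
        acc = [0.0] * len(v[0])          # unnormalized P @ V accumulator
        for t0 in range(0, len(k), kv_tile):
            tile = range(t0, min(t0 + kv_tile, len(k)))
            # QK: accumulate partial dot products one strip at a time
            # (assumption: a strip is a slice of the depth dimension).
            scores = [0.0] * len(tile)
            for d0 in range(0, depth, k_strip):
                d1 = min(d0 + k_strip, depth)
                for n, j in enumerate(tile):
                    scores[n] += sum(qi[d] * k[j][d] for d in range(d0, d1))
            scores = [s * scale for s in scores]
            # Mask (assumption: causal, key index must not exceed query index).
            if causal:
                scores = [s if j <= i else -math.inf
                          for s, j in zip(scores, tile)]
            tile_max = max(scores)
            if tile_max == -math.inf:
                continue                 # whole tile is masked out
            # Online softmax: rescale earlier partial results to the new max.
            new_max = max(running_max, tile_max)
            correction = math.exp(running_max - new_max)
            p = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(p)
            # PV: P is the left operand, the V tile is the right operand.
            acc = [a * correction + sum(pn * v[j][d] for pn, j in zip(p, tile))
                   for d, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Comparing `tiled_attention` against `naive_attention` on small inputs is a quick way to check that the per-tile rescaling is right: the two should agree up to floating-point rounding regardless of the chosen `kv_tile` and `k_strip`.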