Mojo module

mha_decode_streaming

Multi-head attention (MHA) streaming decode kernel for AMD gfx950 GPUs.

Per-tile loop: K strips are loaded DRAM → LDS → registers for the QK MMA, and the resulting P scores are routed through shared memory (SMEM) for the PV MMA. The kernel uses split-K partitioning.
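The arithmetic of the per-tile loop can be illustrated in plain Python. This is a minimal sketch, not the kernel's implementation. It assumes a single head and a single decode query, uses Python lists in place of DRAM, LDS, and register tiles, treats split-K as splitting the KV sequence into contiguous partitions, and merges partitions with an online-softmax (flash-decoding style) reduction. The source does not say how this kernel combines its partitions, and the names `split_k_decode_attention` and `reference_decode_attention` are hypothetical.

```python
import math


def reference_decode_attention(q, K, V):
    """Plain softmax(q K^T / sqrt(d)) V for one query row (hypothetical helper)."""
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    s = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
    mx = max(s)
    p = [math.exp(x - mx) for x in s]
    z = sum(p)
    return [sum(p[j] * V[j][c] for j in range(len(K))) / z for c in range(d)]


def split_k_decode_attention(q, K, V, tile_size, num_splits):
    """Sketch: split the KV sequence into partitions (assumed split-K scheme),
    run a per-tile online-softmax loop in each, then merge the partials."""
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    n = len(K)
    chunk = (n + num_splits - 1) // num_splits  # rows per partition
    partials = []  # (running max, running denominator, unnormalized output)
    for s in range(num_splits):
        start, end = s * chunk, min((s + 1) * chunk, n)
        if start >= end:
            continue  # trailing partitions may be empty
        m, denom, acc = -math.inf, 0.0, [0.0] * d
        for t0 in range(start, end, tile_size):
            t1 = min(t0 + tile_size, end)
            # Stand-in for the QK MMA: scaled scores for this tile.
            scores = [scale * sum(a * b for a, b in zip(q, K[j])) for j in range(t0, t1)]
            m_new = max(m, max(scores))
            alpha = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first tile
            p = [math.exp(x - m_new) for x in scores]  # the "P scores"
            denom = denom * alpha + sum(p)
            # Stand-in for the PV MMA: rescale, then accumulate P @ V.
            acc = [a * alpha for a in acc]
            for pj, j in zip(p, range(t0, t1)):
                for c in range(d):
                    acc[c] += pj * V[j][c]
            m = m_new
        partials.append((m, denom, acc))
    # Merge partitions by rescaling each to the global max.
    m_all = max(pm for pm, _, _ in partials)
    total, out = 0.0, [0.0] * d
    for pm, pden, pacc in partials:
        w = math.exp(pm - m_all)
        total += pden * w
        for c in range(d):
            out[c] += pacc[c] * w
    return [o / total for o in out]
```

Because each partition's running maximum and denominator are carried along, the merged result matches the unsplit softmax to floating-point tolerance for any tile size or partition count. That invariance is what lets split-K spread a long KV sequence across independent workers.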

Uses DecodeStreamingKVBuffer for single-buffered, per-strip DRAM → SMEM staging. It does not use KVCacheIterator; instead, strips are sub-tiled from an external DRAM tile.
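Single-buffered, per-strip staging can be pictured as one fixed-size buffer that is overwritten by each strip cut from a larger tile. The sketch below is a hypothetical illustration in plain Python and does not model the actual DecodeStreamingKVBuffer API: `stream_strips` and `strip_scores` are invented names, the "external DRAM tile" is a list of rows, and the "SMEM" buffer is a preallocated list that is reused for every strip.

```python
def stream_strips(dram_tile, strip_rows, consume):
    """Hypothetical sketch: copy each strip of an external tile into ONE
    reused staging buffer, then hand the buffer to `consume`.

    consume(strip_index, buffer, valid_rows) is called once per strip.
    Returns the number of strips processed. Assumes a non-empty tile.
    """
    width = len(dram_tile[0])
    # Single buffer, allocated once; every strip overwrites it in place.
    buffer = [[0.0] * width for _ in range(strip_rows)]
    num_strips = 0
    for r0 in range(0, len(dram_tile), strip_rows):
        rows = dram_tile[r0:r0 + strip_rows]  # sub-tile of the external tile
        for i, row in enumerate(rows):
            buffer[i][:] = row  # stand-in for the DRAM -> SMEM copy
        consume(num_strips, buffer, len(rows))
        num_strips += 1
    return num_strips


def strip_scores(q, dram_tile, strip_rows):
    """Dot products q . k for every row, computed strip by strip."""
    scores = []

    def consume(_idx, buf, valid):
        # Only the first `valid` rows are fresh; a short last strip leaves
        # stale rows from the previous strip in the buffer.
        for i in range(valid):
            scores.append(sum(a * b for a, b in zip(q, buf[i])))

    stream_strips(dram_tile, strip_rows, consume)
    return scores
```

In this sketch a strip's copy cannot overlap the previous strip's compute, because there is only one buffer to write into; a double-buffered design would trade extra shared memory for that overlap.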