Mojo module
mha_decode
RDNA Wave32 MHA decode kernel.
Follows the same recipe as the prefill kernel, and adds split-K partitioning: the KV span is divided across blocks, so a decode step still exposes grid-level parallelism.
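The split-K idea can be illustrated with a small reference sketch in plain Python. This is not the kernel's code: the function names (`partial_attention`, `combine`, `split_k_decode`), the single-head layout, and the log-sum-exp merge step are assumptions chosen to show the general technique, in which each chunk of the KV span is processed independently (one chunk per block on a GPU) and the partial results are merged afterwards.

```python
import math

def partial_attention(q, keys, values, scale):
    """Attend one query over one KV chunk.

    Returns (max_score, sum_of_exps, unnormalized_output) so that
    chunks can be merged later without losing numerical stability.
    """
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    acc = [sum(w * v[d] for w, v in zip(weights, values))
           for d in range(len(values[0]))]
    return m, l, acc

def combine(partials):
    """Merge per-chunk results by rescaling each to a shared maximum."""
    m_all = max(m for m, _, _ in partials)
    l_all = 0.0
    out = [0.0] * len(partials[0][2])
    for m, l, acc in partials:
        c = math.exp(m - m_all)  # correction factor for this chunk
        l_all += c * l
        for d in range(len(out)):
            out[d] += c * acc[d]
    return [o / l_all for o in out]

def split_k_decode(q, keys, values, num_splits):
    """Single-query attention with the KV span split into chunks.

    Hypothetical helper: on a GPU, each chunk would map to its own block.
    """
    scale = 1.0 / math.sqrt(len(q))
    n = len(keys)
    chunk = -(-n // num_splits)  # ceiling division
    partials = [
        partial_attention(q, keys[i:i + chunk], values[i:i + chunk], scale)
        for i in range(0, n, chunk)
    ]
    return combine(partials)
```

Calling `split_k_decode` with `num_splits=1` gives ordinary softmax attention for one query; larger values should produce the same output up to floating-point rounding, which is why the partitioning can be chosen purely for parallelism.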