Mojo module

mha_decode

RDNA Wave32 MHA decode kernel.

Same recipe as prefill, plus split-K partitioning of the KV span across blocks for grid-level parallelism.
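
To illustrate the split-K idea, here is a minimal CPU reference sketch in Python for a single query token. It is not the Mojo kernel and does not model Wave32 execution. It assumes, as is common for split-K attention, that each block produces a partial softmax result (running max, exponent sum, unnormalized output) over its slice of the KV span, and that a final reduction rescales and merges the partials. All function names here (`attend_partial`, `merge_partials`, `split_k_decode`) are hypothetical and do not come from the module.

```python
import math


def attend_partial(q, keys, values, scale):
    """Attention over one KV chunk (what one block would compute).

    Returns (max_score, exp_sum, unnormalized_output) so that partials
    from different chunks can be merged later.
    """
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    exp_sum = sum(weights)
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, exp_sum, acc


def merge_partials(partials):
    """Combine per-chunk partials into the final softmax-weighted output."""
    m_global = max(m for m, _, _ in partials)
    dim = len(partials[0][2])
    exp_total = 0.0
    acc_total = [0.0] * dim
    for m, exp_sum, acc in partials:
        # Rescale each partial from its local max to the global max.
        correction = math.exp(m - m_global)
        exp_total += correction * exp_sum
        for d in range(dim):
            acc_total[d] += correction * acc[d]
    return [a / exp_total for a in acc_total]


def split_k_decode(q, keys, values, num_splits):
    """Decode attention for one query, with the KV span split into chunks."""
    scale = 1.0 / math.sqrt(len(q))
    n = len(keys)
    chunk = -(-n // num_splits)  # ceiling division
    partials = [
        attend_partial(q, keys[s:s + chunk], values[s:s + chunk], scale)
        for s in range(0, n, chunk)
    ]
    return merge_partials(partials)
```

With `num_splits=1` this reduces to ordinary single-pass attention, so comparing its output against a larger `num_splits` is a simple way to check that the merge step is numerically consistent. In a GPU kernel each chunk would map to a separate block, which is where the grid-level parallelism comes from when there is only one query token per sequence.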