Mojo package
amd_rdna
TileTensor-native attention kernels for AMD RDNA3+ (gfx11xx/gfx12xx).
The kernels run in Wave32 mode and use 16x16x16 WMMA instructions. Each lane holds 16-element A/B fragments (the full K dimension) and 8-element C/D fragments. Both multi-head attention (MHA) prefill and decode are supported.
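The fragment sizes above can be sanity-checked with a little arithmetic. The sketch below is plain Python, not part of this package, and every constant name in it is hypothetical. It only counts register slots under the numbers stated above and assumes the per-lane fragments together cover each 16x16 tile.

```python
# Hypothetical constants taken from the description above; not package API.
WAVE_SIZE = 32          # lanes per wave in Wave32 mode
M, N, K = 16, 16, 16    # WMMA tile shape
A_FRAG_ELEMS = 16       # A/B elements held per lane
C_FRAG_ELEMS = 8        # C/D elements held per lane

# The C/D tile is M x N elements, spread across the wave.
c_tile_elems = M * N
c_slots = WAVE_SIZE * C_FRAG_ELEMS

# The A tile is M x K elements; count how many lane slots hold it.
a_tile_elems = M * K
a_slots = WAVE_SIZE * A_FRAG_ELEMS

# Ratio of slots to elements: how many lanes hold each A element.
a_copies_per_element = a_slots // a_tile_elems

print(c_slots == c_tile_elems)   # C/D tile is covered once
print(a_copies_per_element)      # copies of each A element across the wave
```

Under these assumptions the C/D tile is covered exactly once (32 lanes x 8 elements = 256), while the A/B fragments occupy twice as many slots as the tile has elements, which is consistent with each lane carrying a full K-length row or column.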
Modules
- attention: Attention struct for RDNA Wave32 MHA kernels (prefill + decode).
- buffers: K, V, Q, P, and Output buffers for RDNA Wave32 attention kernels.
- config: RDNA Wave32 attention config.
- mha_decode: RDNA Wave32 MHA decode kernel.
- mha_prefill: RDNA Wave32 MHA prefill kernel.
- mma: RDNA Wave32 WMMA helper for attention kernels.
- softmax: Online softmax for RDNA Wave32 attention kernels.
- utils: Shared helpers for RDNA Wave32 attention kernels.
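
For readers unfamiliar with the online softmax named in the softmax module, here is a minimal scalar sketch of the general technique in plain Python. It is not this package's implementation or API; the function names, the block_size parameter, and the use of scalar values in place of V rows are all assumptions made for illustration. The idea is to process scores block by block while keeping a running maximum and a running sum, and to rescale earlier partial results whenever the maximum grows.

```python
import math


def online_softmax_weighted_sum(scores, values, block_size=4):
    """Compute sum(softmax(scores) * values) one block at a time.

    Hypothetical helper for illustration; requires non-empty inputs.
    """
    running_max = -math.inf  # largest score seen so far
    running_sum = 0.0        # sum of exp(score - running_max) so far
    acc = 0.0                # unnormalized weighted sum of values

    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]

        new_max = max(running_max, max(s_blk))
        # Rescale earlier partial results to the new maximum.
        # On the first block this is exp(-inf), which is 0.0.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc *= scale

        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            acc += p * v

        running_max = new_max

    return acc / running_sum


def reference_weighted_sum(scores, values):
    """Two-pass softmax over the full score list, for comparison."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


scores = [0.5, 2.0, -1.0, 3.5, 0.0, 7.25, 1.0, -4.0, 2.5]
values = [1.0, -2.0, 0.5, 4.0, 3.0, -1.5, 2.0, 8.0, 0.25]
online = online_softmax_weighted_sum(scores, values, block_size=4)
expected = reference_weighted_sum(scores, values)
```

The blockwise result matches the two-pass reference up to floating-point rounding. This is the property that lets an attention kernel stream over K/V tiles without first materializing the full score row.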