Mojo package

amd_rdna

TileTensor-native attention kernels for AMD RDNA3+ (gfx11xx/gfx12xx).

The kernels run in Wave32 mode and use 16x16x16 WMMA instructions. Each lane holds 16-element A/B fragments (the full K dimension) and 8-element C/D fragments. Both multi-head attention (MHA) prefill and decode are supported.
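The fragment sizes above follow from simple arithmetic on the tile shape and the wave width. The sketch below is illustrative Python only, not part of the package: the helper names are hypothetical, and it assumes that "full K" means each lane carries one complete 16-element row (or column) of K.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not
# part of the amd_rdna package.

WAVE_SIZE = 32       # Wave32: 32 lanes per wave
M = N = K = 16       # 16x16x16 WMMA tile shape


def cd_fragment_elems(m: int = M, n: int = N, wave: int = WAVE_SIZE) -> int:
    """Elements of the MxN C/D tile held by each lane when split evenly."""
    return (m * n) // wave


def ab_fragment_elems(k: int = K) -> int:
    """Elements of an A/B fragment per lane, assuming one full K row/column."""
    return k


def ab_copies_across_wave(m: int = M, k: int = K, wave: int = WAVE_SIZE) -> int:
    """How many times the MxK tile is covered by all lanes' A/B fragments."""
    return (wave * ab_fragment_elems(k)) // (m * k)


if __name__ == "__main__":
    print("C/D elements per lane:", cd_fragment_elems())
    print("A/B elements per lane:", ab_fragment_elems())
    print("A/B tile copies across the wave:", ab_copies_across_wave())
```

Under these assumptions the 256-element C/D tile splits into 8 elements per lane, while 32 lanes holding 16 A/B elements each cover the 256-element input tile twice.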

Modules