Mojo module

mma

RDNA Wave32 WMMA helper for attention kernels.

Wraps the raw mma intrinsic in a parametric per-fragment loop. RDNA WMMA always uses 16-element A/B fragments and 8-element C/D fragments per lane (16x16x16, group_size=1).

Functions

  • rdna_mma: Per-fragment WMMA loop. Derives the MMA counts from the operand shapes; accumulator fragments are indexed in column-major order over (M, N): c_idx = m + n*num_m.
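
The accumulator indexing rule can be checked with a small standalone sketch. This is plain Python, not the Mojo API. The helper names (c_idx, mma_schedule) are hypothetical. It assumes that m and n are the fragment indices along M and N and that num_m is the number of fragments along M. The n-outer, m-inner loop order is also an assumption, chosen only because it visits the accumulator fragments in increasing c_idx order.

```python
def c_idx(m: int, n: int, num_m: int) -> int:
    """Column-major index of accumulator fragment (m, n): m varies fastest."""
    return m + n * num_m


def mma_schedule(num_m: int, num_n: int) -> list[tuple[int, int, int]]:
    """Enumerate (m, n, c_idx) for every fragment pair.

    Assumed loop order: n outer, m inner. The real rdna_mma loop order is
    not stated in the documentation above.
    """
    return [
        (m, n, c_idx(m, n, num_m))
        for n in range(num_n)
        for m in range(num_m)
    ]


if __name__ == "__main__":
    # Example grid: 2 fragments along M, 3 along N, so 6 accumulator fragments.
    for m, n, idx in mma_schedule(num_m=2, num_n=3):
        print(f"m={m} n={n} -> c_idx={idx}")
```

With num_m = 2, fragment (m=1, n=2) maps to c_idx = 1 + 2*2 = 5. Neighbouring fragments along M sit next to each other in the accumulator array, and stepping along N jumps by num_m.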