Mojo module
mma
RDNA Wave32 WMMA helper for attention kernels.
Wraps the raw mma intrinsic in a parametric per-fragment loop. RDNA
WMMA always uses 16-element A/B fragments and 8-element C/D fragments
per lane (16x16x16, group_size=1).
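The per-lane fragment sizes can be checked with simple arithmetic. The sketch below is illustrative Python, not part of the module. The constant and function names are hypothetical, and the replication factor for A/B is only an inference from the numbers above, which the source does not state.

```python
# Illustrative arithmetic only; names here are hypothetical, not module API.
WAVE_SIZE = 32          # Wave32: 32 lanes per wave
M, N, K = 16, 16, 16    # WMMA tile shape (16x16x16)
AB_PER_LANE = 16        # A/B fragment elements per lane (from the text above)


def elements_per_lane(tile_elems: int, lanes: int = WAVE_SIZE) -> int:
    """Elements each lane holds when a tile is split evenly across the wave."""
    return tile_elems // lanes


# C/D: a 16x16 tile split evenly across 32 lanes gives 8 elements per lane.
cd_per_lane = elements_per_lane(M * N)

# A/B: 32 lanes x 16 elements is 512 slots for a 256-element 16x16 tile.
# Assumption: this implies each A/B element is held by two lanes.
ab_slots = WAVE_SIZE * AB_PER_LANE
ab_copies = ab_slots // (M * K)
```

Under these assumptions `cd_per_lane` matches the 8-element C/D fragment quoted above.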
Functions
- rdna_mma: Per-fragment WMMA loop. Derives MMA counts from operand shapes; accumulator indexing is column-major over (M, N): c_idx = m + n*num_m.
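The column-major accumulator indexing can be sketched as follows. This is illustrative Python, not the Mojo implementation. The helper names are hypothetical, and the loop nesting order is an assumption; only the formula c_idx = m + n*num_m comes from the description above.

```python
# Illustrative sketch; helper names are hypothetical, not module API.
def accumulator_index(m: int, n: int, num_m: int) -> int:
    """Column-major index over (M, N): m varies fastest."""
    return m + n * num_m


def fragment_order(num_m: int, num_n: int) -> list[tuple[int, int, int]]:
    """List (m, n, c_idx) for every fragment pair.

    Assumption: the outer loop runs over n and the inner loop over m,
    so c_idx increases by one on each step. The real loop order in
    rdna_mma may differ; the index formula is the same either way.
    """
    order = []
    for n in range(num_n):
        for m in range(num_m):
            order.append((m, n, accumulator_index(m, n, num_m)))
    return order


# Example: 2 fragments along M, 3 along N gives 6 accumulator fragments.
example = fragment_order(2, 3)
```

With num_m = 2, the fragment at (m=1, n=2) maps to accumulator slot 1 + 2*2 = 5, and every (m, n) pair gets a distinct slot in the range 0 to num_m*num_n - 1.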