Mojo module

softmax

Online softmax for RDNA Wave32 attention kernels.

Warp lane layout is col_major(16, 2) (lane_row = l % 16, lane_col = l // 16). Per-lane C/D fragment is row_major(1, 8): 8 fp32 elements stored as a row vector. full() runs one online-softmax iteration end-to-end (max → exp → sum → correction → output update).
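As a rough illustration of the lane mapping and of the five steps that one online-softmax iteration performs, here is a minimal scalar reference model in plain Python. It is a sketch, not this module's implementation: the names `lane_coords` and `online_softmax_step` and their parameters are hypothetical, it handles a single query row with Python lists, and it does not model the Wave32 fragment layout or the actual signature of full().

```python
import math


def lane_coords(lane):
    """Map a Wave32 lane index (0..31) to (lane_row, lane_col) under col_major(16, 2)."""
    return lane % 16, lane // 16


def online_softmax_step(scores, values, m_prev, l_prev, o_prev):
    """One online-softmax iteration for a single query row (hypothetical helper).

    scores: attention scores for the current key block
    values: one value vector per score
    m_prev: running row max, l_prev: running row sum
    o_prev: unnormalized output accumulator
    """
    # 1. max: fold the block maximum into the running maximum
    m_new = max(m_prev, max(scores))
    # 2. exp: exponentiate scores relative to the new maximum
    p = [math.exp(s - m_new) for s in scores]
    # 3. sum: row sum of the current block
    block_sum = sum(p)
    # 4. correction: rescale factor for state accumulated under the old maximum
    #    (evaluates to 0.0 on the first block, when m_prev is -inf)
    correction = math.exp(m_prev - m_new)
    l_new = l_prev * correction + block_sum
    # 5. output update: rescale the old accumulator, then add p @ values
    o_new = [o * correction for o in o_prev]
    for p_j, v in zip(p, values):
        for d, v_d in enumerate(v):
            o_new[d] += p_j * v_d
    return m_new, l_new, o_new


# Usage: stream two key blocks, then normalize once at the end.
m, l, o = -math.inf, 0.0, [0.0, 0.0]
m, l, o = online_softmax_step([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], m, l, o)
m, l, o = online_softmax_step([3.0, 4.0], [[1.0, 1.0], [2.0, -1.0]], m, l, o)
out = [x / l for x in o]
```

Deferring the division by the running sum until after the last block is what lets each iteration touch only one block of scores. This sketch assumes every block contains at least one finite score; a fully masked first block (all scores -inf) would need separate handling.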

Structs