Mojo module
softmax
Online softmax for RDNA Wave32 attention kernels.
Warp lane layout is col_major(16, 2) (lane_row = l % 16,
lane_col = l // 16). Per-lane C/D fragment is row_major(1, 8), i.e. 8
fp32 elements stored as a row vector. full() runs one online-softmax
iteration end-to-end (max → exp → sum → correction → output update).
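
The lane mapping and the five steps of one iteration can be sketched in plain Python as a scalar reference model. This is not the Mojo kernel and does not reproduce the signature of full(): the names `lane_coords` and `online_softmax_step`, the single-row state (`row_max`, `row_sum`, `out`), and the choice to defer the final division by `row_sum` to the caller are all assumptions made for illustration.

```python
import math


def lane_coords(lane):
    """Hypothetical helper: map a Wave32 lane index to (lane_row, lane_col)
    under the col_major(16, 2) layout described above."""
    return lane % 16, lane // 16


def online_softmax_step(scores, values, row_max, row_sum, out):
    """One online-softmax iteration for a single query row (illustrative).

    scores  : attention scores for the current block of keys
    values  : one value vector per key in the block
    row_max : running maximum over all scores seen so far
    row_sum : running sum of exp(score - row_max)
    out     : running unnormalized output vector
    """
    # max: fold the block maximum into the running maximum
    new_max = max(row_max, max(scores))
    # exp: exponentiate relative to the new maximum
    probs = [math.exp(s - new_max) for s in scores]
    # sum: denominator contribution of this block
    block_sum = sum(probs)
    # correction: rescale earlier state to the new maximum
    # (exp(-inf) is 0.0, so the first block needs no special case)
    correction = math.exp(row_max - new_max)
    new_sum = row_sum * correction + block_sum
    # output update: rescale the old output, then add probs @ values
    new_out = [
        o * correction + sum(p * v[j] for p, v in zip(probs, values))
        for j, o in enumerate(out)
    ]
    return new_max, new_sum, new_out


if __name__ == "__main__":
    scores = [0.5, -1.0, 2.0, 0.25]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    m, s, o = float("-inf"), 0.0, [0.0, 0.0]
    # Feed the keys in two blocks of two, as a tiled kernel would.
    for start in (0, 2):
        m, s, o = online_softmax_step(
            scores[start:start + 2], values[start:start + 2], m, s, o
        )
    # Normalize once at the end.
    print([x / s for x in o])
```

Starting from `row_max = -inf`, `row_sum = 0` and a zero output, feeding the key blocks one at a time and dividing by `row_sum` at the end gives the same result as a softmax over all scores at once. The sketch assumes every block contains at least one finite score; a fully masked block would need separate handling.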
Structs