Mojo module
mask_op
Named TileOp struct for QK score masking on gfx950.
Uses TileLayout / Coord Layout Algebra (not hand-rolled integer formulas) to map lanes and registers into the MMA fragment space:
- `WarpLayoutT.idx2crd(lane)` decomposes the lane into `(lane_row, lane_col)`.
- `FragmentLayoutT(Idx[j]())` maps register `j` to its column offset within the MMA fragment.
- `FragmentLayoutT.static_product` / `static_shape[i]` expose the fragment size and the per-lane column-group stride.
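The lane decomposition can be modeled outside Mojo with a few lines of plain Python. This is only a sketch of the index-to-coordinate idea, not the module's API: the helper `idx2crd` below is a hypothetical stand-in, the leftmost-mode-fastest ordering is an assumption borrowed from common layout-algebra conventions, and the warp shape `(16, 4)` (64 lanes as 16 rows by 4 column groups) is an assumed example rather than the actual `WarpLayoutT`.

```python
def idx2crd(idx, shape):
    """Decompose a linear index into a coordinate over a flat shape.

    Assumption: the leftmost mode varies fastest.
    """
    crd = []
    for extent in shape:
        crd.append(idx % extent)  # coordinate within this mode
        idx //= extent            # carry the remainder to the next mode
    return tuple(crd)


# Hypothetical warp shape: 64 lanes viewed as 16 rows x 4 column groups.
WARP_SHAPE = (16, 4)

lane_row, lane_col = idx2crd(37, WARP_SHAPE)
print(lane_row, lane_col)  # 5 2
```

Under these assumptions, lane 37 sits in row 5 of column group 2; a different warp shape or mode ordering would change the result.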
Fragment layout differs by MMA size:
- 16x16 MMA: 4 regs/lane, flat `(1, 4):(1, 1)`.
- 32x32 MMA: 16 regs/lane organized as 4 groups of 4 cols with stride 8 between groups, nested `((1,(4,4)):(1,(1,8)))` (fp8 MFMA pattern).
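To see what the two fragment layouts mean for register-to-column mapping, the following plain-Python sketch evaluates a `shape:stride` layout at each register index. It is a model of the arithmetic only, not the Mojo `FragmentLayoutT` API: `flatten` and `layout_offset` are hypothetical helpers, and the leftmost-mode-fastest decomposition is an assumption.

```python
def flatten(t):
    """Flatten a nested tuple of ints into a flat list, left to right."""
    if isinstance(t, int):
        return [t]
    out = []
    for item in t:
        out.extend(flatten(item))
    return out


def layout_offset(shape, stride, idx):
    """Map a linear index to an offset under a shape:stride layout.

    The index is decomposed over the flattened shape (leftmost mode
    fastest, an assumption), and each coordinate is scaled by its stride.
    """
    offset = 0
    for extent, step in zip(flatten(shape), flatten(stride)):
        offset += (idx % extent) * step
        idx //= extent
    return offset


# Layouts quoted in the text above, written as (shape, stride) pairs.
FRAG_16 = ((1, 4), (1, 1))
FRAG_32 = ((1, (4, 4)), (1, (1, 8)))

cols_16 = [layout_offset(*FRAG_16, j) for j in range(4)]
cols_32 = [layout_offset(*FRAG_32, j) for j in range(16)]

print(cols_16)  # [0, 1, 2, 3]
print(cols_32)  # [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27]
```

The 32x32 output shows the four groups of four contiguous columns, with consecutive groups starting 8 columns apart, which matches the nested layout's description.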
Structs
- `MaskTileOp`: Apply an `MHAMask` to the per-lane QK score registers in place.