Mojo module

mask_op

Named TileOp struct for QK score masking on gfx950.

Uses the TileLayout / Coord layout algebra, rather than hand-rolled integer formulas, to map lanes and registers into the MMA fragment space:

  • WarpLayoutT.idx2crd(lane) decomposes the lane into (lane_row, lane_col).
  • FragmentLayoutT(Idx[j]()) maps register j to its column offset within the MMA fragment.
  • FragmentLayoutT.static_product / static_shape[i] expose the fragment size and the per-lane column-group stride.
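
To make the lane decomposition concrete, here is a minimal plain-Python sketch of the arithmetic that an idx2crd-style call performs. It is not the Mojo API: the helper below is a hand-written stand-in, and the warp shape (16, 4) is an assumption chosen only for illustration (64 lanes viewed as 16 rows by 4 column groups, leftmost mode varying fastest). The actual WarpLayoutT shape is not stated on this page.

```python
def idx2crd(idx, shape):
    """Decompose a linear index into a coordinate, leftmost mode fastest."""
    crd = []
    for extent in shape:
        crd.append(idx % extent)  # position within this mode
        idx //= extent            # carry the remainder to the next mode
    return tuple(crd)


# Assumed warp shape (hypothetical): 16 lane rows x 4 lane column groups.
WARP_SHAPE = (16, 4)

lane_row, lane_col = idx2crd(37, WARP_SHAPE)
print(lane_row, lane_col)  # 5 2
```

Under that assumed shape, lane 37 lands on row 5 of column group 2. With a different warp shape or mode order the same lane would map elsewhere, which is why the module defers to the layout type instead of fixing a formula.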

Fragment layout differs by MMA size:

  • 16x16 MMA: 4 regs/lane, flat (1, 4):(1, 1).
  • 32x32 MMA: 16 regs/lane organized as 4 groups of 4 cols with stride 8 between groups, expressed as the nested layout ((1,(4,4)):(1,(1,8))) (fp8 MFMA pattern).
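
The two fragment layouts above can be checked by evaluating them for every register index. The following plain-Python sketch is an illustrative model, not the Mojo implementation: `flatten` and `layout_offset` are hypothetical helpers that assume the usual shape:stride convention, in which the register index is decomposed over the flattened shape (leftmost mode fastest) and each coordinate is multiplied by its stride.

```python
def flatten(t):
    """Flatten a nested tuple of ints into a flat list."""
    if isinstance(t, int):
        return [t]
    out = []
    for x in t:
        out.extend(flatten(x))
    return out


def layout_offset(idx, shape, stride):
    """Map a linear register index to its offset under a shape:stride layout."""
    offset = 0
    for extent, step in zip(flatten(shape), flatten(stride)):
        offset += (idx % extent) * step  # coordinate in this mode times its stride
        idx //= extent
    return offset


# Layouts as listed above, written as (shape, stride) pairs.
FRAG_16X16 = ((1, 4), (1, 1))            # (1, 4):(1, 1)
FRAG_32X32 = ((1, (4, 4)), (1, (1, 8)))  # (1,(4,4)):(1,(1,8))

cols_16 = [layout_offset(j, *FRAG_16X16) for j in range(4)]
cols_32 = [layout_offset(j, *FRAG_32X32) for j in range(16)]

print(cols_16)  # [0, 1, 2, 3]
print(cols_32)  # [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27]
```

The 32x32 result shows the four groups of four consecutive columns starting at offsets 0, 8, 16 and 24, matching the stride of 8 between groups.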

Structs