IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo module

mask_op

Named TileOp struct for QK score masking on gfx950.

Uses TileLayout / Coord Layout Algebra (not hand-rolled integer formulas) to map lanes and registers into the MMA fragment space:

  • WarpLayoutT.idx2crd(lane) decomposes the lane into (lane_row, lane_col).
  • FragmentLayoutT(Idx[j]) maps register j to its column offset within the MMA fragment.
  • FragmentLayoutT.static_product / static_shape[i] expose the fragment size and the per-lane column-group stride.
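
The lane decomposition can be illustrated with a small Python model of what an idx2crd-style operation computes. This is a sketch, not the module's Mojo API: the `idx2crd` function below is a stand-in, the 16 x 4 warp shape is a hypothetical example (the actual WarpLayoutT shape is not stated here), and the leftmost-mode-fastest convention is an assumption.

```python
def idx2crd(idx, shape):
    """Decompose a linear index into a coordinate.

    Assumes the leftmost mode varies fastest (column-major style).
    """
    coord = []
    for extent in shape:
        coord.append(idx % extent)  # position within this mode
        idx //= extent              # carry the remainder to the next mode
    return tuple(coord)


# Hypothetical warp shape: 64 lanes viewed as 16 rows x 4 column groups.
WARP_SHAPE = (16, 4)

# Lane 37 -> row 37 % 16 = 5, column group 37 // 16 = 2.
lane_row, lane_col = idx2crd(37, WARP_SHAPE)
```

Expressing the mapping as a shape rather than as ad hoc `%` and `//` expressions at each call site means a different warp arrangement only requires changing `WARP_SHAPE`.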

Fragment layout differs by MMA size:

  • 16x16 MMA: 4 regs/lane, flat (1, 4):(1, 1).
  • 32x32 MMA: 16 regs/lane organized as 4 groups of 4 cols with stride 8 between groups, nested ((1,(4,4)):(1,(1,8))) (fp8 MFMA pattern).
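
To make the two fragment layouts concrete, the following Python sketch evaluates a shape:stride layout at each register index and lists the resulting column offsets. It is a model of the arithmetic only: `flatten`, `layout_offset`, and `static_product` are illustrative helpers written for this example (not the Mojo API), and the leftmost-mode-fastest index decomposition is an assumption.

```python
from math import prod


def flatten(t):
    """Flatten a nested int/tuple structure into a flat list of ints."""
    if isinstance(t, int):
        return [t]
    out = []
    for item in t:
        out.extend(flatten(item))
    return out


def layout_offset(idx, shape, stride):
    """Map a linear index to an offset through a shape:stride layout.

    Assumes the leftmost mode varies fastest.
    """
    offset = 0
    for extent, step in zip(flatten(shape), flatten(stride)):
        offset += (idx % extent) * step
        idx //= extent
    return offset


def static_product(shape):
    """Total number of elements described by a (possibly nested) shape."""
    return prod(flatten(shape))


# (shape, stride) pairs taken from the two fragment layouts listed above.
FRAG_16 = ((1, 4), (1, 1))
FRAG_32 = ((1, (4, 4)), (1, (1, 8)))

cols_16 = [layout_offset(j, *FRAG_16) for j in range(static_product(FRAG_16[0]))]
cols_32 = [layout_offset(j, *FRAG_32) for j in range(static_product(FRAG_32[0]))]

print(cols_16)  # [0, 1, 2, 3]
print(cols_32)  # [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27]
```

Under these assumptions, the 32x32 layout yields four runs of four consecutive columns whose starting offsets are 8 apart, which matches the "4 groups of 4 cols with stride 8" description.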

Structs