
Mojo function

make_mma_swizzle

make_mma_swizzle[dtype: DType, MMA_M: Int, MMA_K: Int]() -> Swizzle

Create swizzle pattern for MMA LDS access.

The AMD MI355X has 64 LDS banks, each 4 bytes wide. Without swizzling, the MMA thread access pattern causes 4-way bank conflicts. The swizzle XORs high-order address bits into the bank-selection bits so that accesses are spread across banks.

Swizzle parameters:

  • log_tile: Number of bits to XOR, scales with MMA_K
  • base: Log2 of read granularity in bytes (lds_frag_width * elem_size)
  • shift: Fixed at 4 for AMD LDS bank geometry
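The effect of the three parameters can be sketched with a small Python model. This is not the Mojo API: it assumes the common CuTe-style convention in which Swizzle(bits, base, shift) XORs the `bits` bits starting at position `base + shift` into the bits starting at `base`, and it assumes offsets are byte addresses. The helper names `swizzle` and `lds_bank` and the stride-256 access pattern are hypothetical, chosen only to make the bank arithmetic visible.

```python
def swizzle(offset: int, bits: int, base: int, shift: int) -> int:
    """CuTe-style XOR swizzle (assumed convention, not the Mojo source)."""
    # Select `bits` high-order bits located `shift` positions above `base`.
    mask = ((1 << bits) - 1) << (base + shift)
    # XOR those bits down into the bits starting at `base`.
    return offset ^ ((offset & mask) >> shift)


def lds_bank(byte_addr: int, num_banks: int = 64, bank_width: int = 4) -> int:
    """Bank index for a byte address, using 64 banks of 4 bytes each."""
    return (byte_addr // bank_width) % num_banks


# Hypothetical access pattern: four reads spaced 256 bytes apart.
# 64 banks * 4 bytes = 256 bytes, so every read starts in the same bank.
addrs = [row * 256 for row in range(4)]

banks_plain = [lds_bank(a) for a in addrs]
banks_swizzled = [lds_bank(swizzle(a, bits=3, base=4, shift=4)) for a in addrs]

print(banks_plain)     # [0, 0, 0, 0]
print(banks_swizzled)  # [0, 4, 8, 12]
```

With base=4 each read covers 16 bytes, that is four consecutive banks, so the swizzled starting banks 0, 4, 8, and 12 no longer overlap in this toy pattern. Because the XOR only changes low-order bits and leaves the selected high-order bits intact, applying the same swizzle twice returns the original offset.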

Configuration examples:

  • BF16 16×16×32: lds_frag=8, bytes=16 → Swizzle(1, 4, 4)
  • FP8 16×16×128: lds_frag=16, bytes=16 → Swizzle(3, 4, 4)
  • FP8 32×32×64: lds_frag=32, bytes=32 → Swizzle(2, 5, 4)
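The base value in each configuration follows from the stated rule, log2 of the read granularity in bytes. The Python sketch below reproduces only that part of the arithmetic; the function name `swizzle_base` and the element sizes (2 bytes for BF16, 1 byte for FP8) are assumptions made for illustration. It does not derive log_tile, because the page only says that value scales with MMA_K and gives no formula.

```python
def swizzle_base(lds_frag_width: int, elem_size: int) -> int:
    """Return log2(lds_frag_width * elem_size), the read granularity in bytes."""
    read_bytes = lds_frag_width * elem_size
    # The rule only makes sense for power-of-two read sizes.
    if read_bytes <= 0 or read_bytes & (read_bytes - 1):
        raise ValueError(f"read size {read_bytes} is not a power of two")
    return read_bytes.bit_length() - 1


# (label, lds_frag_width in elements, assumed element size in bytes)
configs = [
    ("BF16 16x16x32", 8, 2),
    ("FP8 16x16x128", 16, 1),
    ("FP8 32x32x64", 32, 1),
]

bases = {label: swizzle_base(width, size) for label, width, size in configs}
print(bases)
```

The computed bases are 4, 4, and 5, which agree with the second argument of the three Swizzle configurations listed above.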

Parameters:

  • dtype (DType): Element data type (affects byte size).
  • MMA_M (Int): M dimension of MMA instruction.
  • MMA_K (Int): K dimension of MMA instruction.

Returns:

Swizzle: Swizzle pattern for bank-conflict-free LDS access.
