Mojo function
make_mma_swizzle
make_mma_swizzle[dtype: DType, MMA_M: Int, MMA_K: Int]() -> Swizzle
Create swizzle pattern for MMA LDS access.
The AMD MI355X has 64 LDS banks of 4 bytes each. Without swizzling, the MMA thread access pattern causes 4-way bank conflicts. The swizzle XORs high-order address bits into the bank-selection bits to distribute accesses across banks.
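To make the bank arithmetic concrete, here is a minimal Python sketch of how a byte address maps to a bank under the geometry stated above (64 banks, 4 bytes each). The helper names `bank_of` and `conflict_degree` are hypothetical, and the strided pattern is an illustrative assumption, not the actual MMA thread access pattern.

```python
from collections import Counter

NUM_BANKS = 64   # LDS bank count, from the description above
BANK_BYTES = 4   # bytes per bank, from the description above

def bank_of(byte_addr: int) -> int:
    """Bank that serves a given LDS byte address (simple modulo model)."""
    return (byte_addr // BANK_BYTES) % NUM_BANKS

def conflict_degree(byte_addrs) -> int:
    """Largest number of addresses in the list that land in one bank."""
    return max(Counter(bank_of(a) for a in byte_addrs).values())

# Hypothetical pattern: 4 lanes reading addresses 256 bytes apart.
# 256 bytes = 64 banks * 4 bytes, so every lane wraps to the same bank.
strided = [lane * 256 for lane in range(4)]

# For contrast: 4 lanes reading consecutive 4-byte words.
contiguous = [lane * 4 for lane in range(4)]

print(conflict_degree(strided), conflict_degree(contiguous))
```

Any stride that is a multiple of 256 bytes wraps back to the same bank in this model, which is the kind of collision the swizzle is meant to break up.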
Swizzle parameters:
- log_tile: Number of bits to XOR, scales with MMA_K
- base: Log2 of read granularity in bytes (lds_frag_width * elem_size)
- shift: Fixed at 4 for AMD LDS bank geometry
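The three parameters can be read as a bit-level recipe. The Python sketch below models a `Swizzle(log_tile, base, shift)` under the assumption that it follows the common CuTe-style XOR convention on byte offsets: take `log_tile` bits starting at bit `base + shift`, shift them down by `shift`, and XOR them into the bits starting at `base`. This is a model for illustration only; the Mojo `Swizzle` type may differ in detail.

```python
def swizzle(offset: int, log_tile: int, base: int, shift: int) -> int:
    """XOR `log_tile` high-order bits of `offset` into the bits at `base`.

    Assumed convention (CuTe-style), not confirmed by the source:
    the source bits sit at [base + shift, base + shift + log_tile)
    and the target bits at [base, base + log_tile).
    """
    mask = ((1 << log_tile) - 1) << (base + shift)
    return offset ^ ((offset & mask) >> shift)

# With Swizzle(1, 4, 4), bit 8 of the offset is XORed into bit 4.
print(hex(swizzle(0x100, 1, 4, 4)))  # bit 8 set, so bit 4 flips
print(hex(swizzle(0x0F0, 1, 4, 4)))  # bit 8 clear, so offset is unchanged
```

Because the source and target bit ranges do not overlap when `log_tile <= shift`, applying the function twice returns the original offset, so the mapping is a permutation of addresses and no two offsets collide.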
Configuration examples:
- BF16 16×16×32: lds_frag=8, bytes=16 → Swizzle(1, 4, 4)
- FP8 16×16×128: lds_frag=16, bytes=16 → Swizzle(3, 4, 4)
- FP8 32×32×64: lds_frag=32, bytes=32 → Swizzle(2, 5, 4)
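The `base` value in each configuration follows from the stated rule, log2 of `lds_frag_width * elem_size`. A short Python sketch checks the arithmetic; `swizzle_base` is a hypothetical helper, and the element sizes (2 bytes for BF16, 1 byte for FP8) are the standard widths of those types. How `log_tile` is derived from `MMA_K` is not specified here, so it is not modeled.

```python
def swizzle_base(lds_frag_width: int, elem_bytes: int) -> int:
    """Log2 of the read granularity in bytes (lds_frag_width * elem_bytes)."""
    read_bytes = lds_frag_width * elem_bytes
    # The rule only makes sense for power-of-two read sizes.
    if read_bytes <= 0 or read_bytes & (read_bytes - 1):
        raise ValueError("read granularity must be a power of two")
    return read_bytes.bit_length() - 1

# (label, lds_frag_width in elements, element size in bytes)
configs = [
    ("BF16 16x16x32", 8, 2),
    ("FP8 16x16x128", 16, 1),
    ("FP8 32x32x64", 32, 1),
]

for label, frag, elem_bytes in configs:
    print(label, "read bytes:", frag * elem_bytes, "base:", swizzle_base(frag, elem_bytes))
```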
Parameters:
- dtype (DType): Element data type (affects byte size).
- MMA_M (Int): M dimension of MMA instruction.
- MMA_K (Int): K dimension of MMA instruction.
Returns:
Swizzle: Swizzle pattern for bank-conflict-free LDS access.