Mojo module
tile_writer
TileWriter module for efficient tile writing in GPU matrix multiplication.
This module provides utilities for writing tiles to memory using different mechanisms and destinations:
- Register → Shared Memory: Uses st.matrix hardware instruction for efficient storage of WGMMA accumulator results to shared memory with swizzling.
- Register → Global Memory: Direct stores from register tiles to global memory with optional epilogue processing and bounds checking.
- Shared Memory → Global Memory: Hardware-accelerated TMA stores or regular stores for efficient 2D tile transfers from shared to global memory.
Two main traits abstract these writing mechanisms:
- SMemTileWriter: For shared memory → global memory transfers
- RegTileWriter: For register → memory (shared or global) transfers
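The register → global memory path described above combines three ideas: a tile offset, an optional epilogue applied per element, and bounds checking at the matrix edge. The following is a minimal conceptual sketch in plain Python, not the Mojo API of this module. The function name `write_tile_to_global`, its parameters, and the epilogue signature `(row, col, value) -> value` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional


def write_tile_to_global(
    dst: List[float],
    dst_rows: int,
    dst_cols: int,
    tile: List[List[float]],
    row_offset: int,
    col_offset: int,
    epilogue: Optional[Callable[[int, int, float], float]] = None,
) -> int:
    """Store `tile` into the row-major buffer `dst` at (row_offset, col_offset).

    Hypothetical model of a bounds-checked tile store. Offsets are assumed
    to be non-negative. Returns the number of elements actually written.
    """
    written = 0
    for i, tile_row in enumerate(tile):
        r = row_offset + i
        if r >= dst_rows:
            break  # remaining tile rows fall outside the destination
        for j, value in enumerate(tile_row):
            c = col_offset + j
            if c >= dst_cols:
                break  # remaining tile columns fall outside the destination
            if epilogue is not None:
                # The epilogue sees the global coordinates and the raw value.
                value = epilogue(r, c, value)
            dst[r * dst_cols + c] = value
            written += 1
    return written


# A 2x2 tile placed at the bottom-right corner of a 3x5 matrix:
# only the element at (2, 4) is in bounds.
rows, cols = 3, 5
out = [0.0] * (rows * cols)
tile = [[1.0, 2.0], [3.0, 4.0]]
count = write_tile_to_global(
    out, rows, cols, tile, 2, 4, epilogue=lambda r, c, v: 2.0 * v
)

# The same tile placed fully in bounds, with no epilogue.
out_full = [0.0] * (rows * cols)
count_full = write_tile_to_global(out_full, rows, cols, tile, 0, 0)
```

In this model the edge tile writes a single element and skips the rest, which is the behavior bounds checking is meant to guarantee when the matrix dimensions are not multiples of the tile size. On a GPU the loops would be distributed across threads rather than run sequentially.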
Structs
- FragmentToSMemWriter: Writes WGMMA accumulator results from registers to shared memory using st.matrix.
- RegisterToGMemWriter: Writer for transferring accumulator registers directly to global memory.
- ThreadInfo: Thread identification within the warp group.
- TileCoordinates: Helper struct for managing tile coordinate offsets.
- TileWriterThreadwise
- TileWriterTMA: TMA-based tile writer for hardware-accelerated memory transfers.
Traits
- RegTileWriter: Base trait for tile writing mechanisms in matrix multiplication.
- SMemTileWriter: Base trait for tile writing mechanisms in matrix multiplication.