Mojo module
tile_writer
TileWriter module for efficient tile writing in GPU matrix multiplication.
This module provides utilities for writing tiles to memory using different mechanisms and destinations:
- Register → Shared Memory: Uses st.matrix hardware instruction for efficient storage of WGMMA accumulator results to shared memory with swizzling.
- Register → Global Memory: Direct stores from register tiles to global memory with optional epilogue processing and bounds checking.
- Shared Memory → Global Memory: Hardware-accelerated TMA stores or regular stores for efficient 2D tile transfers from shared to global memory.
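To make "optional epilogue processing and bounds checking" concrete, here is a minimal CPU-side sketch in Python. It is a conceptual model only, not this module's API: the function `write_tile`, its parameters, and the `(row, col, value)` epilogue signature are hypothetical names chosen for illustration. The idea is that a tile overlapping the matrix edge writes only its in-bounds elements, and each value may pass through an epilogue function before it is stored.

```python
def write_tile(dst, tile, row0, col0, epilogue=None):
    """Copy `tile` into `dst` with its top-left corner at (row0, col0).

    Hypothetical helper: elements that would land outside `dst` are skipped
    (bounds checking), and `epilogue(row, col, value)`, if given, transforms
    each value just before it is stored.
    """
    rows, cols = len(dst), len(dst[0])
    for i, tile_row in enumerate(tile):
        r = row0 + i
        if r >= rows:
            break  # the remaining tile rows fall below the matrix
        for j, value in enumerate(tile_row):
            c = col0 + j
            if c >= cols:
                break  # the remaining tile columns fall right of the matrix
            dst[r][c] = epilogue(r, c, value) if epilogue else value


# A 3x3 output covered by 2x2 tiles: the tile at (2, 2) is only partly in bounds.
out = [[0.0] * 3 for _ in range(3)]
tile = [[1.0, 2.0], [3.0, 4.0]]
write_tile(out, tile, 2, 2, epilogue=lambda r, c, v: v * 2.0)
```

Only the tile's top-left element lands inside the matrix here, so a single scaled value is written to `out[2][2]` and the other three elements are dropped. On a GPU the same check would be performed per thread on the coordinates that thread owns rather than in a serial loop.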
Two main traits abstract these writing mechanisms:
- SMemTileWriter: For shared memory → global memory transfers
- RegTileWriter: For register → memory (shared or global) transfers
Structs
- FragmentToSMemWriter: Writes WGMMA accumulator results from registers to shared memory using st.matrix.
- RegisterToGMemWriter: Writer for transferring accumulator registers directly to global memory.
- ThreadInfo: Thread identification within the warp group.
- TileCoordinates: Helper struct for managing tile coordinate offsets.
- TileWriterThreadwise
- TileWriterTMA: TMA-based tile writer for hardware-accelerated memory transfers.
Traits
- RegTileWriter: Base trait for tile writing mechanisms in matrix multiplication.
- SMemTileWriter: Base trait for tile writing mechanisms in matrix multiplication.