Mojo module
tile_writer
TileWriter module for efficient tile writing in GPU matrix multiplication.
This module provides utilities for writing matrix tiles from shared memory to global memory using two different mechanisms:
- TMA (Tensor Memory Accelerator): Hardware-accelerated stores for efficient 2D tile transfers from shared to global memory.
- Regular stores: Software-based synchronous stores with manual thread distribution and swizzling for optimal memory access patterns.
The TileWriter trait abstracts these writing mechanisms to provide a unified interface for the matmul kernel's consumer threads.
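The idea of one interface over two store mechanisms can be sketched in plain Python. This is only a conceptual model, not the Mojo API: the class names BulkTileWriter and ThreadedTileWriter, the helper store_output, and the write_tile signature are all hypothetical, and the real module's signatures are not shown on this page. The "bulk" writer stands in for a TMA-style whole-tile transfer, the "threaded" writer stands in for regular stores where each thread handles a strided share of elements. Swizzling is deliberately left out.

```python
from typing import Protocol

Matrix = list[list[float]]


class TileWriter(Protocol):
    """Hypothetical interface: write one tile into dst at (row, col)."""

    def write_tile(self, dst: Matrix, tile: Matrix, row: int, col: int) -> None: ...


class BulkTileWriter:
    """Stand-in for the TMA path: the tile is copied as one 2D block."""

    def write_tile(self, dst: Matrix, tile: Matrix, row: int, col: int) -> None:
        for r, tile_row in enumerate(tile):
            # Row-wise slice assignment models a single bulk transfer.
            dst[row + r][col:col + len(tile_row)] = tile_row


class ThreadedTileWriter:
    """Stand-in for regular stores: elements are split across simulated threads."""

    def __init__(self, num_threads: int) -> None:
        self.num_threads = num_threads

    def write_tile(self, dst: Matrix, tile: Matrix, row: int, col: int) -> None:
        height, width = len(tile), len(tile[0])
        for tid in range(self.num_threads):
            # Thread tid owns linear indices tid, tid + N, tid + 2N, ...
            for idx in range(tid, height * width, self.num_threads):
                r, c = divmod(idx, width)
                dst[row + r][col + c] = tile[r][c]


def make_matrix(rows: int, cols: int, fill: float = 0.0) -> Matrix:
    """Build a rows x cols matrix with independent row lists."""
    return [[fill] * cols for _ in range(rows)]


def store_output(writer: TileWriter, dst: Matrix, tile: Matrix, row: int, col: int) -> None:
    """Consumer-side code depends only on the interface, not on the mechanism."""
    writer.write_tile(dst, tile, row, col)


if __name__ == "__main__":
    tile = [[1.0, 2.0], [3.0, 4.0]]
    out_bulk = make_matrix(4, 4)
    out_threaded = make_matrix(4, 4)
    store_output(BulkTileWriter(), out_bulk, tile, 2, 0)
    store_output(ThreadedTileWriter(num_threads=3), out_threaded, tile, 2, 0)
    print(out_bulk == out_threaded)
```

Because store_output only sees the interface, the caller can switch between the two mechanisms without changing the code that produces the tile, which is the role the page attributes to the TileWriter trait for the kernel's consumer threads.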
Structs
- FragmentToSMemWriter: Writer for storing accumulator fragments from registers to shared memory.
- MMATileCoords: Coordinates for an MMA tile in the output.
- RegisterToGMemWriter: Writer for transferring accumulator registers directly to global memory.
- ThreadInfo: Thread identification within the warp group.
- TileCoordinates: Helper struct for managing tile coordinate offsets.
- TileWriterRegular
- TileWriterTMA: TMA-based tile writer for hardware-accelerated memory transfers.
Traits
- TileWriter: Base trait for tile writing mechanisms in matrix multiplication.