Mojo struct
TileWriterTMA
struct TileWriterTMA[tma_origin: ImmutOrigin, dtype: DType, tma_rank: Int, tile_shape: IndexList[tma_rank], desc_shape: IndexList[tma_rank], //]
TMA-based tile writer for hardware-accelerated memory transfers.
This writer uses NVIDIA's Tensor Memory Accelerator (TMA) for efficient 2D tile transfers from shared to global memory.
Parameters
- tma_origin (ImmutOrigin): Origin type for the TMA operation.
- dtype (DType): Data type of the elements being written.
- tma_rank (Int): Rank of the TMA tile (number of dimensions).
- tile_shape (IndexList): Shape of the TMA tile for async store operations.
- desc_shape (IndexList): Shape described by the TMA descriptor.
Fields
- tma_op (TileWriterTMA.TMATensorTilePtr): Pointer to the TMA tensor descriptor.
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
SMemTileWriter,
TrivialRegisterPassable
comptime members
TMATensorTilePtr
comptime TMATensorTilePtr = Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]
Methods
__init__
__init__(tma_op: Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]) -> Self
Initialize the TMA tile writer.
Args:
- tma_op (Pointer): Pointer to the TMA tensor descriptor.
write_tile
write_tile(self, src: LayoutTensor[dtype, src.layout, MutAnyOrigin, address_space=AddressSpace.SHARED, element_layout=src.element_layout, layout_int_type=src.layout_int_type, linear_idx_type=src.linear_idx_type, masked=src.masked, alignment=128], coords: Tuple[Int, Int])
Write a tile using TMA hardware acceleration.
Performs an asynchronous TMA store from shared memory to global memory. The operation includes proper fencing and synchronization.
Note: Coordinates are expected in (N, M) order, that is (col, row), for column-major output.
Args:
- src (LayoutTensor): Source tile in shared memory.
- coords (Tuple): Tile coordinates (col, row) in element space.
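The coordinate order is easy to get backwards, so the following is a minimal CPU-side sketch of one plausible reading of it, written in plain Python rather than Mojo. It is not the TMA implementation: `place_tile` is a hypothetical helper, and the assumption that `coords` names the element offset of the tile's first element, column first, is inferred from the (col, row) description above.

```python
def place_tile(dst, tile, coords):
    """Copy `tile` into `dst`, treating coords as (col, row) element offsets.

    Hypothetical model of the coordinate convention only; it does not
    model asynchronous behavior, fencing, or shared/global memory.
    """
    col, row = coords  # (N, M) order: column offset first, then row offset
    for i, tile_row in enumerate(tile):
        for j, value in enumerate(tile_row):
            dst[row + i][col + j] = value


# Destination with M = 4 rows and N = 6 columns, zero-initialized.
dst = [[0] * 6 for _ in range(4)]
tile = [[1, 2], [3, 4]]

# Column offset 4, row offset 2: note the column comes first.
place_tile(dst, tile, (4, 2))
```

Under this assumption, passing `(2, 4)` instead would address row 4, which lies outside the 4-row destination, so swapping the two values is not a harmless mistake.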