
Mojo struct

TileWriterTMA

struct TileWriterTMA[tma_origin: ImmutOrigin, dtype: DType, tma_rank: Int, tile_shape: IndexList[tma_rank], desc_shape: IndexList[tma_rank], //]

TMA-based tile writer for hardware-accelerated memory transfers.

This writer uses NVIDIA's Tensor Memory Accelerator (TMA) for efficient 2D tile transfers from shared to global memory.

Parameters​

  • ​tma_origin (ImmutOrigin): Origin type for the TMA operation.
  • ​dtype (DType): Data type of the elements being written.
  • ​tma_rank (Int): Rank of the TMA tile (number of dimensions).
  • ​tile_shape (IndexList[tma_rank]): Shape of the TMA tile for async store operations.
  • ​desc_shape (IndexList[tma_rank]): Shape described by the TMA descriptor.

Fields​

  • tma_op (TileWriterTMA.TMATensorTilePtr): Pointer to the TMATensorTile that this writer stores through.

Implemented traits​

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, SMemTileWriter, TrivialRegisterPassable

comptime members​

TMATensorTilePtr​

comptime TMATensorTilePtr = Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]

Methods​

__init__​

__init__(tma_op: Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]) -> Self

Initialize the TMA tile writer.

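Args:

  • tma_op (Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]): Pointer to the TMATensorTile used for the store operations.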

write_tile​

write_tile(self, src: LayoutTensor[dtype, MutAnyOrigin, address_space=AddressSpace.SHARED, element_layout=src.element_layout, layout_int_type=src.layout_int_type, linear_idx_type=src.linear_idx_type, masked=src.masked, alignment=128], coords: Tuple[Int, Int])

Write a tile using TMA hardware acceleration.

Performs an asynchronous TMA store from shared memory to global memory. The operation includes proper fencing and synchronization.

Note: Coordinates are expected in (N, M) order for column-major output.
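The coordinate order in the note is easy to get backwards, so the following minimal sketch shows the swap in isolation. It assumes the conventional GEMM naming in which M indexes rows and N indexes columns, which this page does not state. The helper `to_write_tile_coords` and the values in `main` are hypothetical and are not part of TileWriterTMA.

```mojo
fn to_write_tile_coords(m: Int, n: Int) -> Tuple[Int, Int]:
    """Reorder an (M, N) position into the (N, M) order described in the note.

    Hypothetical helper for illustration only; not part of TileWriterTMA.
    """
    # The N coordinate goes first, the M coordinate second.
    return (n, m)


fn main():
    # Position of a tile along M and N as the caller tracks it (assumed values).
    var m = 128
    var n = 64
    var coords = to_write_tile_coords(m, n)
    # coords[0] holds the N coordinate, coords[1] holds the M coordinate.
    print(coords[0], coords[1])
```

A tuple built this way has the Tuple[Int, Int] type that the coords argument of write_tile takes. Whether the values are element offsets or tile indices depends on the caller and is not specified here.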

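Args:

  • src (LayoutTensor): Source tile in shared memory (AddressSpace.SHARED), declared with alignment=128.
  • coords (Tuple[Int, Int]): Coordinates of the tile in the global-memory destination, given in (N, M) order.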