Mojo struct

TileWriterTMA

struct TileWriterTMA[tma_origin: ImmutOrigin, dtype: DType, tma_rank: Int, tile_shape: IndexList[tma_rank], desc_shape: IndexList[tma_rank], //]

TMA-based tile writer for hardware-accelerated memory transfers.

This writer uses NVIDIA's Tensor Memory Accelerator (TMA) for efficient 2D tile transfers from shared to global memory.

Parameters

  • tma_origin (ImmutOrigin): Origin type for the TMA operation.
  • dtype (DType): Data type of the elements being written.
  • tma_rank (Int): Rank of the TMA tile (number of dimensions).
  • tile_shape (IndexList[tma_rank]): Shape of the TMA tile for async store operations.
  • desc_shape (IndexList[tma_rank]): Shape described by the TMA descriptor.

Fields

  • tma_op (TileWriterTMA.TMATensorTilePtr): Pointer to the TMATensorTile used for the store.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, SMemTileWriter, TrivialRegisterPassable

comptime members

TMATensorTilePtr

comptime TMATensorTilePtr = Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]

Methods

__init__

def __init__(tma_op: Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]) -> Self

Initialize the TMA tile writer.

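Args:

  • tma_op (Pointer[TMATensorTile[dtype, tma_rank, tile_shape, desc_shape], tma_origin]): Pointer to the TMATensorTile used for the store.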

write_tile

def write_tile(self, src: TileTensor[dtype, address_space=AddressSpace.SHARED, linear_idx_type=src.linear_idx_type, element_size=src.element_size], coords: Tuple[Int, Int])

Write a tile using TMA hardware acceleration.

Performs an asynchronous TMA store from shared memory to global memory. The operation includes proper fencing and synchronization.

Note: Coordinates are expected in (N, M) order for column-major output.
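The coordinate order in the note can be illustrated with a small host-side model. This is a plain-Python sketch, not the Mojo API: the helper name `write_tile_column_major` is hypothetical, and it assumes that `coords` holds element offsets of the tile origin with the N (column) offset first and the M (row) offset second. The page does not say whether the real coordinates are element offsets or tile indices, so treat that as an assumption.

```python
def write_tile_column_major(dst, num_rows, tile, coords):
    """Copy `tile` (a list of rows) into the flat column-major buffer `dst`.

    Assumption: `coords` is (n, m), the column offset first and the row
    offset second, both measured in elements.
    """
    n0, m0 = coords
    for i, row in enumerate(tile):
        for j, value in enumerate(row):
            # Column-major layout: element (m, n) lives at m + n * num_rows.
            dst[(m0 + i) + (n0 + j) * num_rows] = value


M, N = 4, 6                       # output matrix: 4 rows, 6 columns
dst = [0] * (M * N)               # flat column-major storage
tile = [[1, 2, 3], [4, 5, 6]]     # 2 rows (M extent) x 3 columns (N extent)

# The tile origin is row 2, column 3, passed in (N, M) order.
write_tile_column_major(dst, M, tile, coords=(3, 2))
```

Under this model, swapping the two entries of `coords` would place the tile at row 3, column 2 instead, which is the kind of mistake the note warns about.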

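Args:

  • src (TileTensor): Source tile in shared memory.
  • coords (Tuple[Int, Int]): Destination coordinates in global memory, in (N, M) order.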