Mojo struct

TileWriterTMA

@register_passable(trivial) struct TileWriterTMA[_mlir_origin: LITImmutOrigin, tma_origin: ImmutOrigin, dtype: DType, tma_layout: Layout, desc_layout: Layout, //]

TMA-based tile writer for hardware-accelerated memory transfers.

This writer uses NVIDIA's Tensor Memory Accelerator (TMA) for efficient 2D tile transfers from shared to global memory.

Parameters

tma_origin (ImmutOrigin): Origin type for the TMA operation.
dtype (DType): Data type of the elements being written.
tma_layout (Layout): Layout of the TMA tile for async store operations.
desc_layout (Layout): Layout described by the TMA descriptor.

Fields

tma_op (TileWriterTMA.TMATensorTilePtr): Pointer to the TMA tensor descriptor.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, SMemTileWriter, TrivialRegisterPassable

`comptime` members

`__copy_ctor_is_trivial`

comptime __copy_ctor_is_trivial = True

`__del__is_trivial`

comptime __del__is_trivial = True

`__move_ctor_is_trivial`

comptime __move_ctor_is_trivial = True

`TMATensorTilePtr`

comptime TMATensorTilePtr = Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]

Methods

`__init__`

__init__(tma_op: Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]) -> Self

Initialize the TMA tile writer.

Args:

tma_op (Pointer): Pointer to the TMA tensor descriptor.

`write_tile`

write_tile(self, src: LayoutTensor[dtype, src.layout, MutAnyOrigin, address_space=AddressSpace.SHARED, element_layout=src.element_layout, layout_int_type=src.layout_int_type, linear_idx_type=src.linear_idx_type, masked=src.masked, alignment=128], coords: Tuple[UInt, UInt])

Write a tile using TMA hardware acceleration.

Performs an asynchronous TMA store from shared memory to global memory. The operation includes proper fencing and synchronization.

Note: Coordinates are expected in (N, M) order, that is (col, row), for column-major output.

Args:

src (LayoutTensor): Source tile in shared memory.
coords (Tuple): Tile coordinates (col, row) in element space.
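The (col, row) ordering is easy to get backwards, so here is a minimal sketch of building the `coords` tuple. It assumes the caller starts from tile indices and fixed tile extents and converts them to element offsets. The helper `tile_origin` and its parameter names are hypothetical and not part of this API. Only the `Tuple[UInt, UInt]` shape and the (col, row) element-space ordering come from the signature and notes above.

```mojo
# Hypothetical helper (not part of TileWriterTMA): converts tile indices into
# element-space coordinates in the (col, row) order that write_tile expects.
fn tile_origin(
    tile_n: UInt, tile_m: UInt, block_n: UInt, block_m: UInt
) -> Tuple[UInt, UInt]:
    # Column (N) offset first, then row (M) offset, both counted in elements.
    return (tile_n * block_n, tile_m * block_m)


fn main():
    # Tile index (n=2, m=3) with assumed 64 x 128 element tiles:
    # (2 * 64, 3 * 128) = (128, 384).
    var coords = tile_origin(2, 3, 64, 128)
    print(coords[0], coords[1])
```

A tuple built this way would be passed as the `coords` argument of `write_tile`, alongside the shared-memory source tile.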
