
Mojo struct

TileWriterTMA

@register_passable(trivial) struct TileWriterTMA[tma_origin: ImmutableOrigin, dtype: DType, tma_layout: Layout, desc_layout: Layout]

TMA-based tile writer for hardware-accelerated memory transfers.

This writer uses NVIDIA's Tensor Memory Accelerator (TMA) for efficient 2D tile transfers from shared to global memory.

Parameters

  • tma_origin (ImmutableOrigin): Origin type for the TMA operation.
  • dtype (DType): Data type of the elements being written.
  • tma_layout (Layout): Layout of the TMA tile for async store operations.
  • desc_layout (Layout): Layout described by the TMA descriptor.

Fields

  • tma_op (Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]): Pointer to the TMA tensor descriptor.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, Movable, TileWriter, UnknownDestructibility

Aliases

__copyinit__is_trivial

alias __copyinit__is_trivial = True

__del__is_trivial

alias __del__is_trivial = True

__moveinit__is_trivial

alias __moveinit__is_trivial = True

TMATensorTilePtr

alias TMATensorTilePtr = Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]

Methods

__init__

__init__(tma_op: Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]) -> Self

Initialize the TMA tile writer.

Args:

  • tma_op (Pointer): Pointer to the TMA tensor descriptor.

write_tile

write_tile(self, src: LayoutTensor[dtype, layout, MutableAnyOrigin, address_space=AddressSpace(3), alignment=128], coords: Tuple[UInt, UInt])

Write a tile using TMA hardware acceleration.

Performs an asynchronous TMA store from shared memory to global memory. The operation includes proper fencing and synchronization.

Note: Coordinates are expected in (N, M) order, that is (col, row), for column-major output.

Args:

  • src (LayoutTensor): Source tile in shared memory.
  • coords (Tuple): Tile coordinates (col, row) in element space.
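The (col, row) ordering is easy to get backwards when a kernel tracks tiles by (row, col) index. The following is a minimal sketch of building the `coords` argument, not part of the documented API: the helper `tile_coords`, its parameters, and the assumption that element-space coordinates equal the tile index multiplied by the tile extent are all hypothetical and introduced here for illustration only.

```mojo
# Hypothetical helper (not part of TileWriterTMA): converts tile indices into
# the (col, row) element-space coordinates that write_tile expects.
# Assumption: each tile starts at tile_index * tile_extent along its axis.
fn tile_coords(
    tile_m: UInt, tile_n: UInt, block_m: UInt, block_n: UInt
) -> Tuple[UInt, UInt]:
    var col = tile_n * block_n  # N dimension comes first
    var row = tile_m * block_m  # M dimension comes second
    return (col, row)


def main():
    # Tile at row index 2, column index 3, with assumed 64 x 128 (M x N) tiles:
    # col = 3 * 128 = 384, row = 2 * 64 = 128.
    var coords = tile_coords(UInt(2), UInt(3), UInt(64), UInt(128))
    print(coords[0], coords[1])
```

Inside a kernel, a tuple built this way would be passed as the `coords` argument of `write_tile`, alongside the shared-memory source tile.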
