Mojo struct
TileWriterTMA
@register_passable("trivial")
struct TileWriterTMA[tma_origin: ImmutableOrigin, dtype: DType, tma_layout: Layout, desc_layout: Layout]
TMA-based tile writer for hardware-accelerated memory transfers.
This writer uses NVIDIA's Tensor Memory Accelerator (TMA) for efficient 2D tile transfers from shared to global memory.
Parameters
- tma_origin (ImmutableOrigin): Origin type for the TMA operation.
- dtype (DType): Data type of the elements being written.
- tma_layout (Layout): Layout of the TMA tile for async store operations.
- desc_layout (Layout): Layout described by the TMA descriptor.
Fields
- tma_op (Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]): Pointer to the TMA tensor descriptor.
Implemented traits
AnyType, Copyable, ImplicitlyCopyable, Movable, TileWriter, UnknownDestructibility
Aliases
__copyinit__is_trivial
alias __copyinit__is_trivial = True
__del__is_trivial
alias __del__is_trivial = True
__moveinit__is_trivial
alias __moveinit__is_trivial = True
TMATensorTilePtr
alias TMATensorTilePtr = Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]
Methods
__init__
__init__(tma_op: Pointer[TMATensorTile[dtype, tma_layout, desc_layout], tma_origin]) -> Self
Initialize the TMA tile writer.
Args:
- tma_op (Pointer): Pointer to the TMA tensor descriptor.
write_tile
write_tile(self, src: LayoutTensor[dtype, layout, MutableAnyOrigin, address_space=AddressSpace(3), alignment=128], coords: Tuple[UInt, UInt])
Write a tile using TMA hardware acceleration.
Performs an asynchronous TMA store from shared memory to global memory. The operation includes proper fencing and synchronization.
Note: Coordinates are expected in (N, M) order for column-major output.
Args:
- src (LayoutTensor): Source tile in shared memory.
- coords (Tuple): Tile coordinates (col, row) in element space.
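The (col, row) ordering of coords is easy to get backwards, so a minimal sketch of building the pair may help. The helper tile_coords and the 64 x 128 tile shape are hypothetical and not part of TileWriterTMA. The sketch also assumes that "element space" means the tile's grid index multiplied by the tile extent in that dimension.

```mojo
# Hypothetical helper, not part of TileWriterTMA: converts a tile's (row, col)
# index within the output grid into the (col, row) element-space pair.
fn tile_coords(
    tile_row: UInt, tile_col: UInt, tile_m: UInt, tile_n: UInt
) -> Tuple[UInt, UInt]:
    var col = tile_col * tile_n  # N-dimension offset, passed first
    var row = tile_row * tile_m  # M-dimension offset, passed second
    return (col, row)


fn main():
    # Tile at grid row 2, grid column 3, with assumed 64 x 128 (M x N) tiles.
    var coords = tile_coords(2, 3, 64, 128)
    print(coords[0], coords[1])
```

A tuple built this way could then be passed as the coords argument of write_tile, with the N (column) offset first and the M (row) offset second.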