Mojo function
create_tensor_tile
create_tensor_tile[dtype: DType, rank: Int, //, tile_shape: IndexList[rank], /, k_major_tma: Bool = True, swizzle_mode: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_NONE, *, __tile_layout: Layout = Layout.row_major(tile_shape.__getitem__[rank, DType.int64, Int](0), tile_shape.__getitem__[rank, DType.int64, Int](1)), __desc_layout: Layout = _tma_desc_tile_layout[dtype, rank, tile_shape, swizzle_mode]()](ctx: DeviceContext, tensor: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment]) -> TMATensorTile[dtype, __tile_layout, __desc_layout, k_major_tma]
Creates a TMATensorTile with advanced configuration options for 2D, 3D, 4D, or 5D tensors.
This overload provides more control over TMA descriptor creation, allowing specification of the data type, rank, and layout orientation. It supports 2D, 3D, 4D, and 5D tensors and gives fine-grained control over memory access patterns.
Constraints:
- Only supports 2D, 3D, 4D, and 5D tensors (rank must be 2, 3, 4, or 5).
- For non-SWIZZLE_NONE modes, the K dimension size in bytes must be a multiple of the swizzle mode's byte size.
- For MN-major layout, only SWIZZLE_128B is supported.
- For 3D, 4D, and 5D tensors, only K-major layout is supported.
Parameters:
- dtype (DType): The data type of the tensor elements.
- rank (Int): The dimensionality of the tensor (must be 2, 3, 4, or 5).
- tile_shape (IndexList[rank]): The shape of the tile to be transferred.
- k_major_tma (Bool): Whether the TMA copies the descriptor into shared memory following a column-major (if True) or row-major (if False) pattern. Defaults to True.
- swizzle_mode (TensorMapSwizzle): The swizzling mode to use for memory access optimization. Defaults to TensorMapSwizzle.SWIZZLE_NONE.
- __tile_layout (Layout): Internal parameter for the tile layout in shared memory. Defaults to Layout.row_major(tile_shape[0], tile_shape[1]).
- __desc_layout (Layout): Internal parameter for the descriptor layout, which may differ from the tile layout to accommodate hardware requirements. Defaults to _tma_desc_tile_layout[...].
Args:
- ctx (DeviceContext): The CUDA device context used to create the TMA descriptor.
- tensor (LayoutTensor): The source tensor from which data will be transferred. This defines the global memory layout and must match the specified data type.
Returns:
TMATensorTile: A TMATensorTile configured with the specified parameters, ready for use in
asynchronous data transfer operations.
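As a usage sketch (the module paths, the enclosing function, and the tensor setup are assumptions, not taken from this page), creating a descriptor for 64x64 bfloat16 tiles with 128-byte swizzling might look like:

```mojo
from gpu.host import DeviceContext
from layout import Layout, LayoutTensor
from layout.tma_async import create_tensor_tile  # assumed module path
from layout.tensor_core_async import TensorMapSwizzle  # assumed module path
from utils.index import IndexList

fn build_tma_tile(
    ctx: DeviceContext,
    # Hypothetical source: a 256x128 row-major bfloat16 tensor in global memory.
    tensor: LayoutTensor[DType.bfloat16, Layout.row_major(256, 128), MutableAnyOrigin],
) raises:
    # Request 64x64 tiles. The K dimension is 64 elements * 2 bytes = 128 bytes,
    # which satisfies the SWIZZLE_128B multiple-of-byte-size constraint above.
    # dtype and rank are inferred from `tensor`; tile_shape is passed positionally.
    var tma_tile = create_tensor_tile[
        IndexList[2](64, 64),
        swizzle_mode = TensorMapSwizzle.SWIZZLE_128B,
    ](ctx, tensor)
    # tma_tile is now ready for asynchronous global-to-shared transfers.
```

Note that the swizzle choice must respect the constraints listed above: with SWIZZLE_128B, the tile's K dimension in bytes must be a multiple of 128.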