Mojo function
create_nested_tma_tile
create_nested_tma_tile[dtype: DType, //, tile_m: Int, tile_n: Int, swizzle_mode: TensorMapSwizzle, *, is_k_major: Bool](ctx: DeviceContext, tensor: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], out res: TMATensorTile[dtype, tile_layout_k_major[::DType,::Int,::Int,::TensorMapSwizzle]() if is_k_major else tile_layout_mn_major[::DType,::Int,::Int,::TensorMapSwizzle](), _tma_desc_tile_layout[::DType,::Int,::IndexList[$1, ::DType(), is_k_major])
Creates a rank-2 TMATensorTile with a nested layout, using tile_layout_k_major if is_k_major is true, or tile_layout_mn_major otherwise.
Parameters:
- dtype (DType): The data type of the tensor elements.
- tile_m (Int): The number of rows of a global memory tile.
- tile_n (Int): The number of columns of a global memory tile.
- swizzle_mode (TensorMapSwizzle): The swizzle mode used by the TMA operation.
- is_k_major (Bool): Whether the shared memory layout is k-major or mn-major. If mn-major, it is transposed.
Args:
- ctx (DeviceContext): The CUDA device context used to create the TMA descriptor.
- tensor (LayoutTensor): The source tensor from which data will be transferred. This defines the global memory layout and must match the specified data type.
Returns:
TMATensorTile: The TMATensorTile configured with the specified tile dimensions and swizzle mode, ready for use in asynchronous data transfer operations.
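
The host-side sketch below shows one plausible way to call this function; it is not taken from the original reference. The import paths for `create_nested_tma_tile` and `TensorMapSwizzle`, the bfloat16 256x256 source buffer, the 64x64 tile shape, and the 128-byte swizzle choice are all illustrative assumptions.

```mojo
from gpu.host import DeviceContext
from gpu.host._nvidia_cuda import TensorMapSwizzle  # assumed import path
from layout import Layout, LayoutTensor
from layout.tma_async import create_nested_tma_tile  # assumed import path

alias dtype = DType.bfloat16
alias layout = Layout.row_major(256, 256)  # assumed global tensor shape


def main():
    var ctx = DeviceContext()

    # Global-memory buffer that the TMA engine will read tiles from.
    var buf = ctx.enqueue_create_buffer[dtype](layout.size())
    var tensor = LayoutTensor[dtype, layout](buf)

    # Build a TMA descriptor for 64x64 k-major tiles with 128-byte swizzling.
    # dtype is inferred from `tensor`; tile_m, tile_n, swizzle_mode, and
    # is_k_major are the compile-time parameters documented above.
    var tma_tile = create_nested_tma_tile[
        64, 64, TensorMapSwizzle.SWIZZLE_128B, is_k_major=True
    ](ctx, tensor)

    # `tma_tile` would then be passed to a GPU kernel that issues
    # asynchronous TMA loads into shared memory.
    _ = tma_tile
```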