Mojo function

create_nested_tma_tile

create_nested_tma_tile[dtype: DType, //, tile_m: Int, tile_n: Int, swizzle_mode: TensorMapSwizzle, *, is_k_major: Bool](ctx: DeviceContext, tensor: LayoutTensor[dtype, layout, origin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], out res: TMATensorTile[dtype, tile_layout_k_major[::DType,::Int,::Int,::TensorMapSwizzle]() if is_k_major else tile_layout_mn_major[::DType,::Int,::Int,::TensorMapSwizzle](), _tma_desc_tile_layout[::DType,::Int,::IndexList[$1, ::DType(), is_k_major])

Creates a rank-2 TMATensorTile with a nested layout, using tile_layout_k_major if is_k_major is true, or tile_layout_mn_major otherwise.

Parameters:

  • dtype (DType): The data type of the tensor elements.
  • tile_m (Int): The number of rows of a global memory tile.
  • tile_n (Int): The number of columns of a global memory tile.
  • swizzle_mode (TensorMapSwizzle): The swizzle mode used by the TMA operation.
  • is_k_major (Bool): Whether the shared memory layout is k-major or mn-major. If mn-major, the tile is transposed.

Args:

  • ctx (DeviceContext): The CUDA device context used to create the TMA descriptor.
  • tensor (LayoutTensor): The source tensor from which data will be transferred. This defines the global memory layout and must match the specified data type.

Returns:

TMATensorTile: The TMATensorTile configured with the specified tile dimensions and swizzle mode, ready for use in asynchronous data transfer operations.
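As a sketch of typical usage (the import paths, the tensor setup, and the specific tile sizes below are assumptions for illustration and may differ in your Mojo version; only the parameter/argument shape comes from the signature above):

```mojo
from gpu.host import DeviceContext          # assumed import path
from layout import Layout, LayoutTensor     # assumed import path
from layout.tma_async import create_nested_tma_tile  # assumed import path
from gpu.memory import TensorMapSwizzle     # assumed import path

fn build_tile(
    ctx: DeviceContext,
    tensor: LayoutTensor[DType.float32, Layout.row_major(256, 256), MutableAnyOrigin],
) raises:
    # Create a 128x64 k-major tile descriptor with 128-byte swizzling.
    # dtype is inferred from `tensor`; tile_m/tile_n describe the
    # global-memory tile, and is_k_major selects the shared-memory layout.
    var tile = create_nested_tma_tile[
        tile_m=128,
        tile_n=64,
        swizzle_mode = TensorMapSwizzle.SWIZZLE_128B,
        is_k_major=True,
    ](ctx, tensor)
    # `tile` can now be passed to a kernel for asynchronous TMA transfers.
```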
