Mojo function
create_tma_descriptor
create_tma_descriptor[dtype: DType, rank: Int, swizzle_mode: TensorMapSwizzle = 0](global_buf: DeviceBuffer[dtype], global_shape: IndexList[rank], global_strides: IndexList[rank], shared_mem_shape: IndexList[rank]) -> TMADescriptor
Creates a TMA descriptor for tiled memory operations.
Encodes tensor layout information into a 128-byte TMA descriptor that can be used with TMA hardware instructions to efficiently copy data between global and shared memory on NVIDIA GPUs.
The descriptor specifies a mapping from a tile in shared memory to a region in global memory, including dimensions, strides, data type, and optional swizzling for bank conflict avoidance.
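The mapping described above can be pictured as a translation from the function's row-major, outermost-first arguments into layout fields. The Python sketch below is modeled on the CUDA driver's `cuTensorMapEncodeTiled` conventions, which store dimensions innermost-first and keep only rank - 1 strides, in bytes. The helper `encode_tiled_fields` and the `TiledMapFields` container are hypothetical. Whether Mojo's `create_tma_descriptor` performs exactly this translation internally is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TiledMapFields:
    """Hypothetical container for the layout fields a tiled descriptor encodes."""
    global_dim: List[int]            # tensor extent per dimension, innermost first
    global_strides_bytes: List[int]  # strides of dimensions 1..rank-1, in bytes
    box_dim: List[int]               # tile extent per dimension, innermost first


def encode_tiled_fields(shape: Sequence[int], strides: Sequence[int],
                        tile: Sequence[int], elem_bytes: int) -> TiledMapFields:
    """Translate row-major (outermost-first) arguments into innermost-first fields."""
    rank = len(shape)
    if not (len(strides) == len(tile) == rank):
        raise ValueError("shape, strides and tile must have the same rank")
    # Reverse so that index 0 is the fastest-varying (innermost) dimension.
    dims = list(reversed(shape))
    elem_strides = list(reversed(strides))
    box = list(reversed(tile))
    # The innermost stride is implicitly one element, so only the remaining
    # rank - 1 strides are stored, converted from elements to bytes.
    byte_strides = [s * elem_bytes for s in elem_strides[1:]]
    return TiledMapFields(dims, byte_strides, box)


# A 1024 x 512 bfloat16 matrix (2 bytes per element), copied in 64 x 32 tiles.
fields = encode_tiled_fields([1024, 512], [512, 1], [64, 32], elem_bytes=2)
print(fields)
```

For this example the sketch yields `global_dim` `[512, 1024]`, a single byte stride of `1024`, and `box_dim` `[32, 64]`.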
Parameters:
- dtype (DType): The element data type of the tensor.
- rank (Int): The number of dimensions (1-5).
- swizzle_mode (TensorMapSwizzle): The swizzle pattern to apply in shared memory.
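The compile-time parameters constrain each other. The rank must be between 1 and 5. In the CUDA driver, a swizzled layout also requires the innermost tile row (inner tile dimension times element size) to fit within the swizzle span of 32, 64, or 128 bytes. The Python sketch below checks both conditions. The string keys are hypothetical stand-ins named after CUDA's `CU_TENSOR_MAP_SWIZZLE_*` modes, not the actual `TensorMapSwizzle` members, and `check_params` is a hypothetical helper.

```python
from typing import Optional, Sequence

# Hypothetical stand-ins named after CUDA's CU_TENSOR_MAP_SWIZZLE_* modes.
# Each value is the byte width of the swizzle pattern; None means no swizzling.
SWIZZLE_SPAN_BYTES = {
    "NONE": None,
    "32B": 32,
    "64B": 64,
    "128B": 128,
}


def check_params(rank: int, tile: Sequence[int], elem_bytes: int,
                 swizzle: str = "NONE") -> int:
    """Check rank and swizzle compatibility; return the inner tile row size in bytes."""
    if not 1 <= rank <= 5:
        raise ValueError(f"rank must be between 1 and 5, got {rank}")
    if len(tile) != rank:
        raise ValueError(f"tile has {len(tile)} dimensions, expected {rank}")
    span: Optional[int] = SWIZZLE_SPAN_BYTES[swizzle]
    # In row-major order the last entry is the innermost (contiguous) dimension.
    inner_row_bytes = tile[-1] * elem_bytes
    if span is not None and inner_row_bytes > span:
        raise ValueError(
            f"inner tile row is {inner_row_bytes} B, but {swizzle} swizzle spans {span} B")
    return inner_row_bytes


print(check_params(2, [64, 64], elem_bytes=2, swizzle="128B"))  # 128: fits exactly
```

A 64-element bfloat16 row exactly fills a 128-byte swizzle span. Widening that row to 128 elements would exceed the span and fail the check.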
Args:
- global_buf (DeviceBuffer): Device buffer containing the global memory tensor.
- global_shape (IndexList): Dimensions of the tensor in global memory.
- global_strides (IndexList): Strides (in elements) for each dimension in global memory. The tensor must be row-major, meaning the innermost dimension has a stride of 1.
- shared_mem_shape (IndexList): Dimensions of the tile to be copied to shared memory.
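Because `global_strides` are given in elements and only the innermost stride is fixed at 1, a caller usually derives them from the shape. The Python sketch below shows that derivation for a contiguous tensor. It also computes the shared-memory footprint of one `shared_mem_shape` tile. The helper names are hypothetical and are not part of the Mojo API.

```python
from typing import List, Sequence


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Element strides of a contiguous row-major tensor (last dimension has stride 1)."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def is_row_major(strides: Sequence[int]) -> bool:
    """The documented requirement: the innermost dimension has stride 1."""
    return len(strides) > 0 and strides[-1] == 1


def tile_bytes(tile: Sequence[int], elem_bytes: int) -> int:
    """Shared-memory footprint of one tile described by shared_mem_shape."""
    count = 1
    for d in tile:
        count *= d
    return count * elem_bytes


shape = [4, 1024, 512]
print(row_major_strides(shape))    # [524288, 512, 1]
print(is_row_major([512, 1]))      # True: a padded or sliced view still qualifies
print(tile_bytes([1, 64, 64], 2))  # 8192
```

The row-major requirement only fixes the innermost stride. A 128 x 64 slice of a wider matrix, with strides `[512, 1]`, therefore still satisfies it.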
Returns:
TMADescriptor: A TMA descriptor configured for the specified tensor layout.
Raises:
An error if the descriptor creation fails.
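Descriptor creation can fail when the arguments violate hardware limits. The Python sketch below pre-checks the arguments so the caller gets a specific message instead of a generic failure. The limits come from the CUDA driver documentation for `cuTensorMapEncodeTiled` without interleaving:
- outer strides must be multiples of 16 bytes;
- each tile dimension must be between 1 and 256;
- the inner tile row must span a multiple of 16 bytes.

Whether Mojo's `create_tma_descriptor` raises for exactly these conditions is an assumption. `validate_tma_args` is a hypothetical helper.

```python
from typing import List, Sequence


def validate_tma_args(shape: Sequence[int], strides: Sequence[int],
                      tile: Sequence[int], elem_bytes: int) -> List[str]:
    """Hypothetical pre-check listing likely causes of descriptor creation failure.

    Limits follow the CUDA driver documentation for cuTensorMapEncodeTiled
    with no interleaving; the Mojo function may check more or fewer conditions.
    """
    if not (len(shape) == len(strides) == len(tile)):
        return ["shape, strides and tile must have the same rank"]
    errors = []
    if strides[-1] != 1:
        errors.append("innermost stride must be 1 (row-major layout)")
    # Outer strides are stored in bytes and must be multiples of 16.
    for s in strides[:-1]:
        if (s * elem_bytes) % 16 != 0:
            errors.append(f"stride of {s} elements ({s * elem_bytes} B) is not a multiple of 16 B")
    # Each tile (box) dimension must lie in 1..256.
    for d in tile:
        if not 1 <= d <= 256:
            errors.append(f"tile dimension {d} is outside 1-256")
    # The innermost tile row must span a multiple of 16 bytes.
    if (tile[-1] * elem_bytes) % 16 != 0:
        errors.append(f"inner tile row ({tile[-1] * elem_bytes} B) is not a multiple of 16 B")
    return errors


print(validate_tma_args([1024, 512], [512, 1], [64, 32], elem_bytes=2))  # []
for msg in validate_tma_args([100, 7], [7, 1], [4, 7], elem_bytes=4):
    print(msg)
```

The second call reports two problems. The 7-element float32 row stride is 28 bytes, and the 7-element inner tile row is also 28 bytes. Neither is a multiple of 16.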