Mojo function

create_tma_descriptor

create_tma_descriptor[dtype: DType, rank: Int, swizzle_mode: TensorMapSwizzle = 0](global_buf: DeviceBuffer[dtype], global_shape: IndexList[rank], global_strides: IndexList[rank], shared_mem_shape: IndexList[rank]) -> TMADescriptor

Creates a TMA descriptor for tiled memory operations.

Encodes tensor layout information into a 128-byte TMA descriptor that can be used with TMA hardware instructions to efficiently copy data between global and shared memory on NVIDIA GPUs.

The descriptor maps a tile in shared memory to a region in global memory. It records the dimensions, strides, and data type, plus an optional swizzle pattern that helps avoid shared-memory bank conflicts.

Parameters:

  • dtype (DType): The element data type of the tensor.
  • rank (Int): The number of tensor dimensions, from 1 to 5.
  • swizzle_mode (TensorMapSwizzle): The swizzle pattern to apply in shared memory.

Args:

  • global_buf (DeviceBuffer): Device buffer containing the global memory tensor.
  • global_shape (IndexList): Dimensions of the tensor in global memory.
  • global_strides (IndexList): Strides, in elements, for each dimension in global memory. The tensor must be row-major, meaning the stride of the innermost dimension must equal 1.
  • shared_mem_shape (IndexList): Dimensions of the tile to be copied to shared memory.
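
The stride and rank requirements above can be checked before the descriptor is built. The following is a minimal sketch in Python, used here only as language-neutral illustration: the helper names `row_major_strides` and `check_descriptor_args` are hypothetical and are not part of the Mojo API. It assumes a densely packed tensor and checks only the constraints stated on this page (rank between 1 and 5, matching ranks, innermost stride of 1).

```python
def row_major_strides(shape):
    """Return strides, in elements, for a densely packed row-major tensor."""
    strides = [1] * len(shape)
    # Walk outward from the second-innermost dimension; each stride is the
    # product of all dimension sizes to its right.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def check_descriptor_args(global_shape, global_strides, shared_mem_shape):
    """Raise ValueError if the documented preconditions are not met.

    Hypothetical helper: it mirrors the constraints listed in the
    parameter descriptions, not the checks the library itself performs.
    """
    rank = len(global_shape)
    if not 1 <= rank <= 5:
        raise ValueError(f"rank must be between 1 and 5, got {rank}")
    if len(global_strides) != rank or len(shared_mem_shape) != rank:
        raise ValueError("shape, strides, and tile shape must share one rank")
    if global_strides[-1] != 1:
        raise ValueError("innermost stride must equal 1 (row-major)")


# Example: a 1024 x 512 tensor copied in 64 x 32 tiles.
global_shape = [1024, 512]
global_strides = row_major_strides(global_shape)  # [512, 1]
check_descriptor_args(global_shape, global_strides, [64, 32])
```

The resulting shape, strides, and tile shape correspond to the `global_shape`, `global_strides`, and `shared_mem_shape` arguments. A column-major layout such as strides `[1, 1024]` fails the check, because its innermost stride is not 1.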

Returns:

TMADescriptor: A TMA descriptor configured for the specified tensor layout.

Raises:

An error if the descriptor creation fails.