Mojo module
tma_async
Tensor Memory Accelerator (TMA) Asynchronous Operations Module
This module provides high-performance abstractions for NVIDIA's Tensor Memory Accelerator (TMA), enabling efficient asynchronous data movement between global and shared memory in GPU kernels. It is designed for the NVIDIA Hopper architecture and newer GPUs that support TMA instructions.
Key Components:
- TMATensorTile: Core struct that encapsulates a TMA descriptor for efficient data transfers between global and shared memory with various access patterns and optimizations.
- SharedMemBarrier: Synchronization primitive for coordinating asynchronous TMA operations, ensuring data transfers complete before dependent operations begin.
- PipelineState: Helper struct for managing multi-stage pipeline execution with circular buffer semantics, enabling efficient double or triple buffering techniques.
- create_tma_tile: Factory functions for creating optimized TMATensorTile instances with various configurations for different tensor shapes and memory access patterns.
Aliases
TMANestedTensorTile
alias TMANestedTensorTile[dtype: DType, tile_m: Int, tile_n: Int, swizzle_mode: TensorMapSwizzle, is_k_major: Bool] = TMATensorTile[dtype, tile_layout_k_major[::DType,::Int,::Int,::TensorMapSwizzle]() if is_k_major else tile_layout_mn_major[::DType,::Int,::Int,::TensorMapSwizzle](), _tma_desc_tile_layout[::DType,::Int,::IndexList[$1, ::DType(), is_k_major]
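Parameters
- dtype (DType): Element data type of the tile.
- tile_m (Int): Tile size along the first dimension.
- tile_n (Int): Tile size along the second dimension.
- swizzle_mode (TensorMapSwizzle): Swizzle mode applied to the shared-memory tile.
- is_k_major (Bool): If true, the tile uses tile_layout_k_major; otherwise it uses tile_layout_mn_major.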
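The is_k_major flag chooses which tile dimension is contiguous in shared memory. The helper below is a rough mental model that ignores swizzling and the nested structure of the real layouts. It shows how the same logical element (mn, k) lands at different offsets in the two orderings. The function unswizzled_offset is hypothetical, and this page does not state how tile_m and tile_n map onto the MN and K extents.

```python
def unswizzled_offset(mn: int, k: int, mn_size: int, k_size: int, k_major: bool) -> int:
    """Linear element offset of (mn, k) in a dense, unswizzled tile.

    Hypothetical helper: real TMA tile layouts are nested and swizzled.
    K-major means k is the contiguous (fastest-varying) index;
    MN-major means mn is the contiguous index.
    """
    if not (0 <= mn < mn_size and 0 <= k < k_size):
        raise ValueError("coordinate outside the tile")
    if k_major:
        return mn * k_size + k  # consecutive k values are adjacent in memory
    return k * mn_size + mn     # consecutive mn values are adjacent in memory


# A 4 x 8 tile (mn_size=4, k_size=8): the same logical element lands at
# different offsets depending on which dimension is contiguous.
print(unswizzled_offset(1, 2, 4, 8, k_major=True))   # 10
print(unswizzled_offset(1, 2, 4, 8, k_major=False))  # 9
```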
Structs
- PipelineState: Manages state for a multi-stage pipeline with circular buffer semantics.
- SharedMemBarrier: A hardware-accelerated synchronization primitive for GPU shared memory operations.
- TMATensorTile: A hardware-accelerated Tensor Memory Accelerator (TMA) tile for efficient asynchronous data movement.
- TMATensorTileArray: An array of TMA descriptors.
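The phase-based behavior behind a TMA completion barrier can be modeled in plain Python. This sketch assumes SharedMemBarrier wraps the hardware mbarrier object described in the PTX ISA. In that model, a phase completes once all expected arrivals have happened and all expected transaction bytes have landed, and the phase bit then flips. MbarrierModel and its method names are hypothetical and do not mirror the Mojo struct's API.

```python
class MbarrierModel:
    """Hypothetical software model of a phase-based shared-memory barrier."""

    def __init__(self, arrive_count: int) -> None:
        self.arrive_count = arrive_count
        self.pending_arrivals = arrive_count
        self.pending_bytes = 0
        self.phase = 0

    def expect_bytes(self, nbytes: int) -> None:
        # The producer announces how many bytes the async copy will deliver.
        self.pending_bytes += nbytes

    def complete_bytes(self, nbytes: int) -> None:
        # Stands in for the hardware crediting bytes as the TMA copy lands.
        self.pending_bytes -= nbytes
        self._maybe_complete()

    def arrive(self) -> None:
        self.pending_arrivals -= 1
        self._maybe_complete()

    def _maybe_complete(self) -> None:
        # A phase completes only when both counters have drained; the barrier
        # then flips its phase bit and re-arms for the next phase.
        if self.pending_arrivals == 0 and self.pending_bytes == 0:
            self.phase ^= 1
            self.pending_arrivals = self.arrive_count

    def test_wait(self, phase: int) -> bool:
        # True once the phase with the given parity has completed.
        return self.phase != phase


bar = MbarrierModel(arrive_count=1)
bar.expect_bytes(64 * 64 * 2)  # one 64 x 64 tile of 2-byte elements = 8192 bytes
bar.arrive()                   # producer arrived, but the data is still in flight
print(bar.test_wait(0))        # False
bar.complete_bytes(8192)       # all bytes landed, so phase 0 completes
print(bar.test_wait(0))        # True
```

Counting bytes as well as arrivals lets one producer thread arm the barrier while the copy engine, not the thread, signals when the data is actually present.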
Functions
- create_nested_tma_tile: Creates a rank-2 TMATensorTile with a nested layout, using tile_layout_k_major if is_k_major is true and tile_layout_mn_major otherwise.
- create_tma_tile: Creates a TMATensorTile with the specified tile dimensions and swizzle mode.
- create_tma_tile_template: Same as create_tma_tile, except that the descriptor is only a placeholder (template) to be replaced later.
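Tile dimensions and swizzle mode, the inputs to create_tma_tile, are constrained by the hardware. This sketch checks a 2-D box against three rules from the CUDA driver's cuTensorMapEncodeTiled documentation for non-interleaved layouts:
- each box extent is at most 256 elements;
- the inner extent is a multiple of 16 bytes;
- with swizzling, the inner extent is no larger than the swizzle span.

It then returns the byte count of one tile, which is the figure a barrier would be told to expect. The function tma_tile_bytes and the SWIZZLE_SPAN_BYTES table are hypothetical and not part of the Mojo API. Whether create_tma_tile enforces these checks itself is not stated here.

```python
# Swizzle span in bytes per mode. The string keys are hypothetical stand-ins
# for the CUDA CU_TENSOR_MAP_SWIZZLE_* options, not Mojo's TensorMapSwizzle.
SWIZZLE_SPAN_BYTES = {"none": None, "32B": 32, "64B": 64, "128B": 128}


def tma_tile_bytes(rows: int, inner: int, elem_bytes: int, swizzle: str) -> int:
    """Check a 2-D TMA box against CUDA's tiled-descriptor rules and return
    the number of bytes one copy of the box transfers.

    `inner` is the contiguous (innermost) box extent in elements.
    """
    # Each box extent must be between 1 and 256 elements.
    for extent in (rows, inner):
        if not 1 <= extent <= 256:
            raise ValueError(f"box extent {extent} must be in [1, 256]")
    inner_bytes = inner * elem_bytes
    # Without interleaving, the inner extent must be a multiple of 16 bytes.
    if inner_bytes % 16 != 0:
        raise ValueError(f"inner extent {inner_bytes} B is not a multiple of 16 B")
    span = SWIZZLE_SPAN_BYTES[swizzle]
    # With a swizzle mode, the inner extent may not exceed the swizzle span.
    if span is not None and inner_bytes > span:
        raise ValueError(f"inner extent {inner_bytes} B exceeds the {swizzle} span")
    return rows * inner * elem_bytes


# A 64 x 64 tile of 2-byte elements (e.g. bf16) with 128-byte swizzling:
# each 64-element row is exactly 128 bytes, so the box is valid.
print(tma_tile_bytes(64, 64, 2, "128B"))  # 8192
```

Under the same rules, a 64 x 128 tile of 2-byte elements is rejected with 128B swizzling because each row spans 256 bytes.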