Mojo module
tma
NVIDIA Tensor Memory Accelerator (TMA) module.
Provides types and functions for working with NVIDIA's Tensor Memory Accelerator (TMA), a hardware unit that performs asynchronous copies of tensor tiles between global memory and shared memory on Hopper-architecture GPUs (compute capability 9.0) and newer.
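To make the data movement concrete, here is a minimal Python model of what a single 2-D tiled TMA load does. It assumes a row-major global tensor and a fill value of zero for out-of-bounds elements. This is not the Mojo API and it runs on the host only. `tma_load_tile` and its parameters are hypothetical names. On real hardware the copy is issued asynchronously and its completion is signaled through a shared-memory barrier, which this sketch does not model.

```python
# Hypothetical host-side model of a 2-D tiled TMA load (not the Mojo API).
# The global tensor is a row-major list of rows. A load copies one
# box_rows x box_cols tile starting at (row0, col0) into a "shared" tile.
# Elements that fall outside the global tensor are replaced by `fill`.


def tma_load_tile(global_tensor, box_rows, box_cols, row0, col0, fill=0.0):
    rows = len(global_tensor)
    cols = len(global_tensor[0]) if rows else 0
    tile = []
    for r in range(row0, row0 + box_rows):
        out_row = []
        for c in range(col0, col0 + box_cols):
            if 0 <= r < rows and 0 <= c < cols:
                out_row.append(global_tensor[r][c])  # in-bounds element
            else:
                out_row.append(fill)  # out-of-bounds element is filled
        tile.append(out_row)
    return tile


if __name__ == "__main__":
    g = [[float(10 * r + c) for c in range(5)] for r in range(3)]  # 3 x 5 tensor
    # Load a 2 x 4 box whose right edge overhangs the tensor by one column.
    print(tma_load_tile(g, box_rows=2, box_cols=4, row0=1, col0=2))
    # [[12.0, 13.0, 14.0, 0.0], [22.0, 23.0, 24.0, 0.0]]
```

Because the box may overhang the tensor edge and the missing elements are filled rather than read, a kernel can keep a fixed tile size even when the matrix dimensions are not multiples of it.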
TMA performs multi-dimensional copies in hardware and supports swizzling of the shared-memory layout to avoid bank conflicts, L2 cache promotion hints, out-of-bounds fill, and a range of element data types and memory layouts.
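Swizzling permutes where each 16-byte chunk of a tile row lands in shared memory, so that threads reading down a column of the tile hit different banks. The sketch below models the 32B, 64B and 128B patterns as they are commonly described, for example by CUTLASS's `Swizzle<B, 4, 3>` layouts: the row bits starting at bit 7 of a byte offset are XORed into the 16-byte chunk index starting at bit 4. Treat it as an illustrative assumption rather than a bit-exact specification of the hardware. `swizzle_offset` and `SWIZZLE_BITS` are hypothetical names.

```python
# Hypothetical model of TMA shared-memory swizzle patterns (illustrative only).
# Modeled as CUTLASS-style Swizzle<B, 4, 3> on byte offsets: the B "row" bits
# starting at bit 7 are XORed into the B chunk-index bits starting at bit 4.
SWIZZLE_BITS = {"32B": 1, "64B": 2, "128B": 3}


def swizzle_offset(byte_offset: int, mode: str) -> int:
    """Map a logical byte offset within a tile to its swizzled offset."""
    bits = SWIZZLE_BITS[mode]
    row_mask = ((1 << bits) - 1) << 7  # selects bits 7 .. 7+B-1
    # Shift the selected row bits down to bit 4 and XOR them into the chunk index.
    return byte_offset ^ ((byte_offset & row_mask) >> 3)


if __name__ == "__main__":
    # Chunk 0 of each of eight consecutive 128-byte rows, in 128B mode:
    # every row's chunk lands in a different 16-byte slot of the row.
    slots = [(swizzle_offset(row * 128, "128B") % 128) // 16 for row in range(8)]
    print(slots)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Only the chunk-index bits are modified, and they are XORed with bits that stay unchanged, so the mapping is a permutation. In this model the pattern repeats every 256, 512 or 1024 bytes for the 32B, 64B and 128B modes respectively.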
Structs
- TensorMapDataType: Data type enumeration for TMA tensor map descriptors.
- TensorMapFloatOOBFill: Out-of-bounds fill mode for floating-point TMA operations.
- TensorMapInterleave: Interleave mode for TMA tensor map descriptors.
- TensorMapL2Promotion: L2 cache promotion hint for TMA tensor map descriptors.
- TensorMapSwizzle: Swizzle mode for TMA tensor map descriptors.
- TMADescriptor: TMA tensor map descriptor.
Functions
- create_tma_descriptor: Creates a TMA descriptor for tiled memory operations.
- prefetch_tma_descriptor: Prefetches a TMA descriptor into the constant cache.
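Creating a tiled descriptor fails if the tensor shape, strides or tile (box) size violate the hardware's limits. The following Python sketch checks several of the constraints documented for the CUDA driver's `cuTensorMapEncodeTiled`, which is assumed here to be what a tiled TMA descriptor ultimately encodes. It is not a reimplementation of `create_tma_descriptor`, and `check_tiled_descriptor` is a hypothetical helper. Dimension 0 is the innermost (contiguous) dimension, following the CUDA convention, and interleaved layouts are not modeled.

```python
# Hypothetical pre-flight check for a tiled tensor-map descriptor, based on
# constraints documented for CUDA's cuTensorMapEncodeTiled (non-interleaved).
# It calls no driver API; it returns a list of problems (empty means OK).
# Index 0 of global_dims / box_dims is the innermost, contiguous dimension.

SWIZZLE_SPAN = {"none": None, "32B": 32, "64B": 64, "128B": 128}


def check_tiled_descriptor(global_address, elem_bytes, global_dims,
                           global_strides_bytes, box_dims, swizzle="none"):
    problems = []
    rank = len(global_dims)
    if not 1 <= rank <= 5:
        problems.append("rank must be between 1 and 5")
    if global_address % 16:
        problems.append("global address must be 16-byte aligned")
    for d in global_dims:
        if not 1 <= d <= 2**32:
            problems.append(f"global dimension {d} must be in [1, 2^32]")
    # Strides are given in bytes for every dimension except the innermost.
    if len(global_strides_bytes) != rank - 1:
        problems.append("need one stride per dimension except the innermost")
    for s in global_strides_bytes:
        if s % 16 or s >= 2**40:
            problems.append(f"stride {s} must be a multiple of 16 and < 2^40")
    if len(box_dims) != rank:
        problems.append("need one box dimension per tensor dimension")
    for b in box_dims:
        if not 1 <= b <= 256:
            problems.append(f"box dimension {b} must be in [1, 256]")
    if box_dims:
        inner_bytes = box_dims[0] * elem_bytes
        if inner_bytes % 16:
            problems.append("innermost box extent must be a multiple of 16 bytes")
        span = SWIZZLE_SPAN[swizzle]
        if span is not None and inner_bytes > span:
            problems.append(
                f"innermost box extent {inner_bytes} B exceeds {swizzle} swizzle span")
    return problems


if __name__ == "__main__":
    # bf16 (2-byte) matrix with 1024 rows of 512 columns, 64 x 64 tiles.
    print(check_tiled_descriptor(0x1000, 2, [512, 1024], [1024], [64, 64], "128B"))
    # []
    print(check_tiled_descriptor(0x1000, 2, [512, 1024], [1024], [128, 64], "128B"))
    # ['innermost box extent 256 B exceeds 128B swizzle span']
```

With 2-byte elements, a 64-column box spans exactly 128 bytes and fits the 128B swizzle; widening it to 128 columns doubles the innermost extent to 256 bytes and trips the swizzle check.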