Mojo module
tensor_core_async
Tensor Core Async Module
This module provides high-performance abstractions for utilizing NVIDIA's Tensor Cores to perform asynchronous matrix multiplication operations. It implements optimized memory layouts and access patterns for efficient tensor core computations.
Key components:
- Layout creation functions for K-major and MN-major memory arrangements
- Swizzling support for improved memory access patterns
- WGMMA (Warp Group Matrix Multiply-Accumulate) descriptor generation
- TensorCoreAsync struct with methods for asynchronous matrix multiplication
The module supports various data types, matrix dimensions, and memory configurations, enabling efficient implementation of deep learning primitives and other tensor operations that can leverage hardware acceleration.
Performance features:
- Asynchronous execution model to overlap computation and memory access
- Support for different swizzling modes to optimize memory bandwidth
- Efficient register and shared memory utilization
- Support for multi-warp group execution
This implementation is specifically optimized for NVIDIA GPUs with Tensor Core support.
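To make the swizzling idea above concrete, here is a minimal plain-Python sketch of a generic XOR swizzle on byte offsets. It is not this module's API: the function name `xor_swizzle` and its parameters (`bits`, `base`, `shift`) are hypothetical and follow the common three-parameter XOR-swizzle convention, and the values used (3, 4, 3) are an assumption corresponding to what is usually called a 128-byte swizzle. The page does not state which swizzle modes or parameters the module actually uses.

```python
def xor_swizzle(offset: int, bits: int = 3, base: int = 4, shift: int = 3) -> int:
    """Generic XOR swizzle (hypothetical helper, not the module's API).

    Takes `bits` bits located at position `base + shift` of the offset and
    XORs them into the `bits` bits located at position `base`.
    """
    mask = ((1 << bits) - 1) << (base + shift)
    return offset ^ ((offset & mask) >> shift)


ROW_BYTES = 128    # assumed row pitch in shared memory
CHUNK_BYTES = 16   # assumed access granularity

# Column 0 of eight consecutive rows: without swizzling, every access falls
# on the same 16-byte chunk position within its row.
plain_chunks = [(row * ROW_BYTES % ROW_BYTES) // CHUNK_BYTES for row in range(8)]

# With swizzling, the row index is folded into the chunk position, so the
# eight accesses are spread over eight different chunk positions.
swizzled_chunks = [
    (xor_swizzle(row * ROW_BYTES) % ROW_BYTES) // CHUNK_BYTES for row in range(8)
]

print(plain_chunks)
print(swizzled_chunks)
```

Because the XOR only modifies low bits using higher bits that stay unchanged, the mapping is a permutation of the offsets: no two addresses collide, and applying it twice returns the original offset.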
Aliases
- WGMMA_K_BYTES = 32
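The page gives no description for this alias. Assuming it denotes the width in bytes of the K dimension consumed by a single WGMMA instruction (which matches the Hopper instruction shapes, where K is 16 for 2-byte types and 32 for 1-byte types), the number of K elements per instruction follows from the element size. The sketch below is plain Python for illustration only; the helper names `wgmma_k_elements` and `num_k_steps` and the `DTYPE_BYTES` table are hypothetical and not part of the module.

```python
WGMMA_K_BYTES = 32  # value of the alias listed above

# Element sizes in bytes for a few common operand types (illustrative table).
DTYPE_BYTES = {"float8_e4m3": 1, "bfloat16": 2, "float16": 2, "tf32": 4}


def wgmma_k_elements(dtype: str) -> int:
    """K elements covered by one instruction, assuming a 32-byte K extent."""
    return WGMMA_K_BYTES // DTYPE_BYTES[dtype]


def num_k_steps(tile_k: int, dtype: str) -> int:
    """How many instructions are needed to walk a tile of `tile_k` elements along K."""
    k = wgmma_k_elements(dtype)
    if tile_k % k != 0:
        raise ValueError(f"tile_k={tile_k} is not a multiple of {k}")
    return tile_k // k


print(wgmma_k_elements("bfloat16"))
print(num_k_steps(64, "bfloat16"))
```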
Structs
- TensorCoreAsync: High-performance asynchronous tensor core operations for matrix multiplication.
Functions
- select_k_atom: Creates a core matrix layout for tensor core operations.
- st_matrix_n_atom: Creates a layout for N-major st_matrix atom in the context of WGMMA C matrix.
- st_matrix_n_layout: Creates a layout for N-major st_matrix in the context of WGMMA C matrix.
- tile_layout_k_major: Creates a K-major layout for tensor core operations.
- tile_layout_mn_major: Creates an MN-major layout for tensor core operations.
- tile_to_descriptor: Transforms a layout into a WGMMA descriptor-compatible layout.
- wgmma_c_layout: Generates three layouts for mapping WGMMA C matrix coordinates.
- wgmma_c_thread_layout: Returns the thread layout component for WGMMA C matrix.
- wgmma_output_layout: Returns the output layout component for WGMMA C matrix.