Mojo package
arch
Architecture-specific MMA implementations.
This package contains GPU architecture-specific implementations of matrix multiply-accumulate (MMA) operations:
- mma_nvidia: NVIDIA Tensor Cores (SM70-SM90) - Volta through Hopper
- mma_nvidia_sm100: NVIDIA Blackwell (SM100) - 5th-generation Tensor Cores
- mma_amd: AMD Matrix Cores (CDNA2/3/4) - data center GPUs
- mma_amd_rdna: AMD WMMA (RDNA3/4) - consumer GPUs
Module Organization
Each architecture module contains:
- Private implementation functions (prefixed with _)
- Architecture-specific intrinsic calls
- Data type conversions specific to that architecture
Usage
These modules should not be imported directly by user code. Instead, use the
unified interface in gpu.compute.mma, which automatically dispatches to the
appropriate architecture-specific implementation at compile time:
```mojo
from gpu.compute import mma

# Automatically dispatches to the correct architecture
result = mma(a, b, c)
```
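For a more concrete picture, here is a minimal sketch of calling the unified interface with explicit operand fragments. The dtypes, fragment widths, and the wrapper function are illustrative assumptions only; the valid combinations depend on the MMA shape supported by the target architecture:

```mojo
from gpu.compute import mma


fn mma_example():
    # Illustrative register fragments (splat-initialized); real kernels
    # load these from shared or global memory. Shapes are assumptions.
    var a = SIMD[DType.bfloat16, 4](1.0)
    var b = SIMD[DType.bfloat16, 2](2.0)
    var c = SIMD[DType.float32, 4](0.0)

    # Dispatches to the tensor-core / matrix-core instruction for the
    # GPU this code is compiled for.
    var d = mma(a, b, c)
    _ = d
```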
Internal Implementation Details

The main gpu.compute.mma module imports these implementations:
```mojo
from .arch.mma_nvidia import _mma_nvidia
from .arch.mma_amd import _mma_amd
```

And dispatches based on compile-time architecture detection:
```mojo
@parameter
if is_nvidia_gpu():
    _mma_nvidia(d, a, b, c)
elif is_amd_gpu():
    _mma_amd[block_size](d, a, b, c)
```
Modules

- mma_amd: AMD CDNA Matrix Cores implementation for matrix multiply-accumulate operations.
- mma_amd_rdna: AMD RDNA3/4 WMMA implementation for matrix multiply-accumulate operations.
- mma_nvidia: NVIDIA Tensor Cores implementation for matrix multiply-accumulate operations.
- mma_nvidia_sm100: Utilities for working with the SM100 MMA instructions.