
Mojo package

compute

GPU compute operations package - MMA and tensor core operations.

This package provides GPU tensor core and matrix multiplication operations:

  • mma: Unified warp matrix-multiply-accumulate (WMMA) operations
  • mma_util: Utility functions for loading/storing MMA operands
  • mma_operand_descriptor: Operand descriptor types for MMA
  • tensor_ops: Tensor core-based reductions and operations
  • tcgen05: 5th-generation tensor core operations (Blackwell)
  • arch/: Architecture-specific MMA implementations (internal)
    • mma_nvidia: NVIDIA tensor cores (SM70-SM90)
    • mma_nvidia_sm100: NVIDIA Blackwell (SM100)
    • mma_amd: AMD Matrix Cores (CDNA2/3/4)
    • mma_amd_rdna: AMD WMMA (RDNA3/4)

Usage

Import compute operations directly:

from gpu.compute import mma

# Automatically dispatches to the correct GPU architecture
result = mma.mma(a, b, c)

Architecture-specific implementations in arch/ are internal and should not be imported directly by user code.
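
For readers new to MMA, the arithmetic behind a matrix-multiply-accumulate is D = A × B + C. The sketch below spells that out in plain Python on nested lists. It is only a reference for the math, not the gpu.compute API: the name mma_reference and the list-of-lists layout are assumptions made for illustration, and the real mma operates on per-thread register fragments distributed across a warp, with shapes and data types that this page does not specify.

```python
def mma_reference(a, b, c):
    """Return d = a x b + c for row-major nested lists (reference only)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "a must be M x K"
    assert len(c) == m and all(len(row) == n for row in c), "c must be M x N"

    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = c[i][j]  # start from the accumulator value
            for p in range(k):
                acc += a[i][p] * b[p][j]  # multiply and accumulate
            d[i][j] = acc
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
d = mma_reference(a, b, c)
```

A host-side reference like this can be useful for checking the output of a GPU kernel on small tiles, bearing in mind that low-precision tensor core inputs may call for a tolerance rather than exact equality.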

Packages

  • arch: Architecture-specific MMA implementations.

Modules

  • mma: This module includes utilities for working with the warp matrix-multiply-accumulate (WMMA) instructions.
  • mma_operand_descriptor: Operand descriptor types for MMA.
  • mma_util: Matrix multiply accumulate (MMA) utilities for GPU tensor cores.
  • tcgen05: This module includes utilities for working with the 5th-generation tensor core (tcgen05) instructions.
  • tensor_ops: This module provides tensor core operations and utilities for GPU computation.
