Mojo module

mma

This module includes utilities for working with warp-level matrix multiply-accumulate (WMMA) instructions.

Structs

  • WGMMADescriptor: Descriptor for shared memory operands used in warp group matrix multiply operations.

Functions

  • ld_matrix: Loads a matrix from shared memory into registers in a format suitable for tensor core operations.
  • mma: Performs a warp-synchronous, Tensor Core based matrix multiply-accumulate (MMA) operation.
  • st_matrix: Performs warp-synchronized copy from registers to shared memory.
  • wgmma_async: Performs an asynchronous warp group matrix multiply-accumulate (WGMMA) operation.
  • wgmma_commit_group_sync: Commits pending warp group matrix multiply operations.
  • wgmma_fence_aligned: Inserts a memory fence for warp group matrix multiply operations.
  • wgmma_wait_group_sync: Waits for all pending warp group matrix multiply operations to complete.
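
For orientation, the arithmetic behind a matrix multiply-accumulate is d = a × b + c. The sketch below is a plain Python reference for that arithmetic only. The helper name mma_reference is hypothetical and not part of this module, and the sketch makes no attempt to model the per-thread register fragments, data types, or tile shapes that the actual instructions use.

```python
def mma_reference(a, b, c):
    """Return d = a @ b + c for row-major nested lists.

    Hypothetical helper for illustration; it is not part of the mma module.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    # a is (m x k), b is (k x n), c and the result d are (m x n).
    if len(b) != k or len(c) != m or any(len(row) != n for row in c):
        raise ValueError("shape mismatch: need a (m x k), b (k x n), c (m x n)")

    # Start from the accumulator c, then add each partial product.
    d = [[c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                d[i][j] += a[i][p] * b[p][j]
    return d


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
print(mma_reference(a, b, c))  # [[20, 23], [44, 51]]
```

On hardware the same result is produced cooperatively: each thread in the warp (or warp group) holds only a fragment of a, b, c, and d, which is why the load and store helpers listed above exist alongside the multiply itself.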