Mojo module

mma

This module provides utilities for working with warp-level matrix multiply-accumulate (MMA) instructions, including the warp group (WGMMA) variants.

Structs

WGMMADescriptor: Descriptor for shared memory operands used in warp group matrix multiply operations.

Functions

ld_matrix: Loads a matrix from shared memory into registers in a format suitable for tensor core operations.
mma: Performs a warp-synchronous, Tensor Core-based matrix multiply-accumulate (MMA) operation.
st_matrix: Performs a warp-synchronized copy from registers to shared memory.
wgmma_async: Performs an asynchronous warp group matrix multiply-accumulate (WGMMA) operation.
wgmma_commit_group_sync: Commits pending warp group matrix multiply operations.
wgmma_fence_aligned: Inserts a memory fence for warp group matrix multiply operations.
wgmma_wait_group_sync: Waits for all pending warp group matrix multiply operations to complete.
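
The mma and wgmma_async entries above both describe a matrix multiply-accumulate, which in general form is D = A x B + C on small tiles. The sketch below is a plain-Python reference model of that arithmetic only. It is not the Mojo API: the function name mma_reference and the list-of-lists tile layout are assumptions made for illustration, and the model ignores how the hardware distributes tile fragments across the threads of a warp.

```python
def mma_reference(a, b, c):
    """Return d = a x b + c for row-major tiles given as lists of lists.

    Hypothetical helper for illustration; it models the arithmetic of a
    multiply-accumulate, not the register layout used by Tensor Cores.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Shape checks: a is m x k, b is k x n, c is m x n.
    assert all(len(row) == k for row in a)
    assert all(len(row) == n for row in b)
    assert len(c) == m and all(len(row) == n for row in c)
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# A tiny 2x2 example; hardware tiles are larger.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
d = mma_reference(a, b, c)
print(d)  # [[20.0, 23.0], [44.0, 51.0]]
```

A model like this can serve as a host-side check when validating the output of a kernel built on the functions listed above.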