Mojo module
mma
This module provides utilities for working with warp matrix multiply-accumulate (WMMA) instructions, along with their warp group (WGMMA) variants.
Structs
-
WGMMADescriptor
: Descriptor for shared memory operands used in warp group matrix multiply operations.
Functions
-
ld_matrix
: Loads a matrix from shared memory into registers in a format suitable for Tensor Core operations.
-
mma
: Performs a warp-synchronous, Tensor Core based matrix multiply-accumulate (MMA) operation.
-
st_matrix
: Performs a warp-synchronized copy from registers to shared memory.
-
wgmma_async
: Performs an asynchronous warp group matrix multiply-accumulate (WGMMA) operation.
-
wgmma_commit_group_sync
: Commits pending warp group matrix multiply operations.
-
wgmma_fence_aligned
: Inserts a memory fence for warp group matrix multiply operations.
-
wgmma_wait_group_sync
: Waits for all pending warp group matrix multiply operations to complete.
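For orientation, the arithmetic behind a matrix multiply-accumulate can be sketched in plain Python as d = a * b + c on one small tile. This is only a reference model of the math. It is not the Mojo API, it does not model how fragments are distributed across the threads of a warp, and the name `mma_reference` is hypothetical.

```python
def mma_reference(a, b, c):
    """Return d = a @ b + c for row-major nested lists.

    Hypothetical helper for illustration only; it models the arithmetic
    of a multiply-accumulate, not the register layout used on a GPU.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Shape checks: a is m x k, b is k x n, c is m x n.
    assert all(len(row) == k for row in a)
    assert all(len(row) == n for row in b)
    assert len(c) == m and all(len(row) == n for row in c)
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
d = mma_reference(a, b, c)
print(d)  # [[20.0, 23.0], [44.0, 51.0]]
```

A model like this can serve as a host-side check when validating the output of a GPU kernel on small tiles.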