Mojo module
mma
This module includes utilities for working with warp-level matrix multiply-accumulate (WMMA) instructions.
Structs
- WGMMADescriptor: Descriptor for shared memory operands used in warp group matrix multiply operations.
Functions
- get_amd_bf8_dtype: Gets the appropriate BF8 dtype for the current AMD GPU architecture.
- get_amd_fp8_dtype: Gets the appropriate FP8 dtype for the current AMD GPU architecture.
- ld_matrix: Loads a matrix from shared memory into registers in a format suitable for tensor core operations.
- mma: Performs a warp-synchronous, Tensor Core-based matrix multiply-accumulate (MMA) operation.
- st_matrix: Performs a warp-synchronized copy from registers to shared memory.
- wgmma_async: Performs an asynchronous warp group matrix multiply-accumulate (WGMMA) operation.
- wgmma_commit_group_sync: Commits pending warp group matrix multiply operations.
- wgmma_fence_aligned: Inserts a memory fence for warp group matrix multiply operations.
- wgmma_wait_group_sync: Waits for all pending warp group matrix multiply operations to complete.
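
For readers new to the term, a multiply-accumulate operation computes D = A × B + C. The following is a minimal CPU-side sketch of that arithmetic in plain Python, not Mojo and not this module's API. The name `mma_reference` and the nested-list matrix layout are assumptions made for illustration only. The real `mma` function operates on per-thread register fragments on the GPU, which this sketch does not model.

```python
def mma_reference(a, b, c):
    """Return d = a @ b + c for row-major matrices stored as nested lists.

    Hypothetical helper for illustration only: it mirrors the arithmetic
    of a matrix multiply-accumulate, not the GPU register layout.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Shape checks: a is m x k, b is k x n, c is m x n.
    assert all(len(row) == k for row in a)
    assert all(len(row) == n for row in b)
    assert len(c) == m and all(len(row) == n for row in c)

    # Start from a copy of the accumulator so the input c is not mutated.
    d = [row[:] for row in c]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                d[i][j] += a[i][p] * b[p][j]
    return d


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 1], [1, 1]]
    print(mma_reference(a, b, c))
```

A scalar reference like this can be handy for checking the numerical output of a GPU kernel on small inputs, though low-precision dtypes such as FP8 or BF8 will round differently from Python's numbers.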