Mojo module
block_scaled_mma
CDNA4 block-scaled MFMA wrappers for the LLVM f8f6f4 family.
These wrappers are shared architecture helpers for AMD block-scaled kernels.
The MFMA inputs use LLVM's f8f6f4 operand-format selector so callers can pick
FP8, FP6, or FP4 encodings per operand, while the scale inputs remain packed
E8M0 words.
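As a rough illustration of what a packed E8M0 scale word contains, the Python sketch below decodes four scales from one 32-bit word. E8M0 is an 8-bit, exponent-only encoding with bias 127, where 0xFF represents NaN. The helper name `unpack_e8m0_word` and the low-byte-first ordering are assumptions made for this example; they are not taken from the module.

```python
import math


def unpack_e8m0_word(word: int) -> list[float]:
    """Decode four E8M0 scales from a 32-bit word.

    Assumption: scale 0 sits in the lowest byte; the real packing order
    used by the kernels may differ.
    """
    scales = []
    for i in range(4):
        e = (word >> (8 * i)) & 0xFF
        # E8M0 has no sign or mantissa bits: the value is 2**(e - 127),
        # and the all-ones pattern 0xFF is reserved for NaN.
        scales.append(math.nan if e == 0xFF else 2.0 ** (e - 127))
    return scales
```

A word of `0x7F7F7F7F` therefore decodes to four scales of 1.0, which leaves the block values unscaled.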
Structs
- CDNA4F8F6F4MatrixFormat: Represents the CDNA4 f8f6f4 operand format selector.
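To make the selector concrete, here is a small Python model of a per-operand format choice. It assumes the numbering LLVM uses for the f8f6f4 format fields (0 = FP8 E4M3, 1 = BF8 E5M2, 2 = FP6 E2M3, 3 = BF6 E3M2, 4 = FP4 E2M1). The names `MatrixFormat`, `ELEMENT_BITS`, and `operand_bits_per_lane` are hypothetical and do not reflect the members of the Mojo struct.

```python
from enum import IntEnum


class MatrixFormat(IntEnum):
    """Hypothetical model of the f8f6f4 operand format selector."""
    FP8_E4M3 = 0
    BF8_E5M2 = 1
    FP6_E2M3 = 2
    BF6_E3M2 = 3
    FP4_E2M1 = 4


# Storage width of one element in each format.
ELEMENT_BITS = {
    MatrixFormat.FP8_E4M3: 8,
    MatrixFormat.BF8_E5M2: 8,
    MatrixFormat.FP6_E2M3: 6,
    MatrixFormat.BF6_E3M2: 6,
    MatrixFormat.FP4_E2M1: 4,
}


def operand_bits_per_lane(fmt: MatrixFormat, rows: int, k: int, lanes: int = 64) -> int:
    """Payload bits each lane holds for a rows x k operand tile.

    Assumes the tile is split evenly across a 64-lane wavefront.
    """
    elements_per_lane = (rows * k) // lanes
    return elements_per_lane * ELEMENT_BITS[fmt]
```

Because each operand carries its own selector, the A and B tiles can use different encodings in the same instruction, and a narrower format only shrinks the payload of the operand that uses it.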
Functions
- cdna4_block_scaled_mfma: Executes a CDNA4 f8f6f4 block-scaled MFMA, inferring the MMA shape from the accumulator width (16 lanes -> 32x32x64, 4 lanes -> 16x16x128).
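The shape inference described above can be modeled in a few lines of Python. This is only a sketch of the mapping stated in the description, not the Mojo implementation; `infer_mma_shape` is a hypothetical name, and the consistency check assumes a 64-lane wavefront.

```python
WAVEFRONT_LANES = 64  # assumed wavefront size for the consistency check


def infer_mma_shape(acc_width: int) -> tuple[int, int, int]:
    """Map the accumulator width to an (M, N, K) MMA shape."""
    shapes = {16: (32, 32, 64), 4: (16, 16, 128)}
    if acc_width not in shapes:
        raise ValueError(f"unsupported accumulator width: {acc_width}")
    m, n, k = shapes[acc_width]
    # Each wavefront lane holds acc_width outputs of the M x N tile.
    assert m * n == acc_width * WAVEFRONT_LANES
    return m, n, k
```

Both shapes perform the same number of multiply-accumulates (32 * 32 * 64 = 16 * 16 * 128 = 65536), so the accumulator width alone is enough to tell them apart.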