Mojo module

block_scaled_mma

CDNA4 block-scaled MFMA wrappers for the LLVM f8f6f4 family.

These wrappers are shared architecture helpers for AMD block-scaled kernels. The MFMA inputs use LLVM's f8f6f4 operand-format selector so callers can pick FP8, FP6, or FP4 encodings per operand, while the scale inputs remain packed E8M0 words.
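To make the "packed E8M0 words" concrete, here is a minimal host-side sketch in plain Python (not the Mojo API of this module). E8M0 is the exponent-only scale format from the OCP Microscaling spec: each byte encodes 2^(byte - 127), and 0xFF is NaN. The helper names and the byte order within a 32-bit word (scale 0 in the lowest byte) are assumptions made for illustration; the source does not state how the wrappers lay out the scales.

```python
import math

E8M0_BIAS = 127   # exponent bias of the E8M0 format
E8M0_NAN = 0xFF   # the only non-numeric encoding; E8M0 has no zero or infinity


def decode_e8m0(byte: int) -> float:
    """Decode one E8M0 scale byte into the power of two it represents."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"E8M0 value must fit in one byte, got {byte}")
    if byte == E8M0_NAN:
        return math.nan
    return math.ldexp(1.0, byte - E8M0_BIAS)  # 2 ** (byte - 127)


def unpack_e8m0_word(word: int) -> list[float]:
    """Split a 32-bit word into four decoded E8M0 scales.

    Assumption (hypothetical layout): scale i lives in bits [8*i, 8*i + 8),
    i.e. the lowest byte comes first.
    """
    return [decode_e8m0((word >> (8 * i)) & 0xFF) for i in range(4)]
```

Because each scale is a pure power of two, applying it to a block of FP8, FP6, or FP4 values only shifts exponents, which is why the scales can stay in this compact form regardless of the operand format chosen.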

Structs

Functions

  • cdna4_block_scaled_mfma: Executes a CDNA4 f8f6f4 block-scaled MFMA, inferring the MMA shape (M x N x K) from the width of the accumulator vector: a 16-lane accumulator selects 32x32x64, and a 4-lane accumulator selects 16x16x128.