
Mojo module

block_scaled_mma

CDNA4 block-scaled MFMA wrappers for the LLVM f8f6f4 family.

These wrappers are shared architecture helpers for AMD block-scaled kernels. The MFMA inputs use LLVM's f8f6f4 operand-format selector so callers can pick FP8, FP6, or FP4 encodings per operand, while the scale inputs remain packed E8M0 words.
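To make the "packed E8M0 words" concrete, here is a minimal Python sketch of how such scales decode. It is an illustration, not part of this module's API. It assumes the OCP Microscaling definition of E8M0 (an 8-bit exponent-only value with bias 127, where 0xFF encodes NaN) and assumes four scales per 32-bit word with the first scale in the low byte. The byte order and the helper names `decode_e8m0` and `unpack_e8m0_word` are hypothetical.

```python
import math

E8M0_BIAS = 127  # exponent bias from the OCP Microscaling E8M0 definition


def decode_e8m0(byte: int) -> float:
    """Decode one E8M0 scale byte into the power of two it represents."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("E8M0 scale must fit in one byte")
    if byte == 0xFF:
        return math.nan  # 0xFF is reserved for NaN
    return math.ldexp(1.0, byte - E8M0_BIAS)  # 2 ** (byte - 127)


def unpack_e8m0_word(word: int) -> list[float]:
    """Split a 32-bit word into four E8M0 scales.

    Assumption: the first scale sits in the low byte. The real packing
    order used by the hardware or by this module may differ.
    """
    return [decode_e8m0((word >> (8 * i)) & 0xFF) for i in range(4)]
```

Because E8M0 has no sign or mantissa bits, every scale is an exact power of two, so applying it to a block only shifts the exponent of the result.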

Structs

Functions

  • cdna4_block_scaled_mfma: Executes a CDNA4 f8f6f4 block-scaled MFMA, inferring the MMA shape from the accumulator width (16 lanes -> 32x32x64, 4 lanes -> 16x16x128).
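The shape inference described above can be sketched in Python as a simple lookup. This is a hedged illustration of the mapping only, not the Mojo implementation. The helper name `infer_mma_shape` is hypothetical, and the consistency check assumes that the accumulator width counts the output elements each thread holds in a 64-wide wavefront.

```python
WAVE_SIZE = 64  # assumption: CDNA wavefront of 64 threads

# Accumulator width -> (M, N, K), as listed in the function summary.
_SHAPES = {
    16: (32, 32, 64),
    4: (16, 16, 128),
}


def infer_mma_shape(acc_width: int) -> tuple[int, int, int]:
    """Return the (M, N, K) shape implied by the accumulator width."""
    try:
        m, n, k = _SHAPES[acc_width]
    except KeyError:
        raise ValueError(f"unsupported accumulator width: {acc_width}") from None
    # Sanity check under the assumption above: the M x N output tile is
    # spread evenly over the wavefront, acc_width elements per thread.
    assert m * n == acc_width * WAVE_SIZE
    return m, n, k
```

Under that assumption the two widths are consistent with their tiles: 32 x 32 = 16 x 64 and 16 x 16 = 4 x 64.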