Mojo module

mxfp4_dequant_matmul_amd

MXFP4 matmul on AMD CDNA GPUs, implemented as dequantization to FP8 followed by an FP8 GEMM.

Dequantizes MXFP4 weights to FP8, casts BF16 activations to FP8, then dispatches to the AMD FP8 GEMM via _matmul_gpu.
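As a rough illustration of the dequantization step, the sketch below decodes one MXFP4 block in plain Python. It follows the OCP Microscaling layout (32 E2M1 elements sharing one E8M0 scale) and is not this module's kernel. The function names and the low-nibble-first packing order are assumptions made for the example, and the result is returned as ordinary Python floats rather than FP8.

```python
# Reference-style decode of one MXFP4 block (illustrative sketch, not the Mojo kernel).

# E2M1 magnitudes, indexed by the low 3 bits of a 4-bit code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of FP4 elements that share one E8M0 scale


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 scale: 2 ** (byte - 127), with 0xFF reserved for NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequantize_mxfp4_block(packed: bytes, scale_byte: int) -> list[float]:
    """Expand 16 packed bytes (two FP4 codes each) into 32 scaled values."""
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError(f"expected {BLOCK_SIZE // 2} bytes, got {len(packed)}")
    scale = decode_e8m0(scale_byte)
    values = []
    for byte in packed:
        # Assumption: the low nibble holds the earlier element of the pair.
        values.append(decode_e2m1(byte & 0x0F) * scale)
        values.append(decode_e2m1(byte >> 4) * scale)
    return values
```

For example, a block of bytes all equal to 0x21 with scale byte 128 decodes to alternating 1.0 and 2.0: the codes 0x1 and 0x2 are 0.5 and 1.0, and the scale is 2 ** (128 - 127) = 2.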

MI355X (CDNA4) uses float8_e4m3fn; MI300X (CDNA3) uses float8_e4m3fnuz. The FP8 type is selected at compile time based on the target architecture.
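The two FP8 types differ in exponent bias and finite range, which is one reason the choice has to follow the target. The Python sketch below models the selection as a lookup table. The names `Fp8Format`, `FP8_BY_ARCH`, `fp8_format_for`, and `fits` are hypothetical and exist only for this example; the actual module resolves the type at compile time in Mojo, and how it handles out-of-range values is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fp8Format:
    name: str
    exponent_bias: int
    max_finite: float


# Architecture-to-type mapping as stated in the module description.
# Bias and range are properties of the two E4M3 encodings.
FP8_BY_ARCH = {
    "CDNA3": Fp8Format("float8_e4m3fnuz", exponent_bias=8, max_finite=240.0),
    "CDNA4": Fp8Format("float8_e4m3fn", exponent_bias=7, max_finite=448.0),
}


def fp8_format_for(arch: str) -> Fp8Format:
    """Return the FP8 format for an architecture name (hypothetical helper)."""
    try:
        return FP8_BY_ARCH[arch]
    except KeyError:
        raise ValueError(f"unsupported architecture: {arch}") from None


def fits(value: float, fmt: Fp8Format) -> bool:
    """True if the magnitude of value is within the format's finite range."""
    return abs(value) <= fmt.max_finite
```

A value such as 300.0 is within the finite range of float8_e4m3fn but outside that of float8_e4m3fnuz, so the same dequantized weight can be in range on one target and out of range on the other.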

Functions