
Python class

QuantFormat

class max.nn.QuantFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Identifies the quantization format of a model checkpoint.

BLOCKSCALED_FP8

BLOCKSCALED_FP8 = 'blockscaled-fp8'

FP8 quantization with block-level scaling.

COMPRESSED_TENSORS_FP8

COMPRESSED_TENSORS_FP8 = 'compressed-tensors-fp8'

FP8 quantization using the compressed-tensors format.

FBGEMM_FP8

FBGEMM_FP8 = 'fbgemm-fp8'

FP8 quantization using the FBGEMM format.

MXFP4

MXFP4 = 'mxfp4'

Microscaling FP4 (MX) quantization format.

MXFP8

MXFP8 = 'mxfp8'

Microscaling FP8 (MX) quantization: float8_e4m3fn data with E8M0 block scales at a 32-element K granularity. Uses the SM100 block-scaled tensor-core MMA (KIND_MXF8F6F4) rather than the 128-granularity blockwise-FP8 path.
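
To make the 32-element block-scale layout concrete, here is a minimal pure-Python sketch that derives one power-of-two scale exponent per block of 32 values along K. It is not MAX's implementation. The function name e8m0_block_scales is hypothetical, and the rounding rule (the smallest exponent that brings the block maximum within the float8_e4m3fn range) is an assumption; real quantizers may choose exponents differently.

```python
import math

BLOCK = 32        # elements sharing one scale along K, per the description above
E4M3_MAX = 448.0  # largest finite float8_e4m3fn magnitude


def e8m0_block_scales(row):
    """Return one power-of-two scale exponent per 32-element block.

    Assumption: choose the smallest exponent e such that
    max(|x|) / 2**e <= E4M3_MAX. E8M0 stores only an exponent,
    so every scale is an exact power of two.
    """
    if len(row) % BLOCK:
        raise ValueError("row length must be a multiple of 32")
    exponents = []
    for start in range(0, len(row), BLOCK):
        amax = max(abs(v) for v in row[start:start + BLOCK])
        if amax == 0.0:
            # All-zero block: any scale works, so use 2**0.
            exponents.append(0)
            continue
        e = math.ceil(math.log2(amax / E4M3_MAX))
        # Clamp to the exponent range representable in E8M0.
        exponents.append(max(-127, min(127, e)))
    return exponents


row = [1.0] * 32 + [896.0] * 32  # K = 64, so two blocks
exps = e8m0_block_scales(row)
scaled = [v / 2.0 ** exps[i // BLOCK] for i, v in enumerate(row)]
print(exps)
```

A K dimension of 64 yields two scales here, whereas a 128-granularity blockwise scheme would cover the same row with a single scale. The sketch stops before the actual cast to float8_e4m3fn, which additionally rounds each scaled value.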

NVFP4

NVFP4 = 'nvfp4'

NVIDIA FP4 quantization format.
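
Because QuantFormat is a standard Enum with string values, a member can be looked up from its value, for example when reading a format name out of a checkpoint config. The sketch below uses a local stand-in enum that mirrors the members listed above so that it runs without MAX installed; with MAX available, `from max.nn import QuantFormat` would presumably take its place. The helper parse_quant_format is hypothetical and not part of the documented API.

```python
from enum import Enum


class QuantFormat(Enum):
    """Local stand-in mirroring the members documented for max.nn.QuantFormat."""

    BLOCKSCALED_FP8 = "blockscaled-fp8"
    COMPRESSED_TENSORS_FP8 = "compressed-tensors-fp8"
    FBGEMM_FP8 = "fbgemm-fp8"
    MXFP4 = "mxfp4"
    MXFP8 = "mxfp8"
    NVFP4 = "nvfp4"


def parse_quant_format(name: str) -> QuantFormat:
    """Hypothetical helper: map a config string to a member by its value."""
    try:
        # Enum lookup by value; normalize case and whitespace first.
        return QuantFormat(name.strip().lower())
    except ValueError:
        valid = ", ".join(member.value for member in QuantFormat)
        raise ValueError(
            f"unknown quantization format {name!r}; expected one of: {valid}"
        ) from None


fmt = parse_quant_format("MXFP8")
print(fmt.name, fmt.value)
```

Looking up by value rather than by attribute name keeps the hyphenated strings such as 'compressed-tensors-fp8' usable directly as configuration values.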