
Python class

QuantFormat


class max.nn.QuantFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)


Bases: Enum

Identifies the quantization format of a model checkpoint.
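Because `QuantFormat` is a plain `Enum` whose values are short strings, a format named in a config file or on a command line can be turned into a member with an ordinary by-value lookup. The sketch below is a minimal illustration under one assumption: it defines a local stand-in enum that mirrors the member names and values listed on this page, so it runs without MAX installed. `parse_quant_format` is a hypothetical helper, not part of `max.nn`.

```python
from enum import Enum


# Local stand-in mirroring the members documented on this page.
# In real code, use `from max.nn import QuantFormat` instead.
class QuantFormat(Enum):
    BLOCKSCALED_FP8 = "blockscaled-fp8"
    COMPRESSED_TENSORS_FP8 = "compressed-tensors-fp8"
    FBGEMM_FP8 = "fbgemm-fp8"
    INT8_W8A8 = "int8-w8a8"
    MXFP4 = "mxfp4"
    MXFP8 = "mxfp8"
    NVFP4 = "nvfp4"


def parse_quant_format(text: str) -> QuantFormat:
    """Hypothetical helper: map a user-supplied string to a QuantFormat member."""
    try:
        # Standard Enum lookup by value, e.g. QuantFormat("mxfp8") -> QuantFormat.MXFP8
        return QuantFormat(text.strip().lower())
    except ValueError:
        valid = ", ".join(f.value for f in QuantFormat)
        raise ValueError(f"unknown quant format {text!r}; expected one of: {valid}") from None


fmt = parse_quant_format("MXFP8")
print(fmt, fmt.value)  # QuantFormat.MXFP8 mxfp8
```

Lookup by name (`QuantFormat["NVFP4"]`) also works, but the hyphenated string values are what a checkpoint or config would normally carry.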

`BLOCKSCALED_FP8`

BLOCKSCALED_FP8 = 'blockscaled-fp8'


FP8 quantization with block-level scaling.
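"Block-level scaling" means each block of consecutive elements shares one scale factor instead of the whole tensor sharing one. This page does not state the block shape for `BLOCKSCALED_FP8`, so the sketch below assumes 1-D blocks of 128 elements (the `MXFP8` entry refers to a 128-granularity blockwise-FP8 path) and a float scale of `amax / 448`, where 448 is the largest finite `float8_e4m3fn` value. Only the scaling step is shown, since pure Python has no FP8 type to round into; all function names are hypothetical.

```python
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def blockwise_scales(row, block_size=128):
    """One scale per block of `block_size` consecutive elements (assumed 1-D blocks)."""
    scales = []
    for start in range(0, len(row), block_size):
        amax = max(abs(v) for v in row[start:start + block_size])
        # All-zero blocks get scale 1.0 so later division never hits zero.
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def apply_scales(row, scales, block_size=128):
    """Divide each element by its block's scale so it fits in [-448, 448]."""
    return [v / scales[i // block_size] for i, v in enumerate(row)]


def dequantize(scaled, scales, block_size=128):
    """Multiply back by the per-block scale (the FP8 rounding step is omitted)."""
    return [v * scales[i // block_size] for i, v in enumerate(scaled)]


# Two blocks with very different magnitudes.
row = [0.01 * i for i in range(128)] + [10.0 * i for i in range(128)]
scales = blockwise_scales(row)
scaled = apply_scales(row, scales)
restored = dequantize(scaled, scales)
```

Because each block has its own scale, the large values in the second block do not shrink the representable resolution of the small values in the first, which is the main reason to prefer block scaling over a single per-tensor scale.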

`COMPRESSED_TENSORS_FP8`

COMPRESSED_TENSORS_FP8 = 'compressed-tensors-fp8'


FP8 quantization using the compressed-tensors format.

`FBGEMM_FP8`

FBGEMM_FP8 = 'fbgemm-fp8'


FP8 quantization using the FBGEMM format.

`INT8_W8A8`

INT8_W8A8 = 'int8-w8a8'


Symmetric int8 W8A8 quantization: per-output-channel (row-wise) int8 weight scales and per-token (dynamic, row-wise) int8 activation scales, both symmetric with scale = absmax / 127. Weights are round-to-nearest (RTN) quantized at load time, so no pre-quantized checkpoint is required. Apple M5 only: routes to the int8 widening-MMA GEMM (`int8_matmul.mojo`).
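The scheme above can be illustrated in pure Python: each weight row and each activation row (token) gets its own scale `absmax / 127`, values are rounded to the nearest integer, the matmul runs on integers, and the result is dequantized by multiplying by both scales. This is a minimal sketch of the arithmetic only; `quantize_rows` and `w8a8_matmul` are hypothetical helpers and do not reproduce the `int8_matmul.mojo` kernel. Python's `round` breaks ties to even, which a real kernel may not do.

```python
def quantize_rows(matrix):
    """Symmetric per-row int8 quantization: scale = absmax / 127, q = round(x / scale)."""
    q_rows, scales = [], []
    for row in matrix:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        # Round to nearest, then clamp to the symmetric range [-127, 127].
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def w8a8_matmul(x, w):
    """Compute x @ w.T with int8 operands and integer accumulation.

    x: [tokens, K] activations, quantized per token at run time (dynamic).
    w: [out_channels, K] weights, quantized per output channel (done once at load).
    """
    qx, sx = quantize_rows(x)
    qw, sw = quantize_rows(w)
    out = []
    for t, xrow in enumerate(qx):
        out_row = []
        for n, wrow in enumerate(qw):
            acc = sum(a * b for a, b in zip(xrow, wrow))  # exact integer dot product
            out_row.append(acc * sx[t] * sw[n])           # dequantize with both scales
        out.append(out_row)
    return out


x = [[1.0, -2.0, 3.0]]
w = [[0.5, 0.25, -1.0], [1.0, 1.0, 1.0]]
approx = w8a8_matmul(x, w)  # close to the float result [[-3.0, 2.0]]
```

Because both scales are per row, they factor out of the integer dot product, so the GEMM itself only needs int8 inputs with a wider accumulator, and dequantization is a single multiply per output element.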

`MXFP4`

MXFP4 = 'mxfp4'

source

Microscaling FP4 (MX) quantization format.

`MXFP8`

MXFP8 = 'mxfp8'

source

Microscaling FP8 (MX) quantization: `float8_e4m3fn` data with E8M0 block scales at a 32-element K granularity. Uses the SM100 block-scaled tensor-core MMA (`KIND_MXF8F6F4`) rather than the 128-granularity blockwise-FP8 path.
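An E8M0 scale has 8 exponent bits and no mantissa (bias 127), so every block scale is a power of two. The sketch below computes one such scale per 32-element block, assuming the shared-exponent rule from the OCP Microscaling Formats (MX) v1.0 specification: `floor(log2(amax))` minus the element type's largest exponent (8 for E4M3, since 448 = 1.75 * 2**8), with out-of-range elements saturated to ±448. MAX's kernels may choose scales differently (for example, rounding the exponent up to avoid saturation), and the function names are hypothetical. The final cast to FP8 is omitted.

```python
import math

E4M3_MAX = 448.0   # largest finite float8_e4m3fn value (1.75 * 2**8)
E4M3_EMAX = 8      # exponent of E4M3's largest power of two
BLOCK = 32         # elements along K sharing one E8M0 scale
E8M0_BIAS = 127    # E8M0 stores exponent + 127; code 255 is reserved for NaN


def mx_block_scale(block):
    """Return (E8M0 code, scale) for one block, per the OCP MX shared-exponent rule."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        exp = -E8M0_BIAS  # any scale works for an all-zero block; use the smallest
    else:
        # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
        _, e = math.frexp(amax)
        exp = (e - 1) - E4M3_EMAX
        exp = max(-E8M0_BIAS, min(E8M0_BIAS, exp))  # representable range 2**-127 .. 2**127
    return exp + E8M0_BIAS, 2.0 ** exp


def mx_scale_row(row):
    """Scale a row block by block; returns the E8M0 codes and the pre-cast values."""
    codes, scaled = [], []
    for start in range(0, len(row), BLOCK):
        block = row[start:start + BLOCK]
        code, scale = mx_block_scale(block)
        codes.append(code)
        # The block's largest magnitude lands in [256, 512), so values above 448 saturate.
        scaled.extend(max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in block)
    return codes, scaled


codes, scaled = mx_scale_row([1.0] * 32 + [500.0] * 32)
print(codes)  # [119, 127]
```

Restricting the scale to a power of two means applying it only shifts exponents and can be stored in a single byte per 32 elements; the cost is that the block maximum can overshoot the E4M3 range and be clipped, as happens to the 500.0 values here.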

`NVFP4`

NVFP4 = 'nvfp4'


NVIDIA FP4 quantization format.
