Python class

QuantizationConfig

class max.graph.quantization.QuantizationConfig(quant_method, bits, group_size, desc_act=False, sym=False)

Bases: object

Configuration for specifying quantization parameters that affect inference.

These parameters control how tensor values are quantized, including the method, bit precision, grouping, and other characteristics that affect the trade-off between model size, inference speed, and accuracy.

Parameters:

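quant_method (str): The quantization method name (for example, gptq or awq).
bits (int): The number of bits used to represent each quantized weight element.
group_size (int): The number of weight elements that share a single set of quantization parameters.
desc_act (bool): Whether to use activation ordering. Defaults to False.
sym (bool): Whether to use symmetric quantization. Defaults to False.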
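
As a minimal sketch, a 4-bit configuration could be constructed as shown here. The values used (gptq, 4 bits, a group size of 128) are illustrative assumptions, not defaults stated on this page. So that the snippet also runs without MAX installed, it falls back to a hypothetical stand-in dataclass that only mirrors the documented signature.

```python
from dataclasses import dataclass

try:
    # The class documented on this page.
    from max.graph.quantization import QuantizationConfig
except ImportError:
    # Hypothetical stand-in that mirrors the documented signature.
    # It exists only so this sketch runs without MAX installed.
    @dataclass
    class QuantizationConfig:
        quant_method: str
        bits: int
        group_size: int
        desc_act: bool = False
        sym: bool = False

# Illustrative values; use the ones that match your checkpoint.
config = QuantizationConfig(
    quant_method="gptq",
    bits=4,
    group_size=128,
)

print(config.quant_method, config.bits, config.group_size)
print(config.desc_act, config.sym)  # both were left at their defaults
```

Only the three required arguments are passed; desc_act and sym keep their documented default of False.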

`bits`

bits: int

The number of bits used to represent each quantized weight element.
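
To make the size trade-off concrete, the sketch below computes how many distinct values a given bit width can represent and how many bytes the packed weights would occupy. The helper names and the 4096 x 4096 matrix are hypothetical, and the estimate deliberately ignores the extra storage for per-group quantization parameters.

```python
def quantized_levels(bits: int) -> int:
    """Number of distinct values an integer of the given width can represent."""
    return 2 ** bits


def packed_bytes(num_weights: int, bits: int) -> int:
    """Bytes needed to store num_weights values at `bits` bits each, rounded up."""
    return (num_weights * bits + 7) // 8


# Hypothetical weight matrix, chosen only for illustration.
num_weights = 4096 * 4096

levels_4bit = quantized_levels(4)
size_4bit = packed_bytes(num_weights, 4)
size_16bit = packed_bytes(num_weights, 16)

print(levels_4bit)             # distinct values available at 4 bits
print(size_4bit, size_16bit)   # packed size in bytes at 4 and 16 bits
```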

`desc_act`

desc_act: bool = False

Whether to use activation ordering (also known as act-order), in which weight columns are quantized in order of decreasing activation magnitude. Defaults to False.
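
The following sketch illustrates the general idea of activation ordering as it is commonly described for GPTQ-style quantizers. It is an assumption about the technique in general, not a description of MAX internals. Columns are ranked by a per-column activation statistic, and quantization groups are then formed over the reordered columns. The function names and the statistics are hypothetical.

```python
def activation_order(act_stats: list[float]) -> list[int]:
    """Return column indices sorted by descending activation statistic."""
    return sorted(range(len(act_stats)), key=lambda i: act_stats[i], reverse=True)


def group_index_per_column(order: list[int], group_size: int) -> list[int]:
    """Map each original column to the group it lands in after reordering."""
    g_idx = [0] * len(order)
    for position, column in enumerate(order):
        g_idx[column] = position // group_size
    return g_idx


# Hypothetical per-column activation statistics.
stats = [0.2, 1.5, 0.7, 0.1]

order = activation_order(stats)
g_idx = group_index_per_column(order, group_size=2)

print(order)  # columns from most to least active
print(g_idx)  # group assigned to each original column
```

With ordering enabled, columns in the same group are no longer necessarily adjacent, which is why such schemes typically store a per-column group index alongside the weights.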

`group_size`

group_size: int

The number of weight elements that share a single set of quantization parameters.
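
As a rough illustration of grouping, the sketch below splits a flat list of weights into consecutive groups and computes the value range of each one. The helper name and the sample weights are hypothetical. Each group would get its own quantization parameters derived from its own range.

```python
def split_into_groups(weights: list[float], group_size: int) -> list[list[float]]:
    """Split weights into consecutive chunks of at most group_size elements."""
    return [weights[i:i + group_size] for i in range(0, len(weights), group_size)]


# Hypothetical weights: the second half contains much larger values.
weights = [0.1, -0.4, 0.25, 0.05, 2.0, -1.5, 0.75, 1.0]

groups = split_into_groups(weights, group_size=4)

# Each group's (min, max) range is computed independently of the others.
ranges = [(min(g), max(g)) for g in groups]

print(len(groups))
print(ranges)
```

Because the large values fall in the second group, they do not widen the range used for the first group. Smaller groups track local ranges more closely at the cost of storing more quantization parameters.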

`quant_method`

quant_method: str

The quantization method name (for example, gptq or awq).

`sym`

sym: bool = False

Whether to use symmetric quantization. Defaults to False.
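
In general terms, symmetric quantization maps values onto a signed integer range centered on zero using only a scale, while asymmetric quantization adds a zero point so that the integer range covers the interval from the minimum to the maximum value. The sketch below shows both under simple assumptions (max-abs and min-max scaling, weights that are not all identical). The function names are hypothetical, and this is not necessarily how MAX implements either mode.

```python
def quantize_symmetric(weights: list[float], bits: int):
    """Signed quantization with a scale only; zero maps to integer 0."""
    qmax = 2 ** (bits - 1) - 1
    # Assumes at least one nonzero weight, so the scale is nonzero.
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def quantize_asymmetric(weights: list[float], bits: int):
    """Unsigned quantization with a scale and an integer zero point."""
    qmax = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    # Assumes hi > lo, so the scale is nonzero.
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


# Hypothetical non-negative weights.
weights = [0.0, 0.5, 1.0, 1.5]

q_sym, scale_sym = quantize_symmetric(weights, bits=4)
q_asym, scale_asym, zero_point = quantize_asymmetric(weights, bits=4)

print(q_sym, scale_sym)
print(q_asym, scale_asym, zero_point)
```

For these non-negative weights, the symmetric variant leaves the negative half of its integer range unused, while the asymmetric variant spreads the values across the full range with a finer step.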
