Python class

QuantizationConfig

class max.graph.quantization.QuantizationConfig(quant_method, bits, group_size, desc_act=False, sym=False)

Bases: object

Configuration for specifying quantization parameters that affect inference.

These parameters control how tensor values are quantized, including the method, bit precision, grouping, and other characteristics that affect the trade-off between model size, inference speed, and accuracy.
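
As a minimal sketch of how such a configuration might be constructed, the snippet below uses the documented signature. The fallback dataclass is a hypothetical stand-in (not part of MAX) that mirrors the documented fields so the example still runs where MAX is not installed; the specific values (gptq, 4 bits, group size 128) are illustrative assumptions only.

```python
from dataclasses import dataclass

try:
    # Real class, as documented on this page.
    from max.graph.quantization import QuantizationConfig
except ImportError:
    # Hypothetical stand-in mirroring the documented fields and defaults,
    # used only so this sketch runs without MAX installed.
    @dataclass
    class QuantizationConfig:
        quant_method: str
        bits: int
        group_size: int
        desc_act: bool = False
        sym: bool = False


# Illustrative values: 4-bit GPTQ weights with groups of 128 elements.
config = QuantizationConfig(quant_method="gptq", bits=4, group_size=128)

print(config.quant_method, config.bits, config.group_size)
print(config.desc_act, config.sym)  # both keep their documented default
```

Passing the arguments by keyword keeps the call readable, since bits and group_size are both plain integers and easy to swap by accident.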

Parameters:

bits

bits: int

The number of bits used to represent each quantized weight element.

desc_act

desc_act: bool = False

Whether to use activation ordering, that is, to quantize weight columns in order of decreasing activation magnitude (the GPTQ "act-order" option). Defaults to False.
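
To illustrate what a descending activation order means, here is a rough sketch that ranks input channels by their mean squared activation over a small calibration batch. The function name activation_order and the sample data are hypothetical, and this shows the general act-order idea from GPTQ rather than how MAX computes or consumes the ordering.

```python
def activation_order(calibration_inputs):
    """Return input-channel indices sorted by decreasing mean squared activation."""
    num_rows = len(calibration_inputs)
    num_channels = len(calibration_inputs[0])
    # Mean squared activation per input channel across the calibration rows.
    energy = [
        sum(row[c] ** 2 for row in calibration_inputs) / num_rows
        for c in range(num_channels)
    ]
    # Channels with the largest activations come first.
    return sorted(range(num_channels), key=lambda c: energy[c], reverse=True)


# Hypothetical calibration batch: 2 rows, 3 input channels.
calibration_inputs = [
    [0.1, 2.0, -0.5],
    [-0.2, 1.5, 0.4],
]
order = activation_order(calibration_inputs)
print(order)
```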

group_size

group_size: int

The number of weight elements that share a single set of quantization parameters.
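
The interaction between bits and group_size can be sketched with some simple arithmetic: each group carries its own quantization parameters, so smaller groups add more per-weight overhead. The helper names below are hypothetical, and the 16-bit scale and optional zero-point width are assumptions about a typical storage layout, not a statement of how MAX packs weights.

```python
def effective_bits_per_weight(bits, group_size, scale_bits=16, zero_point_bits=0):
    """Average storage per weight, counting per-group parameters.

    Assumes one scale (and optionally one zero point) is stored per group.
    """
    return bits + (scale_bits + zero_point_bits) / group_size


def estimated_weight_bytes(num_weights, bits, group_size,
                           scale_bits=16, zero_point_bits=0):
    """Approximate size in bytes of a quantized weight tensor."""
    per_weight = effective_bits_per_weight(
        bits, group_size, scale_bits, zero_point_bits
    )
    return num_weights * per_weight / 8


# 4-bit weights with one 16-bit scale per group.
for group_size in (32, 128):
    print(group_size, effective_bits_per_weight(4, group_size))
```

Under these assumptions, a larger group_size lowers the overhead per weight, at the cost of sharing one set of parameters across more values.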

quant_method

quant_method: str

The quantization method name (for example, gptq or awq).

sym

sym: bool = False

Whether to use symmetric quantization, in which the representable range is symmetric around zero. Defaults to False.
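
The difference between symmetric and asymmetric quantization can be sketched for a single group of weights as follows. This is a generic textbook-style illustration in plain Python; quantize_group and dequantize_group are hypothetical helpers, and the rounding and clamping choices are assumptions rather than the scheme MAX or any particular quant_method uses.

```python
def quantize_group(values, bits, sym):
    """Quantize one group of floats; return (codes, scale, zero_point)."""
    if sym:
        # Symmetric: signed integer range around zero, zero point fixed at 0.
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax - 1
        max_abs = max(abs(v) for v in values)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        zero_point = 0
    else:
        # Asymmetric: unsigned range stretched over [min, max] of the group.
        qmin, qmax = 0, 2 ** bits - 1
        lo, hi = min(values), max(values)
        scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
        zero_point = round(-lo / scale)
    codes = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


def max_abs_error(values, bits, sym):
    """Largest round-trip error for one group."""
    codes, scale, zero_point = quantize_group(values, bits, sym)
    restored = dequantize_group(codes, scale, zero_point)
    return max(abs(a - b) for a, b in zip(values, restored))


# A one-sided group: all values are non-negative.
group = [0.0, 0.5, 1.0, 1.5]
print("symmetric error: ", max_abs_error(group, bits=4, sym=True))
print("asymmetric error:", max_abs_error(group, bits=4, sym=False))
```

For a one-sided group like this one, the symmetric scheme leaves the negative half of its range unused, so the asymmetric scheme reconstructs the values more closely. For groups roughly centered on zero, the two behave similarly and the symmetric form avoids the zero-point offset.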