Python class
QuantizationConfig
class max.graph.quantization.QuantizationConfig(quant_method, bits, group_size, desc_act=False, sym=False)
Bases: object
Configuration for specifying quantization parameters that affect inference.
These parameters control how tensor values are quantized, including the method, bit precision, grouping, and other characteristics that affect the trade-off between model size, inference speed, and accuracy.
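A minimal construction sketch based on the signature above. The values (4-bit GPTQ with groups of 128) are illustrative assumptions, not recommended settings. The fallback dataclass is a hypothetical stand-in that only mirrors the documented fields, so the snippet also runs where MAX is not installed.

```python
from dataclasses import dataclass

try:
    from max.graph.quantization import QuantizationConfig
except ImportError:
    # Hypothetical stand-in mirroring the documented fields; not the real class.
    @dataclass
    class QuantizationConfig:
        quant_method: str
        bits: int
        group_size: int
        desc_act: bool = False
        sym: bool = False

# Illustrative values only: 4-bit GPTQ, 128 weights per group.
config = QuantizationConfig(quant_method="gptq", bits=4, group_size=128)

# desc_act and sym were not passed, so they keep their documented defaults.
print(config.quant_method, config.bits, config.group_size)
print(config.desc_act, config.sym)
```

The three required arguments are passed by keyword here to keep the call readable; the two boolean flags are left at their documented default of False.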
bits
bits: int
The number of bits used to represent each quantized weight element.
desc_act
desc_act: bool = False
Whether to use activation ordering, that is, quantizing weight columns in order of descending activation magnitude, as in GPTQ. Defaults to False.
group_size
group_size: int
The number of weight elements that share a single set of quantization parameters.
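To make the effect of group_size and bits concrete, the following sketch does the arithmetic for a single row of weights. It assumes groups are formed along one dimension and that quantized values are tightly bit-packed; the actual layout depends on the quantization method and is not specified on this page. The helper names are hypothetical and are not part of the MAX API.

```python
def count_groups(num_weights: int, group_size: int) -> int:
    """Number of groups needed to cover num_weights elements (ceiling division)."""
    if group_size <= 0:
        # Assumption of this sketch: only positive group sizes are handled.
        raise ValueError("group_size must be positive")
    return -(-num_weights // group_size)


def packed_weight_bytes(num_weights: int, bits: int) -> int:
    """Bytes for the quantized values alone, excluding per-group parameters."""
    return -(-(num_weights * bits) // 8)


row_len = 4096  # illustrative row length
groups = count_groups(row_len, group_size=128)
nbytes = packed_weight_bytes(row_len, bits=4)

print(groups)   # 32 groups, each with its own quantization parameters
print(nbytes)   # 2048 bytes of packed 4-bit values
```

A smaller group_size means more groups, and therefore more per-group parameters to store, in exchange for parameters that fit each group of weights more closely.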
quant_method
quant_method: str
The name of the quantization method, for example "gptq" or "awq".
sym
sym: bool = False
Whether to use symmetric quantization. Defaults to False.
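The difference between symmetric and asymmetric quantization can be sketched on one small group of weights. This is a generic textbook formulation, not the MAX implementation: how a given quant_method interprets sym is not described on this page, and all function names below are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Signed integers centered on zero; only a scale is stored."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_asymmetric(values, bits):
    """Unsigned integers; a scale and a zero point are stored."""
    qmax = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integers back to approximate real values."""
    return [(x - zero_point) * scale for x in q]


group = [-0.2, 0.1, 0.6, 1.4]  # illustrative, skewed toward positive values

q_sym, s_sym = quantize_symmetric(group, bits=4)
q_asym, s_asym, zp = quantize_asymmetric(group, bits=4)

print(q_sym, dequantize(q_sym, s_sym))
print(q_asym, zp, dequantize(q_asym, s_asym, zp))
```

In this sketch the symmetric variant spends part of its integer range on negative values the group never reaches, while the asymmetric variant spreads all 16 levels across the observed range at the cost of storing a zero point per group.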