Python class

GPTQLinear

class max.nn.GPTQLinear(in_dim, out_dim, dtype, device, has_bias=False, quantization_encoding=None, quantization_config=None, quant_config=None)

Bases: Linear

A Linear layer for GPTQ encoding.

Initializes the layer for GPTQ-quantized linear transformations, with weights and an optional bias.

Parameters:

  • in_dim (int) – The dimensionality of the input space.
  • out_dim (int) – The dimensionality of the output space.
  • dtype (DType) – The DType for both weights and bias.
  • device (DeviceRef) – The target DeviceRef for computation. Weights remain on CPU until moved during computation.
  • has_bias (bool) – When True, adds a bias vector to the layer. Defaults to False.
  • quantization_encoding (QuantizationEncoding | None) – The QuantizationEncoding of the weights.
  • quantization_config (QuantizationConfig | None) – Extra QuantizationConfig for the weight quantization.
  • quant_config (QuantConfig | None) – QuantConfig for scaled quantization. Not supported for GPTQLinear.
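To give intuition for what a GPTQ quantization encoding stores, the sketch below illustrates group-wise symmetric low-bit quantization of a weight row, the general scheme GPTQ-style layers rely on. This is a conceptual illustration only, not the MAX implementation; the function names and the small group size are invented for the example (GPTQ typically uses group sizes such as 64 or 128).

```python
def quantize_group(weights, bits=4):
    """Symmetric quantization of one group of floats to signed `bits`-bit ints.

    Returns (ints, scale) such that each weight w is approximated by q * scale.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    qs = [round(w / scale) for w in weights]
    return qs, scale


def dequantize_group(qs, scale):
    """Reconstruct approximate float weights from ints and a per-group scale."""
    return [q * scale for q in qs]


# One weight row, split into groups of 4 for illustration.
row = [0.12, -0.40, 0.33, 0.07, 1.10, -0.95, 0.02, 0.58]
group_size = 4

recon = []
for i in range(0, len(row), group_size):
    qs, scale = quantize_group(row[i:i + group_size])
    recon.extend(dequantize_group(qs, scale))
```

Each group keeps its own scale, so outliers in one group do not degrade the precision of the others; the per-group error stays bounded by half the group's scale.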