Python module
quantization
APIs to quantize graph tensors.
This package includes a generic quantization encoding interface and some quantization encodings that conform to it, such as bfloat16 and Q4_0 encodings.
The main interface for defining a new quantized type is QuantizationEncoding.quantize(), which takes a full-precision tensor represented as float32 and quantizes it according to the encoding. The resulting quantized tensor is represented as a bytes tensor, so the QuantizationEncoding must know how to translate between the tensor shape and its corresponding quantized buffer shape.
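To illustrate the shape translation described above, here is a minimal sketch in plain Python. The helper quantized_shape is hypothetical and not part of the max.graph.quantization API. It assumes blocks are formed along the last dimension, and the example values (32 elements and 18 bytes per block) follow the GGML Q4_0 convention, which may differ from what MAX Graph reports.

```python
def quantized_shape(shape, elements_per_block, block_size):
    """Map a float32 tensor shape to the shape of its quantized byte buffer.

    Hypothetical helper: assumes blocks are formed along the last dimension.
    """
    *leading, last = shape
    if last % elements_per_block != 0:
        raise ValueError(
            f"last dimension {last} is not a multiple of {elements_per_block}"
        )
    # Each group of `elements_per_block` values becomes `block_size` bytes.
    return (*leading, last // elements_per_block * block_size)


# Assumed Q4_0-style layout: 32 elements are encoded into 18 bytes.
weights_shape = (4096, 4096)
buffer_shape = quantized_shape(weights_shape, elements_per_block=32, block_size=18)
```

Under these assumptions, a row of 4096 float32 values holds 128 blocks and is stored as 128 * 18 = 2304 bytes.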
Quantization support for MAX Graph.
BlockParameters
class max.graph.quantization.BlockParameters(elements_per_block: int, block_size: int)
block_size
block_size: int
elements_per_block
elements_per_block: int
QuantizationEncoding
class max.graph.quantization.QuantizationEncoding(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Quantization encodings supported by MAX Graph.
Q4_0
Q4_0 = 'Q4_0'
Q4_K
Q4_K = 'Q4_K'
Q5_K
Q5_K = 'Q5_K'
Q6_K
Q6_K = 'Q6_K'
block_parameters
property block_parameters: BlockParameters
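As a rough illustration of what the two block parameters imply for storage cost, the sketch below uses a hypothetical stand-in class, BlockParams, rather than the real BlockParameters. The numeric values are assumptions taken from the GGML block layouts that share these encoding names; the values reported by MAX Graph may differ.

```python
from typing import NamedTuple


class BlockParams(NamedTuple):
    """Hypothetical stand-in for max.graph.quantization.BlockParameters."""

    elements_per_block: int
    block_size: int  # bytes in one encoded block


# Assumed values, following GGML conventions (not read from MAX Graph).
ASSUMED_PARAMETERS = {
    "Q4_0": BlockParams(elements_per_block=32, block_size=18),
    "Q4_K": BlockParams(elements_per_block=256, block_size=144),
    "Q5_K": BlockParams(elements_per_block=256, block_size=176),
    "Q6_K": BlockParams(elements_per_block=256, block_size=210),
}


def bits_per_element(params):
    """Average number of stored bits per original tensor element."""
    return 8 * params.block_size / params.elements_per_block


for name, params in ASSUMED_PARAMETERS.items():
    print(f"{name}: {bits_per_element(params)} bits per element")
```

The result is higher than the nominal bit width in each name because every block also stores scale metadata alongside the packed values.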
block_size
property block_size: int
Number of bytes in the encoded representation of a block.
All quantization types currently supported by MAX Graph are block-based: groups of a fixed number of elements are formed, and each group is quantized together into a fixed-size output block. This value is the number of bytes resulting after encoding a single block.
elements_per_block
property elements_per_block: int
Number of elements per block.
All quantization types currently supported by MAX Graph are block-based: groups of a fixed number of elements are formed, and each group is quantized together into a fixed-size output block. This value is the number of elements gathered into a block.
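The block-based scheme described above can be sketched in pure Python. This is not MAX Graph's implementation: quantize_block is a hypothetical function that loosely follows the GGML Q4_0 layout (an assumption), where 32 float values are encoded as a 2-byte float16 scale followed by 16 bytes of packed 4-bit codes.

```python
import struct

ELEMENTS_PER_BLOCK = 32  # assumed, GGML Q4_0 convention
BLOCK_SIZE = 18          # 2-byte float16 scale + 16 bytes of packed 4-bit codes


def quantize_block(values):
    """Encode one group of 32 floats into an 18-byte block (illustrative only)."""
    if len(values) != ELEMENTS_PER_BLOCK:
        raise ValueError(f"expected {ELEMENTS_PER_BLOCK} values, got {len(values)}")
    # Use the element with the largest magnitude (sign kept) to set the scale,
    # so that it maps to the lowest 4-bit code.
    extreme = max(values, key=abs)
    scale = extreme / -8.0
    inverse = 1.0 / scale if scale != 0.0 else 0.0
    # Map each value to an integer code in [0, 15].
    codes = [min(15, int(v * inverse + 8.5)) for v in values]
    # Pack two 4-bit codes per byte: first half in the low nibbles,
    # second half in the high nibbles.
    half = ELEMENTS_PER_BLOCK // 2
    packed = bytes(codes[i] | (codes[i + half] << 4) for i in range(half))
    return struct.pack("<e", scale) + packed


block = quantize_block([float(i) for i in range(ELEMENTS_PER_BLOCK)])
zero_block = quantize_block([0.0] * ELEMENTS_PER_BLOCK)
```

In this sketch, the number of input values corresponds to elements_per_block and the length of the returned bytes object corresponds to block_size.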