Python module

# quantization

APIs to quantize graph tensors.

This package includes a generic quantization encoding interface and some quantization encodings that conform to it, such as bfloat16 and Q4_0 encodings.

The main interface for defining a new quantized type is `QuantizationEncoding.quantize()`. It takes a full-precision tensor represented as float32 and quantizes it according to the encoding. The resulting quantized tensor is represented as a bytes tensor, so a `QuantizationEncoding` must know how to translate between the original tensor shape and the shape of the corresponding quantized buffer.
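As a rough illustration of that shape translation, the sketch below maps a float32 tensor shape to the shape of its bytes buffer. It is not the MAX implementation: the helper name `quantized_buffer_shape` is hypothetical, the sketch assumes blocks are formed along the innermost dimension, and the example values (32 elements per 18-byte block) assume a GGML-style Q4_0 layout.

```python
from typing import Sequence, Tuple


def quantized_buffer_shape(
    shape: Sequence[int], elements_per_block: int, block_size: int
) -> Tuple[int, ...]:
    """Hypothetical helper: shape of the bytes buffer for a blocked encoding.

    Assumes blocks are formed along the innermost dimension only.
    """
    *outer, inner = shape
    if inner % elements_per_block != 0:
        raise ValueError(
            f"innermost dimension {inner} is not a multiple of "
            f"{elements_per_block} elements per block"
        )
    # Each group of `elements_per_block` values becomes `block_size` bytes.
    return (*outer, inner // elements_per_block * block_size)


# Assumed Q4_0-style parameters: 32 elements packed into 18 bytes.
print(quantized_buffer_shape((4096, 4096), elements_per_block=32, block_size=18))
```

Under these assumptions a `[4096, 4096]` float32 tensor maps to a `[4096, 2304]` bytes tensor.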

Quantization support for MAX Graph.

`BlockParameters`

class max.graph.quantization.BlockParameters(elements_per_block: int, block_size: int)

`block_size`

block_size*: int*

`elements_per_block`

elements_per_block*: int*

`QuantizationEncoding`

class max.graph.quantization.QuantizationEncoding(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Quantization encodings supported by MAX Graph.

`Q4_0`

Q4_0 = 'Q4_0'

`Q4_K`

Q4_K = 'Q4_K'

`Q5_K`

Q5_K = 'Q5_K'

`Q6_K`

Q6_K = 'Q6_K'

`block_parameters`

property block_parameters*: BlockParameters*

`block_size`

property block_size*: int*

Number of bytes in the encoded representation of a block.

All quantization types currently supported by MAX Graph are block-based: groups of a fixed number of elements are formed, and each group is quantized together into a fixed-size output block. This value is the number of bytes resulting after encoding a single block.
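To make the idea of a fixed-size output block concrete, here is a minimal sketch of a 4-bit block quantizer. It is a simplified scheme in the spirit of Q4_0, not the encoding MAX Graph actually implements: the function names, the scale choice, and the byte layout (a 2-byte float16 scale followed by 16 bytes of packed 4-bit codes, 18 bytes in total for 32 elements) are all assumptions made for illustration.

```python
import struct

ELEMENTS_PER_BLOCK = 32  # assumed number of elements grouped into one block
BLOCK_SIZE = 18          # assumed: 2-byte float16 scale + 16 bytes of 4-bit codes


def quantize_block(values):
    """Encode 32 floats into one 18-byte block (simplified, illustrative)."""
    assert len(values) == ELEMENTS_PER_BLOCK
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax else 1.0
    # Round to integers in [-7, 7], then shift by 8 into the unsigned 4-bit range.
    codes = [max(0, min(15, round(v / scale) + 8)) for v in values]
    # Byte i holds element i in its low nibble and element i + 16 in its high nibble.
    packed = bytes(codes[i] | (codes[i + 16] << 4) for i in range(16))
    return struct.pack("<e", scale) + packed


def dequantize_block(block):
    """Decode one 18-byte block back into 32 approximate floats."""
    assert len(block) == BLOCK_SIZE
    (scale,) = struct.unpack("<e", block[:2])
    low = [(b & 0x0F) - 8 for b in block[2:]]
    high = [(b >> 4) - 8 for b in block[2:]]
    return [c * scale for c in low + high]


values = [i / 10 - 1.6 for i in range(ELEMENTS_PER_BLOCK)]
block = quantize_block(values)
restored = dequantize_block(block)
print(len(block), max(abs(a - b) for a, b in zip(values, restored)))
```

Whatever the input values are, the encoded block always has the same length, which is the quantity `block_size` reports.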

`elements_per_block`

property elements_per_block*: int*

Number of elements per block.

All quantization types currently supported by MAX Graph are block-based: groups of a fixed number of elements are formed, and each group is quantized together into a fixed-size output block. This value is the number of elements gathered into a block.
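Taken together, `elements_per_block` and `block_size` determine the storage cost of an encoding. The sketch below derives bits per element and total encoded size from the two values. `BlockParametersSketch` is a hypothetical stand-in for `BlockParameters` so the example runs without MAX installed, and the numbers (32 elements, 18 bytes) assume a GGML-style Q4_0 layout rather than values read from the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockParametersSketch:
    """Hypothetical stand-in mirroring the two BlockParameters fields."""

    elements_per_block: int
    block_size: int


def bits_per_element(params: BlockParametersSketch) -> float:
    # Bytes per block times 8 bits, spread over the elements in the block.
    return params.block_size * 8 / params.elements_per_block


def encoded_bytes(num_elements: int, params: BlockParametersSketch) -> int:
    # This sketch only handles tensors that divide evenly into whole blocks.
    if num_elements % params.elements_per_block != 0:
        raise ValueError("element count must be a multiple of elements_per_block")
    return num_elements // params.elements_per_block * params.block_size


# Assumed Q4_0-style values. With MAX installed, the real values would come from
# the block_parameters property of a QuantizationEncoding member.
q4_0 = BlockParametersSketch(elements_per_block=32, block_size=18)

num_elements = 4096 * 4096
print(bits_per_element(q4_0))             # 4.5
print(encoded_bytes(num_elements, q4_0))  # 9437184
print(num_elements * 4)                   # 67108864 bytes as float32
```

Under these assumed parameters each element costs 4.5 bits instead of 32, because the per-block scale adds half a bit per element on top of the 4-bit codes.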
