The Q4_0 quantization encoding.

Q4_0 is a block quantization scheme originally designed for GGML in which each element (number) is reduced to an unsigned, fixed-point, 4-bit value. Multiple quantized elements are packed together in a block, all using the same float16 scale.

The packing scheme requires that the size of the innermost dimension is a multiple of 32. When the tensor is quantized to Q4_0, each block of 32 scalar values is packed into 18 bytes. The first two bytes hold the float16 quantization scale, and the remaining 16 bytes hold the 32 quantized values (each byte holds two 4-bit values).
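The 18-byte layout can be sketched in plain Python. This is an illustrative sketch only, not this library's implementation: the helper names are hypothetical, and the scale rule, the nibble order (element j in the low nibble, element j+16 in the high nibble), and the little-endian float16 are assumptions based on the GGML reference quantizer.

```python
import struct

BLOCK_SIZE = 32        # scalars per block
BLOCK_BYTES = 2 + 16   # float16 scale + 32 packed 4-bit values


def quantize_block_q4_0(values):
    """Pack 32 floats into 18 bytes (hypothetical helper, not the library API)."""
    if len(values) != BLOCK_SIZE:
        raise ValueError("a Q4_0 block holds exactly 32 values")
    # Assumption (GGML reference behaviour): the scale comes from the element
    # with the largest magnitude, which is mapped to the quantized value 0.
    extreme = max(values, key=abs)
    scale = extreme / -8.0
    inv = 1.0 / scale if scale != 0.0 else 0.0
    # Offset by 8 so every quantized value is unsigned, then clamp to 0..15.
    q = [min(15, int(v * inv + 8.5)) for v in values]
    # Assumption: byte j stores element j (low nibble) and element j+16 (high nibble).
    packed = bytes(q[j] | (q[j + 16] << 4) for j in range(16))
    # Assumption: the float16 scale is stored little-endian.
    return struct.pack("<e", scale) + packed


def dequantize_block_q4_0(block):
    """Recover 32 approximate floats from an 18-byte block."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a Q4_0 block is exactly 18 bytes")
    (scale,) = struct.unpack("<e", block[:2])
    nibbles = block[2:]
    low = [(b & 0x0F) - 8 for b in nibbles]   # elements 0..15
    high = [(b >> 4) - 8 for b in nibbles]    # elements 16..31
    return [scale * q for q in low + high]
```

Because all 32 values share one scale, a single large outlier in a block coarsens the resolution of every other value in that block.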

Implemented traits

AnyType, QuantizationEncoding



static quantize(tensor: Tensor[float32]) -> Tensor[uint8]

Quantizes the full-precision tensor to Q4_0.


  • tensor (Tensor[float32]): Full-precision tensor to quantize. The size of the innermost dimension must be a multiple of 32.


Returns the quantized Q4_0 tensor. Its datatype is uint8 because it is simply a byte buffer; each scalar actually occupies 4 bits.


Raises an error if the size of the last dimension is not a multiple of 32.
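As a rough illustration of the size arithmetic and the shape check, the sketch below computes how many bytes a Q4_0 buffer needs for a given float32 shape. The function name is hypothetical, and the shape of the uint8 tensor that quantize actually returns is not specified here; only the total byte count follows from the 32-values-per-18-bytes rule.

```python
from math import prod


def q4_0_byte_count(shape):
    """Total bytes needed to hold a tensor of `shape` in Q4_0 (hypothetical helper)."""
    # The check applies to the innermost dimension, not to the total element count.
    if not shape or shape[-1] % 32 != 0:
        raise ValueError("innermost dimension must be a multiple of 32")
    blocks = prod(shape) // 32   # one block per 32 scalars
    return blocks * 18           # 2-byte scale + 16 bytes of packed values
```

For a tensor of shape (2, 64) this gives 4 blocks and 72 bytes, compared with 512 bytes in float32. A shape such as (4, 48) is rejected even though its 192 elements divide evenly by 32, because blocks are formed along the innermost dimension.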


static id() -> String

Identifier for the Q4_0 quantized encoding.