
Mojo struct

Q4sym

struct Q4sym[group_size: Int, float_dtype: DType = DType.float32]

Q4sym compresses values of type float_dtype to 4-bit unsigned integers that have been dynamically and symmetrically quantized with the given scale factor.

group_size determines the number of elements that share quantization parameters.

The bits are stored in a strided fashion. For example, assume group_size = 8 and we want to pack the uint4 numbers A, B, C, D, E, F, G, H, whose bits are aaaa, bbbb, cccc, and so on. The bytes are then laid out as:

eeeeaaaa|ffffbbbb|ggggcccc|hhhhdddd

To decompress to floating point, take the decoded uint4 value, subtract the implicit zero-point of 8 (the midpoint of the 2^4 = 16 representable values), and multiply by the scale factor.
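To make the layout concrete, here is a small Python sketch (an illustration, not part of the Mojo API) that packs and unpacks uint4 values using the strided scheme above, for group_size = 8:

```python
def pack_u4(vals):
    # Byte i stores value i in its low nibble and value i + group_size//2
    # in its high nibble, matching the strided layout described above.
    half = len(vals) // 2
    return [vals[i] | (vals[i + half] << 4) for i in range(half)]

def unpack_u4(packed):
    # Low nibbles give the first half of the group, high nibbles the second.
    low = [b & 0x0F for b in packed]
    high = [b >> 4 for b in packed]
    return low + high

vals = [1, 2, 3, 4, 5, 6, 7, 8]   # uint4 values A..H
packed = pack_u4(vals)            # [0x51, 0x62, 0x73, 0x84]
assert unpack_u4(packed) == vals
```

Note that byte 0 is `0x51`: the high nibble holds E (5) and the low nibble holds A (1), exactly the `eeeeaaaa` pattern shown above.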

Parameters

  • group_size (Int): The number of encoded numbers stored in this struct.
  • float_dtype (DType): The floating point dtype this struct works with.

Fields

  • scale (StaticTuple[UInt8, 2]): The FP16 scale of the group, stored as individual bytes.
  • bits (StaticTuple[UInt8, (group_size // 2)]): The bits of the encoded uint4 numbers.
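Since the scale is an FP16 value stored as two raw bytes, it can be reassembled with a half-precision unpack. A Python sketch for illustration; the little-endian byte order is an assumption, not something this page specifies:

```python
import struct

# Rebuild a half-precision float from its two stored bytes.
# Byte order (little-endian here) is an assumption for illustration.
scale_bytes = bytes([0x00, 0x3C])            # 0x3C00 encodes 1.0 in IEEE FP16
(scale,) = struct.unpack("<e", scale_bytes)  # "e" = half-precision float
```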

Implemented traits

AnyType, Defaultable, ImplicitlyDestructible

Methods

__init__

__init__(out self)

Construct a default-initialized Q4sym.

__init__(out self, data: SIMD[float_dtype, group_size])

Construct an encoded Q4sym from data.

Args:

  • data (SIMD[float_dtype, group_size]): The floating point values to encode.

decode_scale

decode_scale(mut self) -> Float16

Obtain the scale factor.

Returns:

Float16: The decoded scale factor.

decode_unsigned

decode_unsigned(mut self) -> SIMD[DType.uint8, group_size]

Decode the stored uint4 numbers to uint8.

Returns:

SIMD[DType.uint8, group_size]: The decoded stored numbers as uint8 numbers. These have an implicit zero-point of 8.

decode_signed

decode_signed(mut self) -> SIMD[DType.int8, group_size]

Decode the stored uint4 numbers to requantized int4 numbers.

This is done by subtracting the implicit zero-point of 8 from the unsigned decoding.

Returns:

SIMD[DType.int8, group_size]: The decoded stored numbers as int8 numbers. These have a zero-point of 0.
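In Python terms, the signed decode is just an elementwise subtraction (illustrative only):

```python
# Unsigned uint4 codes span [0, 15]; subtracting the implicit zero-point
# of 8 recenters them to the signed int4 range [-8, 7].
unsigned = [0, 1, 8, 15]
signed = [u - 8 for u in unsigned]
```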

decode_fully

decode_fully(mut self) -> SIMD[float_dtype, group_size]

Decode the stored numbers into floating point representation.

Returns:

SIMD[float_dtype, group_size]: The decoded numbers.
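A hypothetical end-to-end decode in Python, combining the unsigned nibble decode, the zero-point subtraction, and the scale multiplication; this is a sketch of the steps described above, not the Mojo implementation:

```python
def dequantize_group(packed, scale):
    # Low nibbles hold the first half of the group, high nibbles the second
    # (the strided layout described earlier).
    u4 = [b & 0x0F for b in packed] + [b >> 4 for b in packed]
    # Subtract the implicit zero-point of 8, then apply the scale factor.
    return [(u - 8) * scale for u in u4]

# One packed byte 0x51 holds the uint4 values 1 (low) and 5 (high).
values = dequantize_group([0x51], 0.5)  # [-3.5, -1.5]
```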

quantize_and_write_to_tensor

static quantize_and_write_to_tensor[input_rank: Int](input_tt: TileTensor[float_dtype, linear_idx_type=input_tt.linear_idx_type, element_size=input_tt.element_size], output_tt: TileTensor[DType.uint8, linear_idx_type=output_tt.linear_idx_type, element_size=output_tt.element_size], input_shape: IndexList[input_rank])

Encodes the floating point numbers in input_tt along the inner-most dimension and writes the result to output_tt.

Args:

  • input_tt (TileTensor): The tensor of floating point numbers to encode.
  • output_tt (TileTensor): The tensor receiving the encoded bytes.
  • input_shape (IndexList[input_rank]): The shape of the input tensor.
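How the per-group scale is chosen is not spelled out here. The Python sketch below assumes the common choice of max(|x|) / 7 for symmetric 4-bit quantization with zero-point 8; this is an assumption for illustration and not necessarily what this method does internally:

```python
def quantize_group(xs):
    # Dynamic symmetric quantization of one group to uint4 codes with an
    # implicit zero-point of 8. The scale choice max(|x|)/7 is an assumption.
    scale = max(abs(x) for x in xs) / 7 or 1.0  # fall back to 1.0 for all-zero input
    # Round, shift by the zero-point, and clamp into the uint4 range [0, 15].
    return scale, [min(15, max(0, round(x / scale) + 8)) for x in xs]
```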

dequantize_and_write_to_tensor

static dequantize_and_write_to_tensor[output_rank: Int](input_tt: TileTensor[DType.uint8, linear_idx_type=input_tt.linear_idx_type, element_size=input_tt.element_size], output_tt: TileTensor[float_dtype, linear_idx_type=output_tt.linear_idx_type, element_size=output_tt.element_size], output_shape: IndexList[output_rank])

Decodes the uint4 numbers in input_tt along the inner-most dimension and writes the floating point result to output_tt.

Args:

  • input_tt (TileTensor): The tensor of encoded bytes to decode.
  • output_tt (TileTensor): The tensor receiving the decoded floating point numbers.
  • output_shape (IndexList[output_rank]): The shape of the output tensor.