Mojo struct

Float32Encoding

The float32 quantization encoding.

This encoding is essentially an identity operation. It exists to serve as the default case for code that is generic over the quantization encoding.

Implemented traits

AnyType, QuantizationEncoding

Methods

quantize

static quantize(_tensor: Tensor[float32]) -> Tensor[uint8]

Unimplemented quantize method for float32.

Because float32 is an identity encoding, it has no meaningful quantize operation. float32 values should be used with non-quantized ops, which expect dtype float32 operands, whereas quantized ops expect dtype uint8 operands. This method therefore raises an exception rather than returning a tensor, to guard against accidental misuse.

id

static id() -> String

Identifier for the float32 quantization encoding.
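
The role of an identity encoding in generic code can be illustrated with a small sketch. This is a Python analogue of the pattern, not the Mojo API: the `QuantizationEncoding` protocol here is a hypothetical stand-in for the trait, `prepare_weights` is an invented helper, plain lists and `bytes` stand in for tensors, and the `"float32"` identifier string is an assumption, since this page does not state the value that `id()` returns.

```python
from __future__ import annotations

from typing import Protocol


class QuantizationEncoding(Protocol):
    """Hypothetical stand-in for the QuantizationEncoding trait."""

    @staticmethod
    def id() -> str: ...

    @staticmethod
    def quantize(values: list[float]) -> bytes: ...


class Float32Encoding:
    """Identity encoding: values stay float32, so there is nothing to quantize."""

    @staticmethod
    def id() -> str:
        # Assumed identifier string; the real value is not given on this page.
        return "float32"

    @staticmethod
    def quantize(values: list[float]) -> bytes:
        # Mirrors the documented behavior: fail loudly instead of producing
        # uint8 data that a non-quantized op could not consume.
        raise NotImplementedError(
            "float32 is an identity encoding; use non-quantized ops"
        )


def prepare_weights(encoding: type[QuantizationEncoding], values: list[float]):
    """Hypothetical generic helper: skip quantization for the identity case."""
    if encoding is Float32Encoding:
        return values  # pass float32 data through unchanged
    return encoding.quantize(values)
```

With this shape, generic code can accept `Float32Encoding` as its default and never call `quantize` on it, while a direct call to `Float32Encoding.quantize` still fails immediately instead of silently producing unusable data.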
