Mojo function
qmatmul
qmatmul[encoding: QuantizationEncoding](lhs: Symbol, rhs: Symbol) -> Symbol
Performs matrix multiplication between floating point and quantized tensors.
This quantizes the lhs floating point value to match the encoding of the rhs quantized value, performs matmul, and then dequantizes the result.
The operation expects a transposed rhs argument, which differs from conventional matrix multiplication.
For matrix shapes:
- Standard matmul() expects shapes (m x n) @ (n x p) → (m x p)
- qmatmul() expects shapes (m x n) @ (p x n) → (m x p)
For example, given:
- lhs shape: [32, 64]
- rhs shape: [32, 64] (transposed)
- output shape: [32, 32]
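The shape rule can be checked with a small pure-Python sketch. It is not the Mojo API and does no quantization; the helper name matmul_transposed_rhs is hypothetical and only illustrates how a transposed rhs of shape (p x n) combines with an lhs of shape (m x n).

```python
def matmul_transposed_rhs(lhs, rhs):
    """Multiply lhs (m x n) by the transpose of rhs (p x n), giving (m x p)."""
    n = len(lhs[0])
    if any(len(row) != n for row in rhs):
        raise ValueError("lhs and rhs must share their last dimension")
    # Each output element is the dot product of one lhs row with one rhs row,
    # which is what multiplying by transpose(rhs) amounts to.
    return [[sum(a * b for a, b in zip(lrow, rrow)) for rrow in rhs]
            for lrow in lhs]


lhs = [[1.0] * 64 for _ in range(32)]  # shape [32, 64]
rhs = [[1.0] * 64 for _ in range(32)]  # shape [32, 64], already transposed
out = matmul_transposed_rhs(lhs, rhs)
print(len(out), len(out[0]))  # 32 32
```

Because rhs is stored as (p x n), both operands are read row by row, and no explicit transpose is ever materialized.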
The operation can be expressed as:
dequantize(quantize(lhs) . transpose(rhs))
Where . is a normal matmul operator.
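As a rough illustration of that expression, the following Python sketch runs the quantize, multiply, dequantize flow with a toy per-row symmetric scheme. This scheme is an assumption made for readability: it is not Q4_0, Q4_K, or Q6_K, and the names quantize_row and toy_qmatmul are hypothetical rather than part of any library.

```python
def quantize_row(row, levels=127):
    """Toy symmetric quantization: return (integer codes, scale) for one row."""
    peak = max(abs(v) for v in row)
    scale = peak / levels if peak else 1.0
    return [round(v / scale) for v in row], scale


def toy_qmatmul(lhs, rhs_q):
    """lhs is a float (m x n) matrix; rhs_q is a quantized (p x n) matrix
    stored as a list of (codes, scale) rows."""
    out = []
    for lrow in lhs:
        l_codes, l_scale = quantize_row(lrow)  # quantize(lhs)
        out_row = []
        for r_codes, r_scale in rhs_q:
            # Integer dot product against one row of the transposed rhs.
            acc = sum(a * b for a, b in zip(l_codes, r_codes))
            # Dequantize by applying both scales to the accumulator.
            out_row.append(acc * l_scale * r_scale)
        out.append(out_row)
    return out


rhs = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]]   # shape [2, 3], transposed
rhs_q = [quantize_row(r) for r in rhs]        # quantized ahead of time
lhs = [[1.0, 2.0, -1.0]]                      # shape [1, 3]
result = toy_qmatmul(lhs, rhs_q)              # shape [1, 2]
```

The exact float product here is [[-1.75, 1.5]]; the sketch returns values close to that, with a small error introduced by rounding to integer codes.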
The last two dimensions in lhs are treated as matrices and multiplied by rhs (which must be a 2D tensor). Any remaining dimensions in lhs are broadcast dimensions.
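The handling of leading dimensions can be sketched in plain Python with nested lists. This is an illustration of the shape behavior only, with no quantization, and the helper name batched_matmul_transposed_rhs is hypothetical.

```python
def batched_matmul_transposed_rhs(lhs, rhs):
    """Apply (m x n) @ transpose(p x n) over every leading dimension of lhs."""
    # Base case: lhs is a matrix, so its rows contain numbers, not lists.
    if not isinstance(lhs[0][0], list):
        return [[sum(a * b for a, b in zip(lrow, rrow)) for rrow in rhs]
                for lrow in lhs]
    # Otherwise recurse into the leading dimension, reusing the same 2D rhs.
    return [batched_matmul_transposed_rhs(sub, rhs) for sub in lhs]


# lhs shape [2, 3, 4]: one leading dimension of size 2, matrices of 3 x 4.
lhs = [[[float(i + j + k) for k in range(4)] for j in range(3)]
       for i in range(2)]
rhs = [[1.0] * 4 for _ in range(5)]  # shape [5, 4], transposed
out = batched_matmul_transposed_rhs(lhs, rhs)
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 5
```

Only the last dimension of lhs has to match the last dimension of rhs; every leading dimension of lhs carries through to the output unchanged.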
NOTE: Currently this supports Q4_0, Q4_K, and Q6_K encodings only.
Parameters:
- encoding (QuantizationEncoding): The quantization encoding to use.
Args:
- lhs (Symbol): The non-quantized, left-hand side of the matmul.
- rhs (Symbol): The transposed and quantized right-hand side of the matmul. Must be rank 2 (a 2D tensor/matrix) and in a supported quantization encoding.
Returns:
The dequantized result (a floating point tensor).