Mojo module
fp4_quantization
comptime values
logger
comptime logger = Logger(stdout, "", False)
Functions
- block_scaled_matmul: NDBuffer overload of block_scaled_matmul. Converts to TileTensor and delegates.
- block_scaled_matmul_with_epilogue: The sm100 block scaled matmul kernel does not yet support fusion of elementwise operations. This temporary implementation runs the sm100 block scaled matmul kernel and then dispatches a separate epilogue kernel to apply the elementwise operations.
- block_scales_interleave: NDBuffer overload of block_scales_interleave. Converts to TileTensor and delegates.
- block_scales_interleave_fp4
- block_scales_interleave_fp4_kernel
- naive_block_scaled_matmul
- naive_block_scaled_matmul_kernel
- quantize_dynamic_block_scaled: NDBuffer overload of quantize_dynamic_block_scaled. Converts to TileTensor and delegates.
- quantize_dynamic_scaled_async_fp4_kernel
- quantize_dynamic_scaled_fp4_async
- quantize_dynamic_scaled_fp4fp8
- quantize_dynamic_scaled_fp4fp8_kernel
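
The two-pass approach described for block_scaled_matmul_with_epilogue (an unfused matmul followed by a separate elementwise pass) can be illustrated with a small CPU reference. This is only a conceptual sketch in plain Python, not the module's API: the function names, the K x N layout of B, the scale layouts, and the epilogue callback signature are all assumptions made for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def block_scaled_matmul_ref(
    a: Matrix, a_scales: Matrix, b: Matrix, b_scales: Matrix, block: int
) -> Matrix:
    """Reference block-scaled matmul (hypothetical helper).

    Assumed layouts: a is M x K with scales M x (K // block);
    b is K x N with scales (K // block) x N. Each group of `block`
    consecutive elements along K shares one scale factor.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kk in range(k):
                g = kk // block  # index of the scale block along K
                acc += (a[i][kk] * a_scales[i][g]) * (b[kk][j] * b_scales[g][j])
            c[i][j] = acc
    return c


def apply_epilogue(c: Matrix, fn: Callable[[int, int, float], float]) -> Matrix:
    """Second pass: apply an elementwise function to the finished output."""
    return [[fn(i, j, v) for j, v in enumerate(row)] for i, row in enumerate(c)]


def block_scaled_matmul_with_epilogue_ref(
    a: Matrix,
    a_scales: Matrix,
    b: Matrix,
    b_scales: Matrix,
    block: int,
    fn: Callable[[int, int, float], float],
) -> Matrix:
    # Pass 1: unfused matmul writes the full result.
    c = block_scaled_matmul_ref(a, a_scales, b, b_scales, block)
    # Pass 2: a separate sweep reads the result back and applies the epilogue.
    return apply_epilogue(c, fn)
```

The cost of this design is an extra read and write of the output between the two passes, which a fused epilogue would avoid.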