Mojo module
fp8_quantization
comptime values
logger
comptime logger = Logger[DEFAULT_LEVEL](stdout, "", False)
Functions
batched_quantize_dynamic_scaled_fp8
batched_quantize_fp8_kernel
blockwise_scaled_fp8_with_epilogue: Our SM100 blockwise scaled FP8 matmul kernel does not yet support fusing elementwise operations. This is a temporary implementation that runs the SM100 blockwise scaled FP8 matmul kernel and then dispatches a separate epilogue kernel to apply the elementwise operations. On non-B200 GPUs, it uses the naive blockwise scaled FP8 matmul, which supports the normal epilogue natively.
convert_e4m3fn_to_e4m3fnuz: Convert E4M3FN weights to E4M3FNUZ format for AMD GPU compatibility.
matmul_dynamic_scaled_fp8
naive_blockwise_scaled_fp8_grouped_matmul
naive_blockwise_scaled_fp8_grouped_matmul_kernel
naive_blockwise_scaled_fp8_matmul
naive_blockwise_scaled_fp8_matmul_kernel
quantize_dynamic_scaled_fp8
quantize_fp8_kernel
quantize_static_scaled_fp8
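The description of `blockwise_scaled_fp8_with_epilogue` outlines a two-step scheme. First, a blockwise scaled FP8 matmul runs. Then a separate kernel applies the elementwise epilogue to its output. The Python sketch below illustrates that scheme under stated assumptions and is not the Mojo implementation. It makes three assumptions:

- A is M x K, with one scale per (row, K block).
- B is stored transposed as N x K, with one scale per (N block, K block).
- Plain Python floats stand in for FP8-quantized values.

The names `naive_blockwise_matmul_ref` and `apply_epilogue` are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def naive_blockwise_matmul_ref(
    a_q: Matrix,        # M x K quantized A (floats standing in for FP8 values)
    a_scales: Matrix,   # M x ceil(K / block_k): assumed one scale per (row, K block)
    b_q: Matrix,        # N x K quantized B, assumed stored transposed
    b_scales: Matrix,   # ceil(N / block_n) x ceil(K / block_k): one scale per (N block, K block)
    block_k: int,
    block_n: int,
) -> Matrix:
    """Hypothetical naive blockwise scaled matmul: C = dequant(A) @ dequant(B)^T."""
    m_dim, k_dim, n_dim = len(a_q), len(a_q[0]), len(b_q)
    c = [[0.0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0.0
            for kb, k0 in enumerate(range(0, k_dim, block_k)):
                # Accumulate raw quantized products within one K block...
                k1 = min(k0 + block_k, k_dim)
                partial = sum(a_q[m][k] * b_q[n][k] for k in range(k0, k1))
                # ...then apply both block scales once per K block.
                acc += partial * a_scales[m][kb] * b_scales[n // block_n][kb]
            c[m][n] = acc
    return c


def apply_epilogue(c: Matrix, fn: Callable[[int, int, float], float]) -> Matrix:
    """Hypothetical separate elementwise pass over the finished matmul output."""
    return [[fn(i, j, v) for j, v in enumerate(row)] for i, row in enumerate(c)]


if __name__ == "__main__":
    a_q = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, -2.0]]
    a_s = [[0.5, 2.0], [1.0, 0.25]]
    b_q = [[1.0, 0.0, -1.0, 2.0], [2.0, 1.0, 0.0, 1.0], [0.0, 3.0, 1.0, -1.0]]
    b_s = [[1.0, 0.5], [2.0, 4.0]]
    bias = [0.1, -100.0, 0.0]
    c = naive_blockwise_matmul_ref(a_q, a_s, b_q, b_s, block_k=2, block_n=2)
    # Second "launch": bias add + ReLU, applied elementwise after the matmul.
    out = apply_epilogue(c, lambda i, j, v: max(v + bias[j], 0.0))
    print(out)
```

The sketch applies both scales once per K block instead of dequantizing every element, so the innermost loop works on quantized values. Fusing the epilogue into the matmul would remove the second full pass over C. That fusion is the capability the description says the SM100 kernel still lacks.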
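The page does not say how `convert_e4m3fn_to_e4m3fnuz` performs the conversion. This sketch assumes one common approach, which relies on three facts about the formats:

- Both formats share the same bit layout: 1 sign bit, 4 exponent bits, and 3 mantissa bits.
- E4M3FNUZ uses an exponent bias of 8 instead of 7, so the same bits decode to exactly half the value. The bits can therefore be reused unchanged if the dequantization scale is doubled.
- The pattern 0x80 is -0.0 in E4M3FN but NaN in E4M3FNUZ, so the sketch remaps it to 0x00.

The Python helpers below (`decode_e4m3`, `fn_to_fnuz_sketch`) are hypothetical and only illustrate the arithmetic.

```python
from typing import Tuple


def decode_e4m3(byte: int, bias: int, fnuz: bool) -> float:
    """Decode one 8-bit E4M3 pattern (1 sign, 4 exponent, 3 mantissa bits)."""
    if fnuz and byte == 0x80:
        return float("nan")  # E4M3FNUZ: the "negative zero" pattern is the only NaN
    if not fnuz and (byte & 0x7F) == 0x7F:
        return float("nan")  # E4M3FN: S.1111.111 is NaN (the format has no infinities)
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** (1 - bias)  # subnormal
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - bias)


def decode_e4m3fn(byte: int) -> float:
    return decode_e4m3(byte, bias=7, fnuz=False)


def decode_e4m3fnuz(byte: int) -> float:
    return decode_e4m3(byte, bias=8, fnuz=True)


def fn_to_fnuz_sketch(weights: bytes, scale: float) -> Tuple[bytes, float]:
    """Hypothetical conversion: reuse the bits, fix 0x80, double the scale."""
    # 0x80 is -0.0 in E4M3FN but NaN in E4M3FNUZ, so map it to +0.0.
    out = bytes(0x00 if b == 0x80 else b for b in weights)
    # Under bias 8 every pattern decodes to half its E4M3FN value; compensate in the scale.
    return out, scale * 2.0


if __name__ == "__main__":
    w, s = fn_to_fnuz_sketch(bytes([0x7E, 0x80, 0x01]), 1.0)
    print([decode_e4m3fnuz(b) * s for b in w])  # dequantized values match E4M3FN
```

The sketch does not handle the E4M3FN NaN patterns (0x7F, 0xFF). Under this reinterpretation they would decode as ±240, the E4M3FNUZ maximum, so the sketch assumes the weights contain no NaNs.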