Mojo module
fp8_quantization
comptime values
logger
comptime logger = Logger(stdout, prefix=String(""), source_location=False)
Functions
- batched_quantize_dynamic_scaled_fp8: TileTensor primary implementation of batched dynamic scaled FP8 quantization.
- batched_quantize_fp8_kernel:
- blockwise_scaled_fp8_with_epilogue: Our sm100 blockwise scaled fp8 matmul kernel does not yet support fusion of elementwise operations. This is a temporary implementation that uses the sm100 blockwise scaled fp8 matmul kernel and dispatches a separate epilogue kernel to apply the elementwise operations. For non-B200 GPUs, we use the naive blockwise scaled fp8 matmul, which supports an epilogue natively. Callers must allocate `c`; when an `elementwise_lambda_fn` is supplied, the matmul result is written into `c` and then read back by the lambda (see the matmul-plus-epilogue sketch after this list).
- compute_scales_fp8_kernel: Compute per-group FP8 scale factors without quantizing.
- convert_e4m3fn_to_e4m3fnuz: Convert E4M3FN weights to E4M3FNUZ format for AMD GPU compatibility.
- convert_kernel_unified:
- matmul_dynamic_scaled_fp8: TileTensor primary implementation of dynamic scaled FP8 matmul.
- max_reduction_scale_kernel: Per-row strided max-|x| reduction into a global FP8 scale.
- naive_blockwise_scaled_fp8_grouped_matmul:
- naive_blockwise_scaled_fp8_grouped_matmul_kernel:
- naive_blockwise_scaled_fp8_matmul:
- naive_blockwise_scaled_fp8_matmul_kernel:
- quantize_dynamic_scaled_fp8: TileTensor primary implementation of dynamic scaled FP8 quantization (a conceptual sketch of the scaling arithmetic follows this list).
- quantize_fp8_kernel:
- quantize_fp8_kernel_per_tensor: Per-tensor FP8 quantize kernel.
- quantize_static_scaled_fp8: TileTensor implementation of static scaled FP8 quantization.
- quantize_tensor_dynamic_scaled_fp8: TileTensor primary implementation of dynamic scaled FP8 quantization.
- scaled_fp8_quant_unified:
- zero_scale_global_kernel:
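Several of the entries above (for example `quantize_dynamic_scaled_fp8`, `max_reduction_scale_kernel`, and `quantize_fp8_kernel_per_tensor`) revolve around the same idea: derive a scale from the max-|x| of the data so that the largest magnitude maps to the largest finite FP8 value, then divide by that scale before casting to FP8. The following is a minimal per-tensor sketch of that arithmetic, not this module's API; the helper name, the use of the E4M3 maximum of 448.0, and the example values are assumptions made only for illustration.

```mojo
fn per_tensor_scale(values: List[Float32]) -> Float32:
    # scale = max(|x|) / FP8_MAX, so the largest element lands on FP8_MAX.
    var fp8_max: Float32 = 448.0  # largest finite E4M3 value (assumption)
    var max_abs: Float32 = 1e-12  # avoid a zero scale for all-zero input
    for i in range(len(values)):
        var a = abs(values[i])
        if a > max_abs:
            max_abs = a
    return max_abs / fp8_max

fn main():
    var xs = List[Float32](0.5, -2.0, 3.25, -0.125)
    var scale = per_tensor_scale(xs)
    # Quantization would clamp x / scale and cast to an FP8 dtype;
    # here we just print the scaled values to show the range mapping.
    for i in range(len(xs)):
        print(xs[i] / scale)
```

Static scaled quantization follows the same division step but takes the scale as an input instead of reducing it from the data.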
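The separate-epilogue workaround described for `blockwise_scaled_fp8_with_epilogue` can be pictured as two passes over a caller-allocated output: the matmul first writes its result into `c`, and a second elementwise pass then reads `c` back and applies the fused operation. The sketch below is a plain CPU illustration of that contract, not the kernel itself; the shapes, the ReLU epilogue standing in for `elementwise_lambda_fn`, and every name in it are assumptions chosen for the example.

```mojo
fn main():
    # Hypothetical tiny problem size, row-major storage (illustration only).
    var m = 2
    var n = 2
    var k = 2
    var a = List[Float32](1.0, 2.0, 3.0, 4.0)  # m x k input
    var b = List[Float32](0.5, 0.0, 0.0, 0.5)  # k x n input
    var c = List[Float32]()                    # caller-allocated output buffer
    for _ in range(m * n):
        c.append(0.0)

    # Pass 1: the matmul result is written into c.
    for i in range(m):
        for j in range(n):
            var acc: Float32 = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc

    # Pass 2: a separate elementwise epilogue reads c back and updates it
    # in place (ReLU here, standing in for the supplied lambda).
    for idx in range(m * n):
        if c[idx] < 0.0:
            c[idx] = 0.0

    for idx in range(m * n):
        print(c[idx])
```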