Mojo module
fp8_utils
Shared FP8 quantization utilities.
Provides common functions for FP8 scale computation and quantization used across fused normalization kernels and standalone quantization kernels.
NOTE: comm/allreduce_rmsnorm_fp8.mojo inlines copies of these functions to avoid a circular dependency (linalg depends on comm). If you change the logic here, update the inlined copies in that file as well. See KERN-2477.
Functions
- compute_dynamic_fp8_scale: Compute the dynamic FP8 scale factor and its reciprocal from a row max.
- fp8_quantize: Quantize values to FP8, optionally clamping to the representable range.
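The two functions above are typically used together: derive a per-row scale from the row's absolute maximum, then multiply each value by the reciprocal scale and clamp it into the FP8 range. The following is a minimal Python sketch of that arithmetic, not the Mojo API. The Python signatures, the scale convention (scale = row_max / fp8_max), the use of float8 E4M3FN (largest finite value 448) as the target format, and the MIN_ROW_MAX guard against all-zero rows are assumptions made for illustration. The final narrowing cast to an 8-bit float is not modeled.

```python
# Illustrative sketch only; NOT the Mojo fp8_utils signatures.
# Assumptions: E4M3FN target format, scale = row_max / fp8_max,
# and a hypothetical MIN_ROW_MAX floor to avoid dividing by zero.

FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3fn value (assumed target format)
MIN_ROW_MAX = 1e-12   # hypothetical floor for all-zero rows


def compute_dynamic_fp8_scale(row_max: float, fp8_max: float = FP8_E4M3_MAX) -> tuple[float, float]:
    """Return (scale, scale_recip) such that row_max * scale_recip == fp8_max."""
    safe_max = max(row_max, MIN_ROW_MAX)
    scale = safe_max / fp8_max        # used later to dequantize: x ~= q * scale
    scale_recip = fp8_max / safe_max  # computed directly to avoid a second rounding
    return scale, scale_recip


def fp8_quantize(values, scale_recip: float, clamp: bool = True, fp8_max: float = FP8_E4M3_MAX) -> list[float]:
    """Scale values into the FP8 range, optionally clamping to [-fp8_max, fp8_max].

    Returns the pre-cast float values; the cast to an 8-bit float is omitted.
    """
    out = []
    for v in values:
        q = v * scale_recip
        if clamp:
            q = min(max(q, -fp8_max), fp8_max)
        out.append(q)
    return out


row = [0.5, -2.0, 1.0]
row_max = max(abs(v) for v in row)               # 2.0
scale, recip = compute_dynamic_fp8_scale(row_max)
quantized = fp8_quantize(row, recip)
print(quantized)                                  # [112.0, -448.0, 224.0]
```

Because the scale is derived from the row's own maximum, the largest-magnitude element lands exactly on the edge of the FP8 range. Clamping only matters when the reciprocal scale comes from a different (for example, stale or bounded) maximum.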