Mojo function
compute_scales_fp8_kernel
compute_scales_fp8_kernel[out_type: DType, scales_type: DType, in_type: DType, input_fn: def[width: Int, alignment: Int](row: Int, col: Int) capturing -> SIMD[in_type, width], num_threads: Int, group_size: Int, simd_width: Int, scales_layout: TensorLayout](scales: TileTensor[scales_type, scales_layout, MutAnyOrigin], scale_ub: Scalar[scales_type])
Compute per-group FP8 scale factors without quantizing.
Each block scans its (row, group) tile via input_fn, computes the
scale factor for that group, and writes it to scales[group_idx, row].
Note that scales is indexed group-first, then row. This is the first
half of quantize_fp8_kernel. The per-tensor path uses it so that a
second kernel can find the tensor-wide maximum scale.
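
The data flow described above can be illustrated with a small CPU reference in Python. This is only a sketch under stated assumptions, not the kernel's implementation: the page does not give the scale formula, so the code assumes scale = min(max |x|, scale_ub) / FP8_MAX with FP8_MAX = 448 (the largest finite float8_e4m3fn value). The names compute_group_scales and FP8_E4M3_MAX are hypothetical, and a plain nested list stands in for input_fn. The only detail taken from the page is the scales[group_idx, row] output layout.

```python
# Assumption: target format is float8_e4m3fn, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def compute_group_scales(rows, group_size, scale_ub, fp8_max=FP8_E4M3_MAX):
    """Return per-group scales laid out as scales[group_idx][row].

    Assumed formula (not stated in the source):
        scale = min(max(|x| over the group), scale_ub) / fp8_max
    """
    num_rows = len(rows)
    num_cols = len(rows[0])
    if num_cols % group_size != 0:
        raise ValueError("column count must be a multiple of group_size")
    num_groups = num_cols // group_size

    # Group-major output, matching the scales[group_idx, row] indexing.
    scales = [[0.0] * num_rows for _ in range(num_groups)]
    for row in range(num_rows):
        for group_idx in range(num_groups):
            start = group_idx * group_size
            tile = rows[row][start:start + group_size]
            group_max = max(abs(v) for v in tile)
            scales[group_idx][row] = min(group_max, scale_ub) / fp8_max
    return scales


def tensor_wide_max_scale(scales):
    """Second-pass reduction: the largest scale across all groups and rows."""
    return max(max(per_row) for per_row in scales)


if __name__ == "__main__":
    data = [
        [1.0, -2.0, 0.5, 4.0],
        [448.0, 0.0, -896.0, 8.0],
    ]
    scales = compute_group_scales(data, group_size=2, scale_ub=448.0)
    print(scales)
    print(tensor_wide_max_scale(scales))
```

No values are quantized here. Only the scales are produced, and a separate reduction over them yields the tensor-wide maximum, mirroring the two-kernel split of the per-tensor path.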