Mojo function
batched_quantize_dynamic_scaled_fp8
```mojo
batched_quantize_dynamic_scaled_fp8[
    out_dtype: DType,
    in_dtype: DType,
    scales_dtype: DType, //,
    input_fn: def[width: Int, alignment: Int](batch: Int, row: Int, col: Int) capturing -> SIMD[in_dtype, width],
    group_size_or_per_token: Int,
    num_cols: Int,
    pdl_level: PDLLevel = PDLLevel(),
](
    scaled_output: TileTensor[out_dtype, scaled_output.LayoutType, scaled_output.origin, address_space=scaled_output.address_space, linear_idx_type=scaled_output.linear_idx_type, element_size=scaled_output.element_size],
    scales: TileTensor[scales_dtype, scales.LayoutType, scales.origin, address_space=scales.address_space, linear_idx_type=scales.linear_idx_type, element_size=scales.element_size],
    scale_ub: Float32,
    ctx: DeviceContext,
    num_rows: Int,
    batch_size: Int,
)
```
The primary TileTensor-based implementation of batched dynamic scaled FP8 quantization: each row of the input is quantized to `out_dtype` using scales computed dynamically from the data (per group of `group_size_or_per_token` columns, or per token), with the per-group maximum clamped by `scale_ub`.
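As a rough sketch of the math this kernel performs, the following NumPy snippet shows per-group dynamic scaling: for each group of columns, the scale is derived from the group's maximum absolute value (clamped by `scale_ub`), and values are divided by that scale into the FP8 range. This is an illustration, not the Mojo API; the FP8 max of 448 assumes an e4m3-style format, and all names here are hypothetical.

```python
import numpy as np

# Assumed largest finite value of an e4m3-style FP8 format (hypothetical choice).
FP8_E4M3_MAX = 448.0

def quantize_dynamic_scaled_fp8(x, group_size, scale_ub):
    """Illustrative per-group dynamic FP8 quantization.

    Returns the quantized values (still stored as float here) and the
    per-group scales. When group_size equals the row width this degenerates
    to per-token (per-row) scaling.
    """
    rows, cols = x.shape
    assert cols % group_size == 0
    groups = x.reshape(rows, cols // group_size, group_size)
    amax = np.abs(groups).max(axis=-1)               # per-group max magnitude
    amax = np.minimum(amax, scale_ub)                # clamp by the scale upper bound
    scales = np.maximum(amax, 1e-12) / FP8_E4M3_MAX  # guard against divide-by-zero
    q = np.clip(groups / scales[..., None], -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q.reshape(rows, cols), scales
```

Dequantization multiplies each group back by its scale, which is why the kernel emits the `scales` tensor alongside the quantized output.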