Mojo function

batched_quantize_dynamic_scaled_fp8

def batched_quantize_dynamic_scaled_fp8[out_dtype: DType, in_dtype: DType, scales_dtype: DType, InputFnType: ImplicitlyCopyable & RegisterPassable & def[width: Int, alignment: Int](batch: Int, row: Int, col: Int) -> SIMD[in_dtype, width], //, group_size_or_per_token: Int, num_cols: Int, pdl_level: PDLLevel = PDLLevel.ON](input_fn: InputFnType, scaled_output: TileTensor[out_dtype, Storage=scaled_output.Storage, address_space=scaled_output.address_space, linear_idx_type=scaled_output.linear_idx_type, element_size=scaled_output.element_size], scales: TileTensor[scales_dtype, Storage=scales.Storage, address_space=scales.address_space, linear_idx_type=scales.linear_idx_type, element_size=scales.element_size], scale_ub: Float32, ctx: DeviceContext, num_rows: Int, batch_size: Int) where (eq InputFnType.in_dtype, in_dtype)

Primary TileTensor-based implementation of batched dynamic scaled FP8 quantization.
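The signature does not spell out the arithmetic, so the following is only a rough reference sketch in plain Python, not the Mojo kernel or its API. It assumes the common dynamic scaling scheme: for each group of `group_size` columns in a row (or the whole row for per-token mode), the scale is the group's absolute maximum, capped at `scale_ub`, divided by the largest finite FP8 E4M3 value, and each input is divided by that scale and clamped. The names `quantize_dynamic_scaled` and `FP8_E4M3_MAX`, the use of `-1` to select per-token mode, and the nested-list layout of the results are all illustrative assumptions. Rounding to actual FP8 bit patterns is omitted.

```python
# Reference sketch only: plain Python, NOT the Mojo kernel or its API.
# Assumption: scale = min(max_abs, scale_ub) / FP8_E4M3_MAX per group.
FP8_E4M3_MAX = 448.0  # largest finite value of the float8 E4M3 (fn) format


def quantize_dynamic_scaled(batch, group_size, scale_ub):
    """Scale each group of columns into the FP8 E4M3 range.

    batch: list of matrices, each a list of rows of floats.
    group_size: columns per group, or -1 for one group per row
                (hypothetical convention for per-token mode).
    Returns (scaled, scales), both nested lists indexed [batch][row].
    Values stay Python floats; rounding to FP8 is not simulated.
    """
    scaled, scales = [], []
    for matrix in batch:
        out_rows, scale_rows = [], []
        for row in matrix:
            size = len(row) if group_size == -1 else group_size
            out_row, row_scales = [], []
            for start in range(0, len(row), size):
                group = row[start:start + size]
                max_abs = max(abs(v) for v in group)
                # Cap the dynamic range at scale_ub before normalizing.
                scale = min(max_abs, scale_ub) / FP8_E4M3_MAX
                inv = 1.0 / scale if scale > 0.0 else 0.0
                for v in group:
                    q = v * inv
                    # Clamp values pushed out of range by the scale_ub cap.
                    out_row.append(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, q)))
                row_scales.append(scale)
            out_rows.append(out_row)
            scale_rows.append(row_scales)
        scaled.append(out_rows)
        scales.append(scale_rows)
    return scaled, scales
```

With `group_size=2`, a row of four columns yields two scales. With `group_size=-1`, each row yields a single scale. Lowering `scale_ub` below a group's absolute maximum saturates the largest values at the FP8 limit instead of shrinking the rest of the group.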