Mojo module
fp4_quantization
comptime values
logger
comptime logger = Logger(stdout, prefix=String(""), source_location=False)
Functions
- block_scaled_matmul
- block_scaled_matmul_with_epilogue: Our sm100 block scaled matmul kernel does not yet support fusing elementwise operations. This temporary implementation runs the sm100 block scaled matmul kernel and then dispatches a separate epilogue kernel to apply the elementwise operations. Callers must allocate c; when an elementwise_lambda_fn is supplied, the matmul result is written into c and then read back by the lambda.
- block_scales_interleave
- block_scales_interleave_fp4
- block_scales_interleave_fp4_kernel
- cast_fp2em1x2_to_bf16x2
- dotprod_bf16x2
- grouped_matmul_block_scaled_mxfp4
- grouped_matmul_block_scaled_mxfp4_kernel
- grouped_quantize_dynamic_scaled_fp4_async
- grouped_quantize_dynamic_scaled_fp4_async_kernel
- matmul_dynamic_block_scaled_mxfp4
- matmul_dynamic_block_scaled_mxfp4_kernel
- naive_block_scaled_matmul
- naive_block_scaled_matmul_kernel
- quantize_dynamic_block_scaled
- quantize_dynamic_block_scaled_mxfp4
- quantize_dynamic_block_scaled_mxfp4_kernel
- quantize_dynamic_scaled_async_fp4_kernel
- quantize_dynamic_scaled_fp4_async
- quantize_dynamic_scaled_fp4fp8
- quantize_dynamic_scaled_fp4fp8_kernel
- quantize_mxfp4_amd: Quantize BF16 activations to MXFP4 on AMD CDNA4 (MI355X).
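
The unfused epilogue described for block_scaled_matmul_with_epilogue can be pictured as two passes over a caller-allocated buffer. The sketch below is a plain-Python illustration of that pattern only, not the Mojo implementation: the helper names, the nested-list matrices, and the (row, col, value) signature of the lambda are all assumptions made for the example, and block scaling is left out entirely.

```python
def reference_matmul(a, b):
    """Plain matmul on nested lists (hypothetical stand-in for the matmul kernel)."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def matmul_with_unfused_epilogue(c, a, b, elementwise_lambda_fn=None):
    """Two-pass pattern: fill c with the matmul result, then run a separate epilogue."""
    # Pass 1: the matmul writes its result into the caller-allocated buffer c.
    result = reference_matmul(a, b)
    for i, row in enumerate(result):
        for j, value in enumerate(row):
            c[i][j] = value

    # Pass 2: a separate loop reads every element back from c and hands it to
    # the lambda. The (row, col, value) signature is an assumption of this sketch.
    if elementwise_lambda_fn is not None:
        for i in range(len(c)):
            for j in range(len(c[0])):
                elementwise_lambda_fn(i, j, c[i][j])
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[0.0, 0.0], [0.0, 0.0]]    # allocated by the caller, as the description requires
out = [[0.0, 0.0], [0.0, 0.0]]  # destination chosen by the lambda


def add_one(row, col, value):
    # Example elementwise operation: store value + 1 into a separate buffer.
    out[row][col] = value + 1.0


matmul_with_unfused_epilogue(c, a, b, add_one)
```

After the call, c holds the raw matmul result and out holds the result of the elementwise operation. The extra round trip through c is the cost of not fusing the epilogue into the matmul.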