Mojo module

fp4_quantization

`comptime` values

`logger`

comptime logger = Logger(stdout, prefix=String(""), source_location=False)

Functions

block_scaled_matmul:
block_scaled_matmul_with_epilogue: The SM100 block-scaled matmul kernel does not yet support fusing elementwise operations. As a temporary implementation, this function runs the SM100 block-scaled matmul kernel and then dispatches a separate epilogue kernel to apply the elementwise operations. Callers must allocate c; when an elementwise_lambda_fn is supplied, the matmul result is written into c and then read back by the lambda.
block_scales_interleave:
block_scales_interleave_fp4:
block_scales_interleave_fp4_kernel:
cast_fp2em1x2_to_bf16x2:
dotprod_bf16x2:
grouped_matmul_block_scaled_mxfp4:
grouped_matmul_block_scaled_mxfp4_kernel:
grouped_quantize_dynamic_scaled_fp4_async:
grouped_quantize_dynamic_scaled_fp4_async_kernel:
matmul_dynamic_block_scaled_mxfp4:
matmul_dynamic_block_scaled_mxfp4_kernel:
naive_block_scaled_matmul:
naive_block_scaled_matmul_kernel:
quantize_dynamic_block_scaled:
quantize_dynamic_block_scaled_mxfp4:
quantize_dynamic_block_scaled_mxfp4_kernel:
quantize_dynamic_scaled_async_fp4_kernel:
quantize_dynamic_scaled_fp4_async:
quantize_dynamic_scaled_fp4fp8:
quantize_dynamic_scaled_fp4fp8_kernel:
quantize_mxfp4_amd: Quantize BF16 activations to MXFP4 on AMD CDNA4 (MI355X).
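
The two-stage flow described for block_scaled_matmul_with_epilogue (matmul into a caller-allocated c, then a separate elementwise pass that reads c back) can be illustrated with a small reference model. This is a minimal sketch in plain Python, not the Mojo kernel: the names `reference_matmul` and `matmul_with_epilogue`, the `(row, col, value)` lambda signature, and the captured `out` buffer are all hypothetical, and block scaling and FP4 packing are omitted entirely.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]
# Hypothetical epilogue signature: receives a coordinate and the value read
# back from c, and is responsible for its own store.
Epilogue = Callable[[int, int, float], None]


def reference_matmul(a: Matrix, b: Matrix, c: Matrix) -> None:
    """Stage 1: write a @ b into the caller-allocated buffer c."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))


def matmul_with_epilogue(
    a: Matrix, b: Matrix, c: Matrix, elementwise_fn: Optional[Epilogue] = None
) -> None:
    """Unfused variant: matmul first, then a separate elementwise pass."""
    reference_matmul(a, b, c)
    if elementwise_fn is None:
        return
    # Stage 2: a second pass reads every result back from c and hands it
    # to the lambda, mirroring a separately dispatched epilogue kernel.
    for i, row in enumerate(c):
        for j, value in enumerate(row):
            elementwise_fn(i, j, value)


# Usage: the epilogue adds 1.0 and stores into a separate buffer (assumed).
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]  # identity, so c should equal a
c = [[0.0, 0.0], [0.0, 0.0]]
out = [[0.0, 0.0], [0.0, 0.0]]


def add_one(i: int, j: int, value: float) -> None:
    out[i][j] = value + 1.0


matmul_with_epilogue(a, b, c, add_one)
```

Because the epilogue runs as a second pass, c must hold the full matmul result before the lambda sees any element, which is why the caller has to allocate it even when only the lambda's output is of interest.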