Mojo package

blockwise_fp8_1d2d

Blockwise FP8 1D2D grouped matmul kernel for SM100.

This module provides a structured kernel implementation for grouped blockwise FP8 GEMM using the 1D-1D tensor layout with offset-based addressing.

It combines:

Accumulation pattern from blockwise_fp8/ (register-based per-K scaling)
1D2D work distribution from grouped_block_scaled_1d1d/ (offset-based A tensor addressing, bounds-checked output, 3-warp specialization)

Modules

blockwise_fp8_1d2d_matmul: CPU entrypoint for grouped 1D2D blockwise FP8 SM100 matmul.
blockwise_fp8_1d2d_matmul_kernel: Blockwise FP8 1D2D SM100 matmul kernel.
blockwise_fp8_1d2d_smem: Shared memory layout for blockwise FP8 1D2D SM100 matmul.
