
Mojo package

blockwise_fp8_1d2d

Blockwise FP8 1D2D grouped matmul kernel for SM100.

This module provides a structured kernel implementation for grouped blockwise FP8 GEMM using the 1D-1D tensor layout with offset-based addressing.

It combines:

  • Accumulation pattern from blockwise_fp8/ (register-based per-K scaling)
  • 1D2D work distribution from grouped_block_scaled_1d1d/ (offset-based A tensor addressing, bounds-checked output, 3-warp specialization)
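The two ideas combined above can be illustrated with a small scalar reference in plain Python. This is only a sketch of the arithmetic, not the SM100 kernel or its Mojo API: the function name, argument names, and the scale granularity (one scale per A row per K block, one scale per group per K block) are assumptions made for illustration. FP8 quantization and warp specialization are not modeled. It shows offset-based addressing into a concatenated A tensor, per-K-block scaling applied while accumulating, and a bounds check on output rows.

```python
def grouped_blockwise_scaled_matmul(a, a_scales, b, b_scales, offsets, block_k):
    """Reference semantics only (hypothetical helper, not the kernel API).

    a        : M_total x K rows, all groups concatenated along M
    a_scales : a_scales[row][k_block], assumed per-row, per-K-block scale
    b        : b[group][n][k], one N x K weight matrix per group
    b_scales : b_scales[group][k_block], assumed per-group, per-K-block scale
    offsets  : length num_groups + 1; group g owns rows offsets[g]..offsets[g+1]-1
    """
    total_rows = len(a)
    k_dim = len(a[0]) if total_rows else 0
    n_dim = len(b[0]) if b else 0
    out = [[0.0] * n_dim for _ in range(total_rows)]

    for group in range(len(offsets) - 1):
        # Offset-based addressing: rows of this group inside the flat A tensor.
        row_start = offsets[group]
        # Bounds check: never write past the end of the output.
        row_end = min(offsets[group + 1], total_rows)
        for row in range(row_start, row_end):
            for n in range(n_dim):
                acc = 0.0
                for k0 in range(0, k_dim, block_k):
                    kb = k0 // block_k
                    # Unscaled partial product for one K block.
                    partial = 0.0
                    for k in range(k0, min(k0 + block_k, k_dim)):
                        partial += a[row][k] * b[group][n][k]
                    # Apply the block scales before adding to the accumulator.
                    acc += partial * a_scales[row][kb] * b_scales[group][kb]
                out[row][n] = acc
    return out


# Two groups: rows 0-1 use b[0], row 2 uses b[1].
a = [[1, 2, 3, 4], [1, 1, 1, 1], [2, 0, 0, 2]]
a_scales = [[1.0, 0.5], [2.0, 2.0], [1.0, 1.0]]
b = [
    [[1, 0, 0, 1], [1, 1, 1, 1]],
    [[1, 1, 0, 0], [0, 0, 1, 1]],
]
b_scales = [[1.0, 2.0], [0.5, 1.0]]
result = grouped_blockwise_scaled_matmul(a, a_scales, b, b_scales, [0, 2, 3], 2)
print(result)  # [[5.0, 10.0], [6.0, 12.0], [1.0, 2.0]]
```

Scaling each K-block partial sum before accumulating is what distinguishes blockwise scaling from applying a single scale to the finished dot product; a real kernel may use a different scale layout than the one assumed here.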

Modules