Mojo package

blockwise_fp8_1d2d

Blockwise FP8 1D2D grouped matmul kernel for SM100.

This module provides a structured kernel implementation for grouped blockwise FP8 GEMM using the 1D-1D tensor layout with offset-based addressing.

It combines:

  • Accumulation pattern from blockwise_fp8/ (register-based per-K scaling)
  • 1D-1D work distribution from grouped_block_scaled_1d1d/ (offset-based A tensor addressing, bounds-checked output, 3-warp specialization)
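
The two ideas above can be illustrated with a small CPU reference. This is a minimal sketch in plain Python, not the kernel's implementation: the function name `grouped_blockwise_matmul`, its parameters, and the scale layout (one A scale per row per K-block, one B scale per group per K-block) are all assumptions made for illustration, and plain floats stand in for FP8 values. It shows offset-based addressing of a packed A tensor, a bounds check on output rows, and accumulation in which each K-block's partial sum is scaled before being added to the running total.

```python
def grouped_blockwise_matmul(a, a_offsets, b, a_scales, b_scales, block_k):
    """Reference grouped matmul with per-K-block scaling (illustrative only).

    a         : packed rows of all groups, shape [total_rows][K]
    a_offsets : group g owns rows a_offsets[g] .. a_offsets[g + 1] - 1
    b         : one [K][N] matrix per group
    a_scales  : hypothetical layout, [total_rows][num_k_blocks]
    b_scales  : hypothetical layout, [num_groups][num_k_blocks]
    """
    total_rows = len(a)
    k = len(a[0])
    n = len(b[0][0])
    out = [[0.0] * n for _ in range(total_rows)]

    for g in range(len(a_offsets) - 1):
        row_start, row_end = a_offsets[g], a_offsets[g + 1]
        # Bounds check: never write past the rows that actually exist.
        for m in range(row_start, min(row_end, total_rows)):
            for j in range(n):
                acc = 0.0
                for kb, k0 in enumerate(range(0, k, block_k)):
                    # Unscaled partial sum over one K-block.
                    partial = 0.0
                    for kk in range(k0, min(k0 + block_k, k)):
                        partial += a[m][kk] * b[g][kk][j]
                    # Apply this K-block's scales, then accumulate.
                    acc += partial * a_scales[m][kb] * b_scales[g][kb]
                out[m][j] = acc
    return out


# Two groups packed into one A tensor: rows 0-1 belong to group 0, row 2 to group 1.
a = [[1, 2, 3, 4], [1, 0, 1, 0], [2, 2, 2, 2]]
a_offsets = [0, 2, 3]
b = [
    [[1, 0], [0, 1], [1, 0], [0, 1]],
    [[1, 1], [1, 1], [1, 1], [1, 1]],
]
a_scales = [[1.0, 0.5], [2.0, 1.0], [1.0, 1.0]]
b_scales = [[1.0, 2.0], [0.5, 0.5]]

result = grouped_blockwise_matmul(a, a_offsets, b, a_scales, b_scales, block_k=2)
print(result)
```

Scaling each K-block's partial sum before accumulation is what makes the scales block-dependent; a single scale applied after the full K reduction would not be equivalent when the scales differ between blocks.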

Modules
