Mojo package

blockwise_fp8_1d2d

Blockwise FP8 1D2D grouped matmul kernel for SM100.

This module provides a structured kernel implementation for grouped blockwise FP8 GEMM using the 1D-1D tensor layout with offset-based addressing.

It combines:

Accumulation pattern from blockwise_fp8/ (register-based per-K scaling)
1D2D work distribution from grouped_block_scaled_1d1d/ (offset-based A tensor addressing, bounds-checked output, 3-warp specialization)
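As a rough illustration of the arithmetic described above, the following pure-Python sketch computes a grouped, blockwise-scaled matmul on the host. It is not the package's API: the function name, argument names, and scale granularity (one A scale per row per K block, one B scale per group per K block) are assumptions chosen for brevity, and plain floats stand in for FP8 values. It shows only the three ideas named in the overview: rows of A for every group are stacked along one dimension and located through an offsets array, each K block's partial sum is scaled before it is added to the accumulator, and output rows are bounds-checked.

```python
def grouped_blockwise_scaled_matmul(a, a_offsets, b, a_scales, b_scales, block_k):
    """Reference semantics only (hypothetical helper, not the Mojo kernel API).

    a:         [total_M][K]  rows of all groups stacked along M
    a_offsets: [num_groups + 1]  group g owns rows a_offsets[g]:a_offsets[g + 1]
    b:         [num_groups][N][K]  one weight matrix per group
    a_scales:  [total_M][ceil(K / block_k)]  assumed: one scale per row per K block
    b_scales:  [num_groups][ceil(K / block_k)]  assumed: one scale per group per K block
    """
    total_m = len(a)
    k = len(a[0])
    n = len(b[0])
    num_groups = len(a_offsets) - 1
    out = [[0.0] * n for _ in range(total_m)]

    for g in range(num_groups):
        row_start, row_end = a_offsets[g], a_offsets[g + 1]
        # Bounds check: never write past the end of the output.
        for m in range(row_start, min(row_end, total_m)):
            for col in range(n):
                acc = 0.0
                for kb in range(0, k, block_k):
                    # Unscaled partial sum over one K block.
                    partial = 0.0
                    for kk in range(kb, min(kb + block_k, k)):
                        partial += a[m][kk] * b[g][col][kk]
                    # Per-K-block scaling applied before accumulation.
                    blk = kb // block_k
                    acc += partial * a_scales[m][blk] * b_scales[g][blk]
                out[m][col] = acc
    return out


# Two groups: group 0 owns rows 0-1 of A, group 1 owns row 2.
a = [[1, 2, 3, 4], [1, 1, 1, 1], [2, 0, 2, 0]]
a_offsets = [0, 2, 3]
b = [
    [[1, 0, 0, 1], [1, 1, 1, 1]],  # group 0 weights, shape [N=2][K=4]
    [[1, 1, 0, 0], [0, 0, 1, 1]],  # group 1 weights
]
a_scales = [[1.0, 0.5], [2.0, 2.0], [1.0, 1.0]]
b_scales = [[1.0, 2.0], [0.5, 1.0]]

result = grouped_blockwise_scaled_matmul(a, a_offsets, b, a_scales, b_scales, block_k=2)
print(result)
```

The GPU kernel distributes this work across warps and keeps the accumulator in registers; the loop nest here is only meant to pin down what value each output element should hold under the stated assumptions.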

Modules

blockwise_fp8_1d2d_matmul: CPU entrypoint for grouped 1D2D blockwise FP8 SM100 matmul.
blockwise_fp8_1d2d_matmul_kernel: Blockwise FP8 1D2D SM100 matmul kernel.
blockwise_fp8_1d2d_smem: Shared memory layout for blockwise FP8 1D2D SM100 matmul.
