Mojo package
blockwise_fp8_1d2d
Blockwise FP8 1D2D grouped matmul kernel for SM100.
This module provides a structured kernel implementation for grouped blockwise FP8 GEMM using the 1D-1D tensor layout with offset-based addressing.
It combines:
- Accumulation pattern from blockwise_fp8/ (register-based per-K scaling)
- 1D2D work distribution from grouped_block_scaled_1d1d/ (offset-based A tensor addressing, bounds-checked output, 3-warp specialization)
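The two ingredients listed above can be illustrated with a small CPU reference model. This is a minimal sketch in plain Python, not the kernel's actual code. It assumes that "1D2D" means one scale per row per K block for A and one scale per (N block, K block) tile for B, and that groups are located in a stacked A tensor through an offsets array. The function name, argument names, and the tiny block size are all hypothetical and chosen for illustration only.

```python
def grouped_blockwise_matmul(a, a_scales, b, b_scales, a_offsets, block=2):
    """Reference (non-GPU) grouped GEMM with per-K-block scaling.

    Hypothetical layout, assumed for this sketch:
      a         : total_rows x K, rows of all groups stacked together
      a_scales  : total_rows x (K // block), one scale per row per K block
      b         : G matrices, each N x K
      b_scales  : G matrices, each (N // block) x (K // block)
      a_offsets : G + 1 row offsets; group g owns rows
                  a_offsets[g] .. a_offsets[g + 1] - 1
    """
    total_rows = len(a)
    k_dim = len(a[0])
    n_dim = len(b[0])
    out = [[0.0] * n_dim for _ in range(total_rows)]

    for g in range(len(b)):
        start, end = a_offsets[g], a_offsets[g + 1]
        # Offset-based addressing: rows of A are found via the offsets,
        # and the upper bound is clamped so no row outside A is touched.
        for m in range(start, min(end, total_rows)):
            for n in range(n_dim):
                acc = 0.0
                for kb in range(k_dim // block):
                    # Unscaled partial product over one K block.
                    partial = 0.0
                    for k in range(kb * block, (kb + 1) * block):
                        partial += a[m][k] * b[g][n][k]
                    # Apply the scales once per K block, then accumulate.
                    scale = a_scales[m][kb] * b_scales[g][n // block][kb]
                    acc += partial * scale
                out[m][n] = acc
    return out


# Two groups: group 0 owns rows 0..1, group 1 owns row 2.
a = [[1, 2, 3, 4], [1, 0, 0, 1], [2, 2, 2, 2]]
a_scales = [[1.0, 0.5], [2.0, 1.0], [1.0, 1.0]]
b = [
    [[1, 1, 1, 1], [1, 0, 1, 0]],
    [[0, 1, 0, 1], [2, 2, 2, 2]],
]
b_scales = [[[1.0, 2.0]], [[0.5, 1.0]]]
a_offsets = [0, 2, 3]

result = grouped_blockwise_matmul(a, a_scales, b, b_scales, a_offsets)
```

The point of the sketch is the ordering: each K block is multiplied without scaling, the product of the A and B scales is applied to that partial sum, and only then is it added to the running accumulator. A real kernel would use FP8 inputs and a much larger block size; the small integers here only keep the arithmetic easy to follow.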
Modules
- blockwise_fp8_1d2d_matmul: CPU entrypoint for grouped 1D-1D blockwise FP8 SM100 matmul.
- blockwise_fp8_1d2d_matmul_kernel: Blockwise FP8 1D2D SM100 matmul kernel.
- blockwise_fp8_1d2d_smem: Shared memory layout for blockwise FP8 1D2D SM100 matmul.