
Mojo module

blockwise_fp8_1d2d_smem

Shared memory layout for blockwise FP8 1D2D SM100 matmul.

This is a simplified SMEM structure for the 1D2D blockwise FP8 kernel that uses offset-based addressing (GroupedWorkIterator1D1D). Key differences from the standard BlockwiseFP8Smem:

  1. No CLC pipeline storage: the kernel uses 3-warp specialization, with no scheduler warp.
  2. Uses SmemPipelineBundleNoClc instead of SmemPipelineBundle.
  3. Tile storage (A, B, C, A-scales) is otherwise identical.

Tile storage is shared via BlockwiseFP8TileCore from blockwise_fp8_smem.mojo.

Structs
