Mojo module

blockwise_fp8_1d2d_smem

Shared memory layout for blockwise FP8 1D2D SM100 matmul.

This is a simplified SMEM structure for the 1D2D blockwise FP8 kernel that uses offset-based addressing (GroupedWorkIterator1D1D). Key differences from the standard BlockwiseFP8Smem:

No CLC pipeline storage: the kernel uses 3-warp specialization, with no scheduler warp.
Uses SmemPipelineBundleNoClc instead of SmemPipelineBundle.
Tile storage (A, B, C, A-scales) is otherwise identical.

Tile storage is shared via BlockwiseFP8TileCore from blockwise_fp8_smem.mojo.
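
As a rough illustration of what dropping the CLC pipeline storage means for the shared memory budget, the following Python sketch tallies a hypothetical layout with and without CLC state. It is not the Mojo implementation. The tile shape, stage counts, output and scale element widths, barrier size, and 16-byte CLC response slot are all assumptions chosen for illustration, and the names TileConfig, tile_core_bytes, pipeline_bytes, and smem_bytes are invented here. The only fact taken as given is that an FP8 element occupies one byte.

```python
from dataclasses import dataclass

FP8_BYTES = 1           # an FP8 element occupies one byte
BARRIER_BYTES = 8       # assumption: one 64-bit barrier object per pipeline slot
CLC_RESPONSE_BYTES = 16 # assumption: one 16-byte response slot per CLC stage


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical tile shape; not the kernel's real parameters."""
    bm: int = 128
    bn: int = 128
    bk: int = 128
    stages: int = 4
    c_bytes_per_elem: int = 2  # assumption: 16-bit output tile
    scale_bytes: int = 4       # assumption: 32-bit float scales


def tile_core_bytes(cfg: TileConfig) -> int:
    """Bytes for A, B, C and A-scales; the same in both SMEM variants."""
    a = cfg.stages * cfg.bm * cfg.bk * FP8_BYTES
    b = cfg.stages * cfg.bn * cfg.bk * FP8_BYTES
    c = cfg.bm * cfg.bn * cfg.c_bytes_per_elem
    a_scales = cfg.stages * cfg.bm * cfg.scale_bytes
    return a + b + c + a_scales


def pipeline_bytes(cfg: TileConfig, with_clc: bool, clc_stages: int = 2) -> int:
    """Bytes for pipeline barriers, optionally including CLC state."""
    # Assumption: one "full" and one "empty" barrier per tile stage.
    total = 2 * cfg.stages * BARRIER_BYTES
    if with_clc:
        # Assumption: each CLC stage holds a response slot plus two barriers.
        total += clc_stages * (CLC_RESPONSE_BYTES + 2 * BARRIER_BYTES)
    return total


def smem_bytes(cfg: TileConfig, with_clc: bool) -> int:
    """Total bytes for the modeled layout."""
    return tile_core_bytes(cfg) + pipeline_bytes(cfg, with_clc)


if __name__ == "__main__":
    cfg = TileConfig()
    print("tile core:", tile_core_bytes(cfg))
    print("with CLC:", smem_bytes(cfg, with_clc=True))
    print("without CLC:", smem_bytes(cfg, with_clc=False))
```

In this model the tile core is the same in both variants, so the difference comes only from the pipeline bundle. That mirrors the relationship described above between SmemPipelineBundle and SmemPipelineBundleNoClc.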

Structs

BlockwiseFP8_1D2DSmem: SMEM struct for blockwise FP8 1D2D matmul without CLC scheduler.
