Mojo module

grouped_1d1d_smem

Shared memory layout for grouped 1D-1D block-scaled SM100 matmul.

This is a simplified SMEM structure for the 1D-1D kernel variant, which uses offset-based addressing instead of one pointer per group. Key differences from the standard GroupedBlockScaledSmem:

No tensormap descriptors - TMAs are grid-constant (not updated per-group)
No CLC pipeline storage - uses 3-warp specialization (no scheduler warp)
Simpler barrier structure optimized for the 1D-1D workload
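
The offset-based addressing mentioned above can be illustrated with a small host-side sketch. This is not the module's code: it assumes the groups are packed back to back along one dimension of a single buffer, and the names `group_offsets` and `group_rows` are hypothetical. It shows only the general idea that one offsets array can stand in for a table of per-group pointers.

```python
from itertools import accumulate


def group_offsets(group_sizes):
    """Exclusive prefix sum over the group sizes.

    offsets[g] is the first row of group g in the packed buffer and
    offsets[-1] is the total row count. (Hypothetical helper.)
    """
    return [0, *accumulate(group_sizes)]


def group_rows(offsets, g):
    """Half-open row range [start, end) of group g in the packed buffer."""
    return offsets[g], offsets[g + 1]


# Three groups with 3, 0, and 5 rows, packed into one 8-row buffer.
offsets = group_offsets([3, 0, 5])

for g in range(len(offsets) - 1):
    start, end = group_rows(offsets, g)
    print(f"group {g}: rows [{start}, {end})")
```

Under this assumption every group is addressed relative to the same base, so a descriptor for the whole buffer never has to change between groups. That is consistent with the grid-constant TMAs noted in the list above.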

Tile storage is shared via BlockScaledTileCore from block_scaled_smem.mojo.

Structs

Grouped1D1DSmem: SMEM struct for grouped 1D-1D block-scaled GEMM without CLC scheduler.
