
Mojo module

grouped_1d1d_smem

Shared memory layout for grouped 1D-1D block-scaled SM100 matmul.

This is a simplified SMEM structure for the 1D-1D kernel variant, which uses offset-based addressing instead of one pointer per group. Key differences from the standard GroupedBlockScaledSmem:

  1. No tensormap descriptors: the TMAs are grid-constant and are not updated per group.
  2. No CLC pipeline storage: the kernel uses 3-warp specialization with no scheduler warp.
  3. A simpler barrier structure optimized for the 1D-1D workload.
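
As a rough illustration of what offset-based addressing means in contrast to pointer-per-group, the sketch below builds a prefix-sum offset array on the host and uses it to find each group's row range inside one contiguous buffer. The group sizes and the names `group_sizes` and `offsets` are hypothetical and chosen only for illustration; this is not the kernel's actual data layout or API.

```mojo
def main():
    # Hypothetical row counts for three groups (illustrative values only).
    var group_sizes = List[Int]()
    group_sizes.append(3)
    group_sizes.append(5)
    group_sizes.append(2)

    # Exclusive prefix sum: offsets[g] is the first row of group g in a
    # single contiguous buffer, and offsets[g + 1] is one past its last row.
    var offsets = List[Int]()
    var running = 0
    offsets.append(running)
    for i in range(len(group_sizes)):
        running += group_sizes[i]
        offsets.append(running)

    # Each group is located by indexing into the offsets array, so no
    # per-group base pointer has to be stored or updated.
    for g in range(len(group_sizes)):
        print("group", g, "rows", offsets[g], "to", offsets[g + 1])
```

With this scheme every group shares the same base address, which fits the note above that the TMAs can stay grid-constant instead of being updated per group.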

Tile storage is shared via BlockScaledTileCore from block_scaled_smem.mojo.

Structs

  • Grouped1D1DSmem: SMEM struct for grouped 1D-1D block-scaled GEMM without CLC scheduler.
