Mojo module

grouped_block_scaled_smem

Shared memory layout for grouped block-scaled SM100 matmul.

Extends BlockScaledTileCore with shared-memory storage for TMA tensormap descriptors, so that the descriptors can be updated dynamically. Used by GroupedBlockScaledMatmulKernel for grouped GEMMs whose problem sizes vary from group to group.

Additional SMEM allocations:

- 5 TMA descriptors (A, B, SFA, SFB, C) at 128 bytes each: 5 × 128 = 640 bytes total
- Aligned to 128 bytes, as required for TMA descriptors
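The descriptor region's size and alignment follow directly from the two constants. Below is a minimal Python sketch of that arithmetic, not the module's Mojo code. Three things are assumptions: the descriptors are stored contiguously in the listed order (A, B, SFA, SFB, C); the region starts at some preceding offset (the hypothetical `base_offset`, for example the end of tile storage) rounded up to 128 bytes; and the helpers `align_up` and `tensormap_layout` are hypothetical and not part of the module.

```python
# Constants mirrored from the module documentation.
NUM_GROUPED_TENSORMAPS = 5
TMA_DESCRIPTOR_BYTES = 128

# Descriptor order as listed in the docs; the in-memory order is an assumption.
DESCRIPTOR_NAMES = ("A", "B", "SFA", "SFB", "C")
assert len(DESCRIPTOR_NAMES) == NUM_GROUPED_TENSORMAPS


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (must be a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def tensormap_layout(base_offset: int) -> dict:
    """Hypothetical helper: byte offset of each descriptor in shared memory.

    base_offset is an assumed starting point (e.g. end of tile storage);
    it is rounded up so every descriptor lands on a 128-byte boundary.
    """
    start = align_up(base_offset, TMA_DESCRIPTOR_BYTES)
    return {
        name: start + i * TMA_DESCRIPTOR_BYTES
        for i, name in enumerate(DESCRIPTOR_NAMES)
    }


total_bytes = NUM_GROUPED_TENSORMAPS * TMA_DESCRIPTOR_BYTES
print(total_bytes)  # 640

layout = tensormap_layout(base_offset=1000)
print(layout)  # {'A': 1024, 'B': 1152, 'SFA': 1280, 'SFB': 1408, 'C': 1536}
```

Because each descriptor is exactly 128 bytes, aligning only the start of the region is enough to keep every descriptor 128-byte aligned.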

Tile storage is shared via BlockScaledTileCore from block_scaled_smem.mojo.

`comptime` values

`NUM_GROUPED_TENSORMAPS`

comptime NUM_GROUPED_TENSORMAPS = 5

`TMA_DESCRIPTOR_BYTES`

comptime TMA_DESCRIPTOR_BYTES = 128

Structs

GroupedBlockScaledSmem: SMEM struct for grouped block-scaled GEMM.
