
Mojo module

grouped_block_scaled_smem

Shared memory layout for grouped block-scaled SM100 matmul.

Extends BlockScaledTileCore with shared-memory storage for TMA tensormap descriptors so that they can be updated dynamically. Used by GroupedBlockScaledMatmulKernel for grouped GEMM, where each group can have a different problem size.
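To see why grouped GEMM needs updatable descriptors, note that the global extents a TMA descriptor encodes change from group to group. The Python sketch below is illustrative only, not the Mojo kernel's code. `GroupShape` and `operand_extents` are hypothetical names, the operand orientations (A as M x K, B stored K-major as N x K, C as M x N) are assumptions, and the SFA/SFB scale-factor descriptors are omitted.

```python
# Hypothetical sketch: per-group global extents that a grouped GEMM's
# TMA descriptors would need to encode. Not the Mojo kernel's code.
from typing import NamedTuple


class GroupShape(NamedTuple):
    m: int
    n: int
    k: int


def operand_extents(g: GroupShape) -> dict:
    # Assumption: A is M x K, B is stored K-major as N x K, C is M x N.
    # Scale-factor operands (SFA, SFB) are left out of this sketch.
    return {"A": (g.m, g.k), "B": (g.n, g.k), "C": (g.m, g.n)}


groups = [GroupShape(256, 512, 1024), GroupShape(128, 512, 2048)]
for g in groups:
    # Each group yields different extents, so a descriptor built for one
    # group cannot simply be reused for the next.
    print(operand_extents(g))
```

Because the extents (and, in practice, the base addresses) differ per group, a descriptor built for one group generally has to be patched before it is used for the next one. Presumably this is why the descriptors are kept in SMEM here.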

Additional SMEM allocations:

  • 5 TMA descriptors (A, B, SFA, SFB, C) at 128 bytes each = 640 bytes total
  • Aligned to 128 bytes, as required for TMA descriptors
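The byte arithmetic behind these allocations can be checked with a short Python sketch. It assumes the descriptor region is placed directly after the tile storage and that the descriptors are laid out in the order listed (A, B, SFA, SFB, C). Both are assumptions for illustration, and `align_up` and `tensormap_offsets` are hypothetical helpers rather than part of this module.

```python
# Hypothetical sketch of the tensormap descriptor region layout; not the Mojo source.
NUM_GROUPED_TENSORMAPS = 5   # A, B, SFA, SFB, C
TMA_DESCRIPTOR_BYTES = 128   # size (and required alignment) of one descriptor
TENSORMAP_NAMES = ("A", "B", "SFA", "SFB", "C")  # assumed ordering


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def tensormap_offsets(tile_storage_bytes: int) -> dict:
    """Byte offset of each descriptor, assuming the region follows tile storage."""
    base = align_up(tile_storage_bytes, TMA_DESCRIPTOR_BYTES)
    return {
        name: base + i * TMA_DESCRIPTOR_BYTES
        for i, name in enumerate(TENSORMAP_NAMES)
    }


total = NUM_GROUPED_TENSORMAPS * TMA_DESCRIPTOR_BYTES
print(total)  # 640
print(tensormap_offsets(1000))  # 1000 is an arbitrary example tile-storage size
# {'A': 1024, 'B': 1152, 'SFA': 1280, 'SFB': 1408, 'C': 1536}
```

Because each descriptor is exactly 128 bytes, aligning only the start of the region is enough to keep every descriptor slot 128-byte aligned.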

Tile storage is reused from BlockScaledTileCore, defined in block_scaled_smem.mojo.

comptime values

NUM_GROUPED_TENSORMAPS

comptime NUM_GROUPED_TENSORMAPS = 5

TMA_DESCRIPTOR_BYTES

comptime TMA_DESCRIPTOR_BYTES = 128

Structs
