
Mojo module

grouped_block_scaled_smem

Shared memory layout for grouped block-scaled SM100 matmul.

Extends BlockScaledSmem with shared-memory storage for TMA tensormap descriptors so the descriptors can be updated dynamically at run time. Used by GroupedBlockScaledMatmulKernel for grouped GEMM, where each group can have a different problem size.

Additional SMEM allocations:

  • 5 TMA descriptors (A, B, the scale factors SFA and SFB, and C) at 128 bytes each, 640 bytes total
  • The descriptor region is aligned to 128 bytes to meet TMA descriptor alignment requirements
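The arithmetic behind these allocations can be sketched as follows. This is a minimal Python illustration, not the Mojo implementation: it assumes the descriptor region is appended after the existing BlockScaledSmem layout, and base_bytes is a hypothetical stand-in for that layout's size.

```python
# Hypothetical sketch: place the grouped-GEMM tensormap region after an
# existing shared-memory layout. base_bytes stands in for the size of the
# underlying BlockScaledSmem layout (an assumed value, not a real one).

TMA_DESCRIPTOR_BYTES = 128  # size of one TMA descriptor
NUM_DESCRIPTORS = 5         # A, B, SFA, SFB, C
ALIGNMENT = 128             # alignment required for the descriptor region


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def tensormap_region(base_bytes: int) -> tuple:
    """Return (region_start, region_bytes, total_bytes) for the descriptor region."""
    start = align_up(base_bytes, ALIGNMENT)
    size = NUM_DESCRIPTORS * TMA_DESCRIPTOR_BYTES
    return start, size, start + size


start, size, total = tensormap_region(base_bytes=100_000)
print(start, size, total)  # 100096 640 100736
```

Because 128 is a power of two, align_up can use a bit mask instead of division. Any padding between the base layout and the descriptors is at most 127 bytes.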

comptime values

NUM_GROUPED_TENSORMAPS

comptime NUM_GROUPED_TENSORMAPS = 5

TMA_DESCRIPTOR_BYTES

comptime TMA_DESCRIPTOR_BYTES = 128
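The two constants together determine where each descriptor sits inside the region. The Python sketch below assumes the descriptors are stored back to back in the order A, B, SFA, SFB, C. That ordering and the descriptor_offset helper are hypothetical and are not taken from the module.

```python
# Sketch: byte offset of each descriptor within the tensormap region,
# assuming back-to-back storage in the order A, B, SFA, SFB, C
# (the ordering is an assumption, not confirmed by the module).

NUM_GROUPED_TENSORMAPS = 5
TMA_DESCRIPTOR_BYTES = 128

DESCRIPTOR_NAMES = ("A", "B", "SFA", "SFB", "C")
assert len(DESCRIPTOR_NAMES) == NUM_GROUPED_TENSORMAPS


def descriptor_offset(name: str) -> int:
    """Byte offset of the named descriptor from the start of the region (hypothetical helper)."""
    return DESCRIPTOR_NAMES.index(name) * TMA_DESCRIPTOR_BYTES


offsets = {name: descriptor_offset(name) for name in DESCRIPTOR_NAMES}
print(offsets)  # {'A': 0, 'B': 128, 'SFA': 256, 'SFB': 384, 'C': 512}

# Each slot is a multiple of 128 bytes, so every descriptor stays
# 128-byte aligned as long as the region itself starts aligned.
assert all(off % 128 == 0 for off in offsets.values())
```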

Structs
