Mojo module
grouped_block_scaled_smem
Shared memory layout for grouped block-scaled SM100 matmul.
Extends BlockScaledSmem with tensormap descriptor storage for dynamic updates. Used by GroupedBlockScaledMatmulKernel for grouped GEMM with variable problem sizes.
Additional SMEM allocations:
- 5 TMA descriptors (A, B, SFA, SFB, C) at 128 bytes each = 640 bytes total
- Descriptor storage is aligned to 128 bytes, as TMA descriptors require
comptime values
NUM_GROUPED_TENSORMAPS
comptime NUM_GROUPED_TENSORMAPS = 5
TMA_DESCRIPTOR_BYTES
comptime TMA_DESCRIPTOR_BYTES = 128
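The 640-byte figure quoted above follows directly from the two constants. The sketch below is illustrative only: it redeclares the constants locally, and `GROUPED_TENSORMAP_BYTES`, `align_up`, and the starting offset of 1000 are hypothetical names and values that are not part of this module. It shows how the total could be derived and how a 128-byte-aligned base offset for the descriptor region might be computed.

```mojo
# Local copies of the module's documented constants.
comptime NUM_GROUPED_TENSORMAPS = 5
comptime TMA_DESCRIPTOR_BYTES = 128

# Hypothetical derived constant: total descriptor storage (5 * 128 = 640).
comptime GROUPED_TENSORMAP_BYTES = NUM_GROUPED_TENSORMAPS * TMA_DESCRIPTOR_BYTES


# Hypothetical helper: round `offset` up to the next multiple of `alignment`.
def align_up(offset: Int, alignment: Int) -> Int:
    return (offset + alignment - 1) // alignment * alignment


def main():
    print(GROUPED_TENSORMAP_BYTES)  # 640

    # Assume the preceding SMEM allocations end at byte 1000 (made-up value).
    var base = align_up(1000, TMA_DESCRIPTOR_BYTES)
    print(base)  # 1024

    # Each descriptor slot starts 128 bytes after the previous one.
    for i in range(NUM_GROUPED_TENSORMAPS):
        print(i, base + i * TMA_DESCRIPTOR_BYTES)
```

Because each descriptor is itself 128 bytes, aligning only the base of the region keeps every slot 128-byte aligned. How `GroupedBlockScaledSmem` actually places the region is not shown on this page.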
Structs
- GroupedBlockScaledSmem: SMEM struct for grouped block-scaled GEMM.