Mojo module
grouped_1d1d_matmul_kernel
Grouped 1D-1D block-scaled SM100 matmul kernel.
This kernel implements grouped GEMM for Mixture of Experts (MoE) layers using the 1D-1D tensor layout with offset-based addressing.
Key characteristics:
- Three specialized warp roles (Load, MMA, Epilogue), with no scheduler warp
- Grid-constant TMA descriptors (no runtime tensormap updates)
- Offset-based addressing via a_offsets for contiguous token buffers
- Per-expert output scaling via expert_scales tensor
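To make the offset-based addressing and per-expert scaling concrete, here is a minimal pure-Python model of the semantics, not the kernel itself. It assumes that `a_offsets` holds `num_experts + 1` cumulative row offsets into the contiguous token buffer, that each expert's weight matrix is stored as N rows of length K, and that `expert_scales` holds one scalar per expert. The function name `grouped_matmul_reference` and these layouts are illustrative assumptions and are not taken from the module.

```python
def grouped_matmul_reference(a, b, a_offsets, expert_scales):
    """Reference semantics only; names and layouts are assumptions.

    a:             token rows for all experts, stored contiguously (each of length K)
    b:             one weight matrix per expert, each N rows of length K
    a_offsets:     cumulative row offsets into `a`, len(b) + 1 entries
    expert_scales: one output scale per expert
    """
    out = []
    for e, weights in enumerate(b):
        # Rows a_offsets[e] .. a_offsets[e + 1] - 1 belong to expert e.
        start, end = a_offsets[e], a_offsets[e + 1]
        for row in a[start:end]:
            out.append([
                expert_scales[e] * sum(x * w for x, w in zip(row, w_row))
                for w_row in weights
            ])
    return out


# Three tokens: the first two are routed to expert 0, the last to expert 1.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [
    [[1.0, 0.0], [0.0, 1.0]],   # expert 0: identity
    [[1.0, 1.0], [1.0, -1.0]],  # expert 1
]
a_offsets = [0, 2, 3]
expert_scales = [2.0, 0.5]
c = grouped_matmul_reference(a, b, a_offsets, expert_scales)
```

Because each expert's tokens occupy a contiguous row range, one offsets array is enough to locate every group without per-group pointers, which is consistent with the use of grid-constant TMA descriptors noted above.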
Architecture:
- Load (TMA) warp: Loads A, B, SFA, and SFB tiles using grid-constant TMA descriptors
- MMA warp: Executes block-scaled matrix multiply
- Epilogue warps: Stores results with expert_scale applied
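As a rough illustration of what "block-scaled" means for the MMA step, the sketch below computes a single output element in pure Python. It assumes that SFA and SFB each supply one scale factor per block of consecutive elements along K; the `block_size` parameter and the function name `block_scaled_dot` are hypothetical, and the block size the real kernel uses depends on the data format and is not stated on this page.

```python
def block_scaled_dot(a_row, b_row, sfa, sfb, block_size):
    """Dot product in which each block of `block_size` elements along K
    shares one scale factor per operand (illustrative, not the kernel)."""
    assert len(a_row) == len(b_row)
    assert len(a_row) % block_size == 0
    total = 0.0
    for blk in range(len(a_row) // block_size):
        lo = blk * block_size
        hi = lo + block_size
        # Unscaled partial product for this K block.
        partial = sum(x * y for x, y in zip(a_row[lo:hi], b_row[lo:hi]))
        # Apply the A-side and B-side scale factors for this block.
        total += sfa[blk] * sfb[blk] * partial
    return total


result = block_scaled_dot(
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 1.0, 1.0, 1.0],
    sfa=[0.5, 2.0],
    sfb=[4.0, 1.0],
    block_size=2,
)
```

In this model the epilogue's `expert_scale` would be one further multiplication applied to the accumulated result before it is stored.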
This is a port of grouped_matmul_sm100_1d1d.mojo to the structured kernels architecture.
Structs
- Grouped1D1DMatmulKernel: Grouped 1D-1D block-scaled matmul kernel.
- WarpRole1D1D: Warp role for 1D-1D kernel with 3-warp specialization.