Mojo module

grouped_1d1d_matmul_kernel

Grouped 1D-1D block-scaled SM100 matmul kernel.

This kernel implements grouped GEMM for Mixture of Experts (MoE) layers using the 1D-1D tensor layout with offset-based addressing.

Key characteristics:

  • 3-warp specialization (Load, MMA, Epilogue) - no scheduler warp
  • Grid-constant TMA descriptors (no runtime tensormap updates)
  • Offset-based addressing via a_offsets for contiguous token buffers
  • Per-expert output scaling via expert_scales tensor
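The offset-based addressing and per-expert scaling listed above can be illustrated with a small CPU reference in plain Python. This is only a sketch of the math, not the kernel's implementation. It assumes that `a_offsets` holds `num_experts + 1` row boundaries into a token buffer sorted by expert, that each expert's weight matrix is stored as K x N, and that `expert_scales` holds one scalar per expert. None of these layouts is confirmed by this page, and the function name `grouped_matmul_reference` is hypothetical.

```python
def grouped_matmul_reference(a, b, a_offsets, expert_scales):
    """Reference grouped GEMM over a contiguous token buffer.

    a             : list of token rows, each of length K, sorted by expert
    b             : one K x N weight matrix per expert (assumed layout)
    a_offsets     : rows a_offsets[e]..a_offsets[e + 1] belong to expert e
                    (assumed convention, length num_experts + 1)
    expert_scales : one output scale per expert
    """
    num_experts = len(b)
    assert len(a_offsets) == num_experts + 1
    out = []
    for e in range(num_experts):
        start, end = a_offsets[e], a_offsets[e + 1]
        weights = b[e]
        n = len(weights[0])
        # Every token row in this expert's slice uses the same weights.
        for row in a[start:end]:
            out.append([
                expert_scales[e]
                * sum(row[k] * weights[k][j] for k in range(len(row)))
                for j in range(n)
            ])
    return out


# Three tokens: the first two go to expert 0, the last to expert 1.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[1.0, 1.0], [1.0, 1.0]],  # expert 1: all ones
]
a_offsets = [0, 2, 3]
expert_scales = [2.0, 0.5]
result = grouped_matmul_reference(a, b, a_offsets, expert_scales)
```

Because the tokens for each expert are contiguous, a single offsets array is enough to locate every expert's slice of the buffer; no per-expert pointer table is needed.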

Architecture:

  • TMA warp: Loads A, B, SFA, SFB tiles using grid-constant TMAs
  • MMA warp: Executes block-scaled matrix multiply
  • Epilogue warps: Store results with the per-expert scale from expert_scales applied
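To make "block-scaled" concrete, here is a minimal plain-Python sketch of a block-scaled dot product, the scalar operation that a block-scaled matrix multiply repeats for each output element. It assumes SFA and SFB are per-block scale factors for A and B along the K dimension, with one scale per `block_size` consecutive elements. The block size, the data types, and the name `block_scaled_dot` are illustrative assumptions, not details taken from this kernel.

```python
def block_scaled_dot(a_vals, a_scales, b_vals, b_scales, block_size):
    """Dot product where each block of K elements carries its own scale.

    a_vals, b_vals     : raw (unscaled) elements along K
    a_scales, b_scales : one scale factor per block (stand-ins for SFA, SFB)
    block_size         : elements covered by one scale factor (assumed)
    """
    assert len(a_vals) == len(b_vals)
    assert len(a_vals) % block_size == 0
    total = 0.0
    for blk in range(len(a_vals) // block_size):
        lo = blk * block_size
        # Accumulate the raw products for this block first...
        partial = sum(a_vals[i] * b_vals[i] for i in range(lo, lo + block_size))
        # ...then apply both operands' scale factors once per block.
        total += a_scales[blk] * b_scales[blk] * partial
    return total


value = block_scaled_dot(
    [1.0, 2.0, 3.0, 4.0], [0.5, 2.0],
    [1.0, 1.0, 1.0, 1.0], [2.0, 1.0],
    block_size=2,
)
```

Applying the scales once per block rather than once per element is what lets the raw elements be stored in a narrow format while the scale factors restore their dynamic range.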

This is a port of grouped_matmul_sm100_1d1d.mojo to the structured kernels architecture.

Structs
