
Mojo package

grouped_block_scaled_1d1d

Grouped block-scaled matrix multiplication (matmul) with a 1D-1D tensor layout for SM100 (NVIDIA Blackwell) GPUs.
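"Block-scaled" means that each run of consecutive elements along the K dimension shares one scale factor, which is applied to that block's partial product. The following is a conceptual sketch, not this package's implementation. It assumes a hypothetical block size `BLOCK = 4` (real formats use fixed sizes such as 32 for OCP MX types) and uses plain Python floats in place of the kernel's low-precision data types and scale-factor layout. The function name `block_scaled_dot` is illustrative.

```python
# Conceptual sketch of a block-scaled dot product along K (not the kernel's code).
# Assumption: every BLOCK consecutive K-elements share one scale factor per operand.
BLOCK = 4


def block_scaled_dot(a_vals, a_scales, b_vals, b_scales, block=BLOCK):
    """Dot product of two block-scaled vectors of length K."""
    assert len(a_vals) == len(b_vals)
    assert len(a_vals) % block == 0
    total = 0.0
    for blk in range(len(a_vals) // block):
        lo, hi = blk * block, (blk + 1) * block
        # Partial product of the raw (unscaled) values in this block...
        partial = sum(x * y for x, y in zip(a_vals[lo:hi], b_vals[lo:hi]))
        # ...then apply both operands' scale factors for this block.
        total += partial * a_scales[blk] * b_scales[blk]
    return total


a_vals, a_scales = [1, 2, 3, 4, 1, 1, 1, 1], [0.5, 2.0]
b_vals, b_scales = [1, 1, 1, 1, 2, 2, 2, 2], [1.0, 0.25]
print(block_scaled_dot(a_vals, a_scales, b_vals, b_scales))  # 9.0
```

The result equals the dot product of the fully dequantized vectors, but applying scales once per block lets the inner loop operate on compact low-precision values.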

This package provides a structured kernel implementation for grouped GEMM operations in Mixture of Experts (MoE) layers, using contiguous token buffers with offset-based addressing (the "1D-1D" layout).

Key characteristics:

  • A tensor: Contiguous (total_tokens, K) with a_offsets for per-group access
  • B tensor: Batched (num_experts, N, K) weights
  • C tensor: Contiguous (total_tokens, N) output
  • Per-expert output scaling via the expert_scales tensor

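The shapes above can be illustrated with a plain reference loop. This is a minimal sketch under stated assumptions, not the package's API: `grouped_matmul_1d1d` is a hypothetical name; `a_offsets` is assumed to hold `num_groups + 1` prefix offsets so that group `g` owns rows `a_offsets[g]` to `a_offsets[g + 1] - 1`; group `g` is assumed to use expert `g`; and block scaling is omitted so that only the layout and per-expert scaling are shown.

```python
def grouped_matmul_1d1d(a, a_offsets, b, expert_scales):
    """Reference grouped GEMM over a contiguous token buffer (illustrative only).

    a:             total_tokens rows, each a list of K values
    a_offsets:     num_groups + 1 prefix offsets into the rows of `a` (assumed)
    b:             num_experts entries, each an N x K matrix (list of rows)
    expert_scales: one output scale per expert
    Returns c:     total_tokens rows, each a list of N values
    """
    total_tokens = len(a)
    assert a_offsets[0] == 0 and a_offsets[-1] == total_tokens
    n = len(b[0])
    c = [[0.0] * n for _ in range(total_tokens)]
    for g in range(len(a_offsets) - 1):
        expert = g  # assumption: group g is served by expert g
        w = b[expert]
        scale = expert_scales[expert]
        # Only this group's slice of the shared token buffer is touched.
        for row in range(a_offsets[g], a_offsets[g + 1]):
            for col in range(n):
                # B is stored as (N, K), so each output is a row-by-row dot product.
                acc = sum(x * y for x, y in zip(a[row], w[col]))
                c[row][col] = acc * scale
    return c


a = [[1, 2], [3, 4], [5, 6]]             # total_tokens = 3, K = 2
a_offsets = [0, 2, 3]                    # group 0: rows 0-1, group 1: row 2
b = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]] # num_experts = 2, N = 2, K = 2
print(grouped_matmul_1d1d(a, a_offsets, b, [1.0, 0.5]))
# [[1.0, 2.0], [3.0, 4.0], [5.5, 5.0]]
```

Because every group writes into its own row range of one contiguous output, no padding to a fixed per-expert token count is needed, and an empty group is simply a repeated offset.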
This is a port of max/kernels/src/linalg/grouped_matmul_sm100_1d1d.mojo to the structured kernels architecture.

See PORTING_PLAN.md for implementation details.

