IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo package

grouped_block_scaled_1d1d

Grouped block-scaled matmul with 1D-1D tensor layout for SM100.
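"Block-scaled" means the low-precision operands carry scale factors that apply to contiguous blocks of elements along K. The sketch below is a plain-Python illustration of that idea for a single dot product, not the SM100 kernel. The block size BLOCK_K = 4 and the one-scale-per-block-per-operand storage are assumptions chosen for readability; real block-scaled formats define their own block sizes and scale encodings.

```python
# Minimal sketch of a block-scaled dot product along K (illustration only).
# ASSUMPTION: each operand stores one scale per contiguous block of BLOCK_K
# elements along K; BLOCK_K = 4 is a hypothetical value, not the kernel's.
BLOCK_K = 4


def block_scaled_dot(a_vals, a_scales, b_vals, b_scales, block_k=BLOCK_K):
    """Dot product of two K-length vectors stored as raw values plus
    one scale factor per block of block_k elements."""
    k = len(a_vals)
    assert len(b_vals) == k and k % block_k == 0
    acc = 0.0
    for blk in range(k // block_k):
        lo, hi = blk * block_k, (blk + 1) * block_k
        # Accumulate the unscaled products inside this block first.
        partial = sum(a_vals[i] * b_vals[i] for i in range(lo, hi))
        # Then apply both operands' block scales once per block.
        acc += partial * a_scales[blk] * b_scales[blk]
    return acc
```

Applying the scales once per block gives the same result as dequantizing every element first, while keeping the inner products in the raw format.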

This module provides a structured kernel implementation for grouped GEMM operations in Mixture of Experts (MoE) layers, using contiguous token buffers with offset-based addressing (the "1D-1D" layout).
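Offset-based addressing can be pictured as follows. This sketch assumes that a_offsets is an exclusive prefix sum of per-group token counts with num_groups + 1 entries, so group g owns rows [a_offsets[g], a_offsets[g + 1]) of the contiguous A buffer. That convention, and the helper names make_offsets and group_rows, are hypothetical and used only for illustration.

```python
# Sketch of offset-based ("1D-1D") addressing into a contiguous token buffer.
# ASSUMPTION: a_offsets is an exclusive prefix sum of per-group token counts,
# len(a_offsets) == num_groups + 1, and group g owns rows
# [a_offsets[g], a_offsets[g + 1]). Helper names are hypothetical.


def make_offsets(tokens_per_group):
    """Build an offsets array from per-group token counts."""
    offsets = [0]
    for count in tokens_per_group:
        offsets.append(offsets[-1] + count)
    return offsets


def group_rows(a_flat, k, a_offsets, g):
    """Return group g's rows from a row-major flat buffer of shape (total_tokens, k)."""
    start, end = a_offsets[g], a_offsets[g + 1]
    # Row r begins at element r * k in the row-major buffer.
    return [a_flat[r * k:(r + 1) * k] for r in range(start, end)]
```

For example, with tokens_per_group = [2, 0, 1] the offsets are [0, 2, 2, 3]; group 1 is empty, and no padding rows are stored for it.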

Key characteristics:

  • A tensor: Contiguous (total_tokens, K) with a_offsets for per-group access
  • B tensor: Batched (num_experts, N, K) weights
  • C tensor: Contiguous (total_tokens, N) output
  • Per-expert output scaling via expert_scales tensor
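The shapes listed above imply a simple reference computation, sketched here in plain Python as a naive correctness baseline rather than the SM100 implementation. It assumes a hypothetical expert_ids list mapping each group to an expert, that a_offsets follows the prefix-sum convention with num_groups + 1 entries, and that expert_scales is indexed by expert. Because B is stored as (num_experts, N, K), each group's output is A[rows] @ B[e]^T.

```python
# Naive reference for the grouped GEMM shapes above (not the SM100 kernel).
# ASSUMPTIONS (hypothetical, for illustration only):
#   * a_offsets has num_groups + 1 entries; group g owns rows
#     [a_offsets[g], a_offsets[g + 1]) of A and C.
#   * expert_ids[g] names the expert whose weights B[e] group g uses.
#   * expert_scales[e] multiplies every output element produced by expert e.
# A: total_tokens x K, B: num_experts x N x K, C: total_tokens x N (nested lists).


def grouped_matmul_ref(a, b, a_offsets, expert_ids, expert_scales):
    total_tokens = len(a)
    n_dim = len(b[0])  # N, the number of rows in each expert's (N, K) matrix
    c = [[0.0] * n_dim for _ in range(total_tokens)]
    for g, e in enumerate(expert_ids):
        w = b[e]  # (N, K) weights for this group's expert
        for r in range(a_offsets[g], a_offsets[g + 1]):
            for n in range(n_dim):
                # Row n of w is dotted with row r of A: C = A @ B[e]^T.
                dot = sum(x * y for x, y in zip(a[r], w[n]))
                c[r][n] = expert_scales[e] * dot
    return c
```

A baseline like this is useful for checking a tiled kernel on small inputs, since it touches each group's rows exactly once and never reads outside [a_offsets[g], a_offsets[g + 1]).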

This is a port of max/kernels/src/linalg/grouped_matmul_sm100_1d1d.mojo to the structured kernels architecture.

See PORTING_PLAN.md for implementation details.

Modules