Mojo module
grouped_1d1d_matmul
Host-side (CPU) entry point for the grouped 1D-1D block-scaled matmul on SM100 (NVIDIA Blackwell) GPUs.
This module provides the public API for launching the grouped 1D-1D matmul kernel used by Mixture of Experts (MoE) layers.
Usage:

    grouped_matmul_1d1d_nvfp4[transpose_b=True, config=config](
        c_tensor,            # Output: TileTensor (total_tokens, N)
        a_tensor,            # Input A: TileTensor (total_tokens, K)
        a_offsets,           # Per-expert offsets: TileTensor 1D
        a_scale_offsets,     # Per-expert scale offsets: TileTensor 1D
        b_tensor,            # Weights B: TileTensor (num_experts, N, K)
        expert_ids,          # Active expert IDs: TileTensor 1D
        a_scales,            # Scale factors for A: TileTensor 5D
        b_scales,            # Scale factors for B: TileTensor 6D
        expert_scales,       # Per-expert output scaling: TileTensor 1D
        num_active_experts,
        ctx,
    )
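To make the argument roles concrete, here is a hypothetical plain-Python reference model of the computation that ignores quantization entirely (it is not the Mojo API and does not run on a GPU). It assumes that `a_offsets` holds `num_active_experts + 1` prefix offsets, so rows `a_offsets[i]:a_offsets[i+1]` of A belong to expert `expert_ids[i]`; that `transpose_b=True` means each `B[e]` is stored as (N, K); and that `expert_scales` is indexed by expert id. None of these conventions is confirmed by this page, and the name `grouped_matmul_reference` is made up for illustration.

```python
# Hypothetical reference model of a grouped 1D-1D matmul (no quantization).
# Assumed conventions (not confirmed by the kernel docs):
#   - a_offsets has num_active_experts + 1 prefix offsets into the rows of a
#   - b[e] is an N x K matrix (transpose_b=True layout), so C = A @ B[e]^T
#   - expert_scales is indexed by expert id

def grouped_matmul_reference(a, a_offsets, b, expert_ids, expert_scales,
                             num_active_experts):
    """Return c with shape (total_tokens, N); rows outside every group stay 0."""
    n = len(b[0])
    c = [[0.0] * n for _ in range(len(a))]
    for i in range(num_active_experts):
        e = expert_ids[i]                      # expert serving group i
        start, end = a_offsets[i], a_offsets[i + 1]
        w = b[e]                               # N x K; row j of w is column j of B^T
        for row in range(start, end):
            for j in range(n):
                acc = sum(x * y for x, y in zip(a[row], w[j]))
                c[row][j] = acc * expert_scales[e]
    return c


# Example: 3 tokens with K = 2, two experts with N = 2.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
a_offsets = [0, 2, 3]            # tokens 0-1 -> group 0, token 2 -> group 1
b = [
    [[1.0, 0.0], [0.0, 1.0]],    # expert 0 (identity)
    [[1.0, 1.0], [2.0, 0.0]],    # expert 1
]
expert_ids = [1, 0]              # group 0 uses expert 1, group 1 uses expert 0
expert_scales = [1.0, 0.5]
c = grouped_matmul_reference(a, a_offsets, b, expert_ids, expert_scales, 2)
```

Because each group's rows are contiguous in A and C, a single launch can cover all active experts without padding every expert to the same token count.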
Functions

- grouped_matmul_1d1d_nvfp4: Launches the grouped 1D-1D block-scaled matmul kernel.
- grouped_matmul_dynamic_scaled_nvfp4: Performs grouped matrix multiplication with NVFP4 quantization.
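As a rough illustration of what "block-scaled" means for NVFP4 operands, the following plain-Python sketch dequantizes a flat list of 4-bit E2M1 codes in which each block of 16 elements shares one scale, with an optional per-tensor scale on top. For simplicity, block scales are passed as Python floats instead of being decoded from FP8 E4M3 bytes, and the swizzled 5D/6D scale layouts expected by `a_scales` and `b_scales` are not modeled. The helper names are hypothetical. The kernel presumably applies the scales inside the block-scaled MMA rather than materializing floats like this.

```python
# Illustrative NVFP4 dequantization sketch (not the kernel's implementation).
# Assumptions: 4-bit E2M1 codes, one scale per 16-element block (given as a
# float, not decoded from FP8 E4M3), and an optional per-tensor scale.

BLOCK_SIZE = 16

# Magnitudes for E2M1 codes 0..7; bit 3 (0x8) is the sign bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code):
    """Decode one 4-bit E2M1 code (0..15) to a Python float."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def dequantize_nvfp4(codes, block_scales, tensor_scale=1.0):
    """Scale each decoded element by its block's scale and the tensor scale."""
    if len(block_scales) * BLOCK_SIZE < len(codes):
        raise ValueError("need one block scale per 16 codes")
    return [decode_e2m1(c) * block_scales[i // BLOCK_SIZE] * tensor_scale
            for i, c in enumerate(codes)]


# Example: block 0 holds +1.0 (code 2), block 1 holds -6.0 (code 0xF).
codes = [2] * 16 + [0xF] * 16
values = dequantize_nvfp4(codes, [0.5, 2.0])
```

Sharing one scale per small block lets 4-bit values track local magnitude changes along K, which is why the scale tensors are passed alongside A and B.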