Mojo function

router_group_limited

def router_group_limited[scores_type: DType, bias_type: DType, //, n_routed_experts: Int, n_experts_per_tok: Int, n_groups: Int, topk_group: Int, norm_weights: Bool, target: StringSlice[StaticConstantOrigin], scores_input_fn: OptionalReg[def[width: Int](IndexList[Int(2)]) capturing -> SIMD[scores_type, width]] = None](expert_indices: TileTensor[DType.int32, Storage=expert_indices.Storage, address_space=expert_indices.address_space, linear_idx_type=expert_indices.linear_idx_type, element_size=expert_indices.element_size], expert_weights: TileTensor[scores_type, Storage=expert_weights.Storage, address_space=expert_weights.address_space, linear_idx_type=expert_weights.linear_idx_type, element_size=expert_weights.element_size], expert_scores: TileTensor[scores_type, Storage=expert_scores.Storage, address_space=expert_scores.address_space, linear_idx_type=expert_scores.linear_idx_type, element_size=expert_scores.element_size], expert_bias: TileTensor[bias_type, Storage=expert_bias.Storage, address_space=expert_bias.address_space, linear_idx_type=expert_bias.linear_idx_type, element_size=expert_bias.element_size], routed_scaling_factor: Float32, context: DeviceContext)

A manually fused MoE router with the group-limited strategy.

Reference: https://github.com/deepseek-ai/DeepSeek-V3/blob/9b4e9788e4a3a731f7567338ed15d3ec549ce03b/inference/model.py#L566.

Args:

  • expert_indices: The indices of the routed experts for each token. Shape: [num_tokens, n_experts_per_tok].
  • expert_weights: The weights of the routed experts for each token. Shape: [num_tokens, n_experts_per_tok].
  • expert_scores: The scores for each expert for each token. Shape: [num_tokens, n_routed_experts].
  • expert_bias: The bias for each expert. Shape: [n_routed_experts].
  • routed_scaling_factor: The scaling factor for the routed expert weights.
  • context: The device context.

Parameters:

  • scores_type (DType): The data type of the scores and the output weights.
  • bias_type (DType): The data type of the expert bias.
  • n_routed_experts (Int): The total number of routed experts to choose from.
  • n_experts_per_tok (Int): The number of experts to be selected per token.
  • n_groups (Int): The number of expert groups.
  • topk_group (Int): The number of expert groups to be selected per token.
  • norm_weights (Bool): Whether to normalize the selected weights.
  • target (StringSlice[StaticConstantOrigin]): The target device to run the kernel on.
  • scores_input_fn (OptionalReg[def[width: Int](IndexList[Int(2)]) capturing -> SIMD[scores_type, width]]): Optional input lambda function used to load the scores. Defaults to None.
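
To make the group-limited strategy concrete, here is a minimal pure-Python sketch of the routing for a single token. It is not the Mojo kernel. It follows the linked DeepSeek-V3 Python gate as we read it, and rests on several assumptions that this page does not confirm: the scores are already activated (for example by a sigmoid), each group is scored by the sum of its two largest biased scores, n_routed_experts is divisible by n_groups, and the returned weights come from the unbiased scores. The function name `group_limited_route_reference` is hypothetical.

```python
def group_limited_route_reference(
    scores, bias, n_groups, topk_group, n_experts_per_tok,
    norm_weights, routed_scaling_factor,
):
    """Route one token. `scores` and `bias` have length n_routed_experts.

    Illustrative only: the steps are assumptions based on the referenced
    DeepSeek-V3 gate, not a description of the Mojo kernel's internals.
    """
    n_experts = len(scores)
    group_size = n_experts // n_groups  # assumes an even split into groups

    # The bias influences which experts are selected, not the returned weights.
    biased = [s + b for s, b in zip(scores, bias)]

    # Assumption: a group's score is the sum of its two largest biased scores.
    group_scores = []
    for g in range(n_groups):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))

    # Keep only the topk_group best groups; experts elsewhere are excluded.
    ranked_groups = sorted(range(n_groups), key=lambda g: group_scores[g], reverse=True)
    kept = set(ranked_groups[:topk_group])
    candidates = [e for e in range(n_experts) if e // group_size in kept]

    # Pick the n_experts_per_tok best remaining experts by biased score.
    chosen = sorted(candidates, key=lambda e: biased[e], reverse=True)[:n_experts_per_tok]

    # Weights are gathered from the unbiased scores, then normalized and scaled.
    weights = [scores[e] for e in chosen]
    if norm_weights:
        total = sum(weights)
        weights = [w / total for w in weights]
    weights = [w * routed_scaling_factor for w in weights]
    return chosen, weights


# 8 experts in 4 groups of 2; keep 2 groups, then pick 3 experts.
scores = [0.9, 0.0, 0.5, 0.45, 0.6, 0.55, 0.1, 0.1]
bias = [0.0] * 8
chosen, weights = group_limited_route_reference(
    scores, bias, n_groups=4, topk_group=2, n_experts_per_tok=3,
    norm_weights=True, routed_scaling_factor=2.5,
)
print(chosen)  # [4, 5, 2]
```

In this example expert 0 has the highest individual score (0.9), but its group ranks third, so it is never considered. That is the effect of limiting selection to the top groups before picking experts.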