
Mojo function

router_group_limited

def router_group_limited[scores_type: DType, bias_type: DType, //, n_routed_experts: Int, n_experts_per_tok: Int, n_groups: Int, topk_group: Int, norm_weights: Bool, target: StringSlice[StaticConstantOrigin], scores_input_fn: OptionalReg[def[width: Int](IndexList[Int(2)]) capturing -> SIMD[scores_type, width]] = None](expert_indices: TileTensor[DType.int32, Storage=expert_indices.Storage, address_space=expert_indices.address_space, linear_idx_type=expert_indices.linear_idx_type, element_size=expert_indices.element_size], expert_weights: TileTensor[scores_type, Storage=expert_weights.Storage, address_space=expert_weights.address_space, linear_idx_type=expert_weights.linear_idx_type, element_size=expert_weights.element_size], expert_scores: TileTensor[scores_type, Storage=expert_scores.Storage, address_space=expert_scores.address_space, linear_idx_type=expert_scores.linear_idx_type, element_size=expert_scores.element_size], expert_bias: TileTensor[bias_type, Storage=expert_bias.Storage, address_space=expert_bias.address_space, linear_idx_type=expert_bias.linear_idx_type, element_size=expert_bias.element_size], routed_scaling_factor: Float32, context: DeviceContext)

A manually fused mixture-of-experts (MoE) router that uses the group-limited routing strategy.

Reference: https://github.com/deepseek-ai/DeepSeek-V3/blob/9b4e9788e4a3a731f7567338ed15d3ec549ce03b/inference/model.py#L566.
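To make the strategy concrete, here is a plain-Python sketch of group-limited routing for a single token. It follows the DeepSeek-V3 reference linked above, not the Mojo kernel itself. It rests on three assumptions. First, `expert_scores` already holds activated scores (DeepSeek-V3 uses a sigmoid). Second, `n_routed_experts` divides evenly into `n_groups`. Third, each group is scored by the sum of its two largest biased scores, which is what the reference does when a bias is present. The bias affects only which experts are selected. The returned weights come from the unbiased scores. The helper name `route_token` is hypothetical, and its tie-breaking may differ from the kernel's.

```python
from typing import List, Tuple


def route_token(
    scores: List[float],
    bias: List[float],
    n_experts_per_tok: int,
    n_groups: int,
    topk_group: int,
    norm_weights: bool,
    routed_scaling_factor: float,
) -> Tuple[List[int], List[float]]:
    # Hypothetical helper: group-limited routing for ONE token, following the
    # DeepSeek-V3 reference. Assumes len(scores) is divisible by n_groups.
    n_routed_experts = len(scores)
    group_size = n_routed_experts // n_groups

    # The bias steers expert selection only; weights use the raw scores.
    biased = [s + b for s, b in zip(scores, bias)]

    # Score each group by the sum of its two largest biased scores.
    group_scores = [
        sum(sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)[:2])
        for g in range(n_groups)
    ]

    # Keep the topk_group best groups (stable sort: ties go to the lower index).
    kept_groups = sorted(range(n_groups), key=lambda g: -group_scores[g])[:topk_group]

    # Only experts inside the kept groups remain candidates.
    candidates = [
        e for g in kept_groups for e in range(g * group_size, (g + 1) * group_size)
    ]

    # Select the n_experts_per_tok candidates with the highest biased scores.
    indices = sorted(candidates, key=lambda e: (-biased[e], e))[:n_experts_per_tok]

    # Gather raw scores as weights, optionally normalize, then apply the scale.
    weights = [scores[e] for e in indices]
    if norm_weights:
        total = sum(weights)
        weights = [w / total for w in weights]
    weights = [w * routed_scaling_factor for w in weights]
    return indices, weights


if __name__ == "__main__":
    # 8 experts in 4 groups of 2; keep 2 groups, select 2 experts per token.
    scores = [0.1, 0.9, 0.85, 0.0, 0.8, 0.7, 0.05, 0.05]
    idx, w = route_token(scores, [0.0] * 8, 2, 4, 2, True, 2.5)
    print(idx)  # [1, 4]
    print([round(x, 4) for x in w])  # [1.3235, 1.1765]

    # A bias on expert 5 changes the selection, but its weight still uses 0.7.
    bias = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
    idx, w = route_token(scores, bias, 2, 4, 2, True, 2.5)
    print(idx)  # [5, 1]
```

In the first call, expert 2 has the second-highest raw score (0.85), yet the router never picks it. Its group's score (0.85 + 0.0) does not rank among the two best groups, so the group-limited step removes it before the per-expert top-k runs.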

Inputs:

expert_indices: Output. The indices of the routed experts selected for each token. Shape: [num_tokens, n_experts_per_tok].
expert_weights: Output. The weights of the routed experts selected for each token. Shape: [num_tokens, n_experts_per_tok].
expert_scores: The score of each expert for each token. Shape: [num_tokens, n_routed_experts].
expert_bias: The bias for each expert. Shape: [n_routed_experts].
routed_scaling_factor: The scaling factor applied to the routed expert weights.
context: The device context.

Parameters:

scores_type (DType): The data type of the scores and the output weights.
bias_type (DType): The data type of the expert bias.
n_routed_experts (Int): The total number of routed experts a token can be routed to (the last dimension of expert_scores).
n_experts_per_tok (Int): The number of experts to be selected per token.
n_groups (Int): The number of expert groups.
topk_group (Int): The number of expert groups to be selected per token.
norm_weights (Bool): Whether to normalize each token's selected weights by their sum.
target (StringSlice[StaticConstantOrigin]): The target device to run the kernel on.
scores_input_fn (OptionalReg[def[width: Int](IndexList[Int(2)]) capturing -> SIMD[scores_type, width]]): An optional input lambda that loads the scores. Given a 2-D index, it returns a SIMD vector of `width` scores.
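The parameters above depend on one another. The sketch below spells out the constraints that follow from the structure of the strategy:

- The experts must split evenly into `n_groups` equal groups.
- No more than `n_groups` groups can be kept.
- The kept groups must contain at least `n_experts_per_tok` experts.

This page does not say whether the Mojo kernel checks these constraints, or how. `check_router_params` is a hypothetical helper. The example values are DeepSeek-V3's published routing configuration.

```python
def check_router_params(
    n_routed_experts: int,
    n_experts_per_tok: int,
    n_groups: int,
    topk_group: int,
) -> None:
    # Hypothetical validator: raises ValueError if the parameters cannot
    # describe a valid group-limited routing configuration.
    if n_groups <= 0 or n_routed_experts % n_groups != 0:
        raise ValueError("n_routed_experts must split evenly into n_groups")
    if not 1 <= topk_group <= n_groups:
        raise ValueError("topk_group must be in [1, n_groups]")
    # Experts still available after keeping topk_group groups.
    group_size = n_routed_experts // n_groups
    max_candidates = topk_group * group_size
    if not 1 <= n_experts_per_tok <= max_candidates:
        raise ValueError(
            f"n_experts_per_tok must be in [1, {max_candidates}] "
            "(experts available in the kept groups)"
        )


if __name__ == "__main__":
    # DeepSeek-V3 routing: 256 experts, 8 groups, 4 groups kept, 8 experts per token.
    check_router_params(256, 8, 8, 4)
    print("ok")  # ok
    try:
        check_router_params(256, 8, 6, 4)  # 256 is not divisible by 6
    except ValueError as err:
        print(err)  # n_routed_experts must split evenly into n_groups
```

With the DeepSeek-V3 values, each group holds 32 experts. Keeping 4 groups therefore leaves 128 candidates for the 8 selected experts.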