Mojo function

group_limited_router_kernel

group_limited_router_kernel[
    scores_type: DType,
    bias_type: DType,
    ExpertIndicesLayoutType: TensorLayout,
    ExpertWeightsLayoutType: TensorLayout,
    ExpertScoresLayoutType: TensorLayout,
    ExpertBiasLayoutType: TensorLayout,
    n_routed_experts: Int,
    n_experts_per_tok: Int,
    n_groups: Int,
    topk_group: Int,
    norm_weights: Bool,
    num_threads: Int,
    scores_input_fn: OptionalReg[fn[width: Int](IndexList[2]) capturing -> SIMD[scores_type, width]] = None
](
    expert_indices: TileTensor[DType.int32, ExpertIndicesLayoutType, MutAnyOrigin],
    expert_weights: TileTensor[scores_type, ExpertWeightsLayoutType, MutAnyOrigin],
    expert_scores: TileTensor[scores_type, ExpertScoresLayoutType, ImmutAnyOrigin],
    expert_bias: TileTensor[bias_type, ExpertBiasLayoutType, ImmutAnyOrigin],
    routed_scaling_factor: Float32
)

A manually fused mixture-of-experts (MoE) router that uses the group-limited strategy. It partitions the experts into n_groups groups and selects the topk_group groups with the highest scores. The final experts for each token are then chosen only from the experts in the selected groups. The bias is added to the scores during selection only; the final weights do not include it.
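
The selection logic described above can be illustrated with a small host-side reference for a single token. This is a minimal sketch in plain Python, not the kernel's implementation. The helper name `group_limited_route` is hypothetical, and three details are assumptions the description does not state: a group's score is taken as the sum of its two highest biased scores, `norm_weights` rescales the selected weights to sum to 1, and `routed_scaling_factor` multiplies the final weights.

```python
def group_limited_route(scores, bias, n_groups, topk_group, n_experts_per_tok,
                        norm_weights=True, routed_scaling_factor=1.0):
    """Reference routing for one token (hypothetical helper, not the kernel)."""
    n_experts = len(scores)
    group_size = n_experts // n_groups

    # The bias influences selection only; weights use the raw scores.
    biased = [s + b for s, b in zip(scores, bias)]

    # ASSUMPTION: group score = sum of the two highest biased scores in the group.
    group_scores = []
    for g in range(n_groups):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))

    # Keep the topk_group highest-scoring groups.
    ranked_groups = sorted(range(n_groups), key=lambda g: group_scores[g], reverse=True)
    kept = set(ranked_groups[:topk_group])

    # Pick the best experts (by biased score) from the kept groups only.
    candidates = [e for e in range(n_experts) if e // group_size in kept]
    chosen = sorted(candidates, key=lambda e: biased[e], reverse=True)[:n_experts_per_tok]

    # Final weights come from the unbiased scores.
    weights = [scores[e] for e in chosen]
    if norm_weights:
        # ASSUMPTION: normalization makes the selected weights sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    # ASSUMPTION: the scaling factor is applied last.
    weights = [w * routed_scaling_factor for w in weights]
    return chosen, weights


# 8 experts in 4 groups of 2; the bias lifts the last group into the selection.
scores = [0.5, 0.125, 0.25, 0.25, 0.375, 0.375, 0.0625, 0.1875]
bias = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
chosen, weights = group_limited_route(
    scores, bias, n_groups=4, topk_group=2, n_experts_per_tok=2,
    norm_weights=True, routed_scaling_factor=2.0,
)
print(chosen, weights)  # [7, 6] [1.5, 0.5]
```

In this example expert 0 has the highest raw score but is never considered, because its group is not among the two selected groups. The returned weights are derived from the raw scores 0.1875 and 0.0625, so the 0.5 bias changes which experts are picked without leaking into their weights.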