Mojo function
group_limited_router_kernel
group_limited_router_kernel[scores_type: DType, bias_type: DType, expert_indices_layout: Layout, expert_weights_layout: Layout, expert_scores_layout: Layout, expert_bias_layout: Layout, n_routed_experts: Int, n_experts_per_tok: Int, n_groups: Int, topk_group: Int, norm_weights: Bool, num_threads: Int](expert_indices: LayoutTensor[DType.int32, expert_indices_layout, MutAnyOrigin], expert_weights: LayoutTensor[scores_type, expert_weights_layout, MutAnyOrigin], expert_scores: LayoutTensor[scores_type, expert_scores_layout, ImmutAnyOrigin], expert_bias: LayoutTensor[bias_type, expert_bias_layout, ImmutAnyOrigin], routed_scaling_factor: Float32)
A manually fused MoE router that uses the group-limited strategy. It divides the n_routed_experts experts into n_groups groups and picks the topk_group groups with the highest scores. Each token's n_experts_per_tok experts are then selected only from the experts in those groups. The bias is added to the scores during selection, but the final weights do not include the bias.
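The selection logic described above can be sketched for a single token in plain Python. This is an illustrative reference only, not the kernel's implementation, and it rests on several assumptions the page does not state: experts are grouped contiguously, a group's score is the maximum biased score among its experts (the kernel may reduce differently), norm_weights rescales the selected weights to sum to 1, and routed_scaling_factor multiplies the final weights. The function name `group_limited_route` is hypothetical.

```python
def group_limited_route(scores, bias, n_groups, topk_group, n_experts_per_tok,
                        norm_weights=True, routed_scaling_factor=1.0):
    """Route one token; returns (expert_indices, expert_weights)."""
    n_experts = len(scores)
    assert n_experts % n_groups == 0, "experts must split evenly into groups"
    group_size = n_experts // n_groups

    # The bias participates in selection only.
    biased = [s + b for s, b in zip(scores, bias)]

    # ASSUMPTION: group score = max biased score within the (contiguous) group.
    group_scores = [max(biased[g * group_size:(g + 1) * group_size])
                    for g in range(n_groups)]

    # Keep the topk_group highest-scoring groups.
    top_groups = sorted(range(n_groups), key=lambda g: group_scores[g],
                        reverse=True)[:topk_group]

    # Candidate experts are limited to the selected groups.
    candidates = [e for g in top_groups
                  for e in range(g * group_size, (g + 1) * group_size)]

    # Pick the top experts by biased score.
    chosen = sorted(candidates, key=lambda e: biased[e],
                    reverse=True)[:n_experts_per_tok]

    # Final weights come from the unbiased scores.
    weights = [scores[e] for e in chosen]
    if norm_weights:
        # ASSUMPTION: normalization means rescaling to sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    # ASSUMPTION: the scaling factor is applied multiplicatively at the end.
    weights = [w * routed_scaling_factor for w in weights]
    return chosen, weights


scores = [0.1, 0.2, 0.6, 0.3, 0.5, 0.4, 0.05, 0.15]
bias = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0]
indices, weights = group_limited_route(scores, bias, n_groups=4, topk_group=2,
                                       n_experts_per_tok=2, norm_weights=False)
print(indices, weights)
```

In this example the bias lifts expert 6 into the selection even though its raw score is the lowest, and its returned weight is still the unbiased 0.05, which illustrates why the bias affects which experts are chosen but not how much they contribute.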