Mojo struct
Struct_ep_dispatch_mxfp4
struct Struct_ep_dispatch_mxfp4
Implemented traits
AnyType,
ImplicitlyDestructible
Methods
execute
static def execute[
    input_dtype: DType,
    dispatch_dtype: DType,
    dispatch_scale_dtype: DType,
    hidden_size: Int,
    top_k: Int,
    n_experts: Int,
    max_token_per_rank: Int,
    n_gpus_per_node: Int,
    n_nodes: Int,
    fused_shared_expert: Bool,
    skip_a2a: Bool,
    allreduce_world_size: Int,
    //,
    target: StringSlice[StaticConstantOrigin]
](
    output_tokens: ManagedTensorSlice[Output, static_spec=output_tokens.static_spec],
    output_scales: ManagedTensorSlice[Output, static_spec=output_scales.static_spec],
    row_offsets: ManagedTensorSlice[Output, static_spec=row_offsets.static_spec],
    expert_ids: ManagedTensorSlice[Output, static_spec=expert_ids.static_spec],
    src_info: ManagedTensorSlice[Output, static_spec=src_info.static_spec],
    atomic_counters: ManagedTensorSlice[MutableInput, static_spec=atomic_counters.static_spec],
    input_tokens: ManagedTensorSlice[Input, static_spec=input_tokens.static_spec],
    topk_ids: ManagedTensorSlice[Input, static_spec=topk_ids.static_spec],
    send_ptrs: ManagedTensorSlice[Input, static_spec=send_ptrs.static_spec],
    recv_ptrs: ManagedTensorSlice[Input, static_spec=recv_ptrs.static_spec],
    recv_count_ptrs: ManagedTensorSlice[Input, static_spec=recv_count_ptrs.static_spec],
    context: DeviceContext
)
Execute the fused Expert Parallelism MXFP4 dispatch kernel. Tokens are dispatched in MXFP4 format.
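To make "MXFP4 format" concrete, the following is a minimal Python sketch of block quantization as described by the OCP Microscaling (MX) specification: 32 elements share one power-of-two scale (E8M0), and each element is a 4-bit E2M1 value. It is an illustration only, not this kernel's implementation. The names `quantize_block` and `dequantize_block` are hypothetical, the tie-breaking rule is simplified, and it is an assumption that `dispatch_dtype` and `dispatch_scale_dtype` correspond to the element and scale types shown here.

```python
import math

# Magnitudes representable by an FP4 E2M1 element, indexed by the 3 low bits.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # elements that share one scale in the OCP MX formats
E2M1_EMAX = 2    # exponent of the largest power of two in E2M1 (4.0 == 2**2)


def quantize_block(values):
    """Hypothetical helper: return (scale_exponent, 4-bit codes) for one block."""
    assert len(values) == BLOCK_SIZE
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * BLOCK_SIZE
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    _, e = math.frexp(amax)
    scale_exp = (e - 1) - E2M1_EMAX  # shared scale is 2**scale_exp (exponent only)
    scale = 2.0 ** scale_exp
    codes = []
    for v in values:
        mag = min(abs(v) / scale, E2M1_VALUES[-1])  # saturate at 6.0
        # Nearest representable magnitude; ties go to the smaller one here,
        # which is a simplification of the rounding mode in the specification.
        idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
        codes.append(idx | (8 if v < 0 else 0))  # bit 3 holds the sign
    return scale_exp, codes


def dequantize_block(scale_exp, codes):
    """Hypothetical helper: reconstruct floats from a scale exponent and codes."""
    scale = 2.0 ** scale_exp
    return [(-1.0 if c & 8 else 1.0) * E2M1_VALUES[c & 7] * scale for c in codes]
```

Under this layout, two 4-bit codes pack into one byte and each block of 32 elements carries one extra scale byte, which may explain why the method produces separate `output_tokens` and `output_scales` tensors.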