Mojo struct
Struct_ep_dispatch_wait
struct Struct_ep_dispatch_wait
Implemented traits
AnyType,
ImplicitlyDestructible
Methods
execute
static def execute[hidden_size: Int, top_k: Int, n_experts: Int, max_token_per_rank: Int, n_gpus_per_node: Int, n_nodes: Int, //, target: StringSlice[StaticConstantOrigin], num_input_tokens: Int = -1](output_tokens: ManagedTensorSlice[Output, static_spec=output_tokens.static_spec], row_offsets: ManagedTensorSlice[Output, static_spec=row_offsets.static_spec], expert_ids: ManagedTensorSlice[Output, static_spec=expert_ids.static_spec], src_info: ManagedTensorSlice[Output, static_spec=src_info.static_spec], atomic_counters: ManagedTensorSlice[MutableInput, static_spec=atomic_counters.static_spec], recv_ptrs: ManagedTensorSlice[Input, static_spec=recv_ptrs.static_spec], recv_count_ptrs: ManagedTensorSlice[Input, static_spec=recv_count_ptrs.static_spec], context: DeviceContext)
Execute the Expert Parallelism dispatch completion kernel. Received tokens are in BF16 format.
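The signature names four outputs (output_tokens, row_offsets, expert_ids, src_info) but this page does not describe their layout. The sketch below is a plain-Python model of one plausible reading, not the kernel itself and not Mojo code: it assumes received tokens are grouped by local expert, that row_offsets is an exclusive prefix sum of per-expert token counts, and that src_info records where each token came from. The function name group_received_tokens and the (expert, rank, token index) tuple layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def group_received_tokens(
    received: List[Tuple[int, int, int]],
    n_local_experts: int,
) -> Tuple[List[int], List[int], List[Tuple[int, int]]]:
    """Group received tokens by local expert (illustrative model only).

    `received` holds one hypothetical (local_expert_id, src_rank,
    src_token_index) tuple per token, in arrival order.
    """
    # Count how many tokens each local expert received.
    counts = [0] * n_local_experts
    for expert, _, _ in received:
        counts[expert] += 1

    # ASSUMPTION: row_offsets is an exclusive prefix sum over the counts,
    # so expert e owns rows row_offsets[e] .. row_offsets[e + 1] - 1.
    row_offsets = [0]
    for count in counts:
        row_offsets.append(row_offsets[-1] + count)

    # ASSUMPTION: src_info keeps (src_rank, src_token_index) per output row.
    cursor = row_offsets[:-1].copy()
    src_info: List[Tuple[int, int]] = [(-1, -1)] * len(received)
    for expert, rank, token_index in received:
        src_info[cursor[expert]] = (rank, token_index)
        cursor[expert] += 1

    expert_ids = list(range(n_local_experts))
    return row_offsets, expert_ids, src_info


row_offsets, expert_ids, src_info = group_received_tokens(
    [(1, 0, 5), (0, 1, 2), (1, 1, 7)], n_local_experts=2
)
print(row_offsets)  # [0, 1, 3]
print(src_info)     # [(1, 2), (0, 5), (1, 7)]
```

A layout like this would let a downstream grouped matrix multiply process each expert's rows as one contiguous block. Whether the actual kernel orders rows or encodes src_info this way is not stated on this page.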