Mojo struct
Struct_ep_dispatch_async
struct Struct_ep_dispatch_async
Implemented traits
AnyType,
ImplicitlyDestructible
Methods
execute
static def execute[input_dtype: DType, dispatch_dtype: DType, hidden_size: Int, top_k: Int, n_experts: Int, max_token_per_rank: Int, n_gpus_per_node: Int, n_nodes: Int, dispatch_fmt_str: StringSlice[StaticConstantOrigin], //, target: StringSlice[StaticConstantOrigin]](atomic_counters: ManagedTensorSlice[MutableInput, static_spec=atomic_counters.static_spec], input_tokens: ManagedTensorSlice[Input, static_spec=input_tokens.static_spec], topk_ids: ManagedTensorSlice[Input, static_spec=topk_ids.static_spec], send_ptrs: ManagedTensorSlice[Input, static_spec=send_ptrs.static_spec], recv_ptrs: ManagedTensorSlice[Input, static_spec=recv_ptrs.static_spec], recv_count_ptrs: ManagedTensorSlice[Input, static_spec=recv_count_ptrs.static_spec], context: DeviceContext)
Execute the Expert Parallelism async dispatch kernel. Tokens are transferred in either Blockwise FP8 or BF16 format.
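To make the routing idea behind "dispatch" concrete, here is a minimal plain-Python sketch of how per-token top-k expert IDs can be grouped by destination rank. It is not the kernel's implementation. It assumes that experts are split evenly and contiguously across `n_gpus_per_node * n_nodes` ranks, which this page does not state, and `plan_dispatch` is a hypothetical helper name. It ignores the FP8/BF16 token payload, the atomic counters, and the send/receive pointers entirely.

```python
def plan_dispatch(topk_ids, n_experts, n_gpus_per_node, n_nodes):
    """Group (token_index, expert_id) pairs by destination rank.

    ASSUMPTION: experts are sharded evenly and contiguously across ranks
    (rank r hosts experts [r * experts_per_rank, (r + 1) * experts_per_rank)).
    The real kernel's layout may differ.
    """
    n_ranks = n_gpus_per_node * n_nodes
    if n_experts % n_ranks != 0:
        raise ValueError("n_experts must be divisible by the number of ranks")
    experts_per_rank = n_experts // n_ranks

    # send_lists[r] collects the pairs that would be sent to rank r.
    send_lists = [[] for _ in range(n_ranks)]
    for token_index, expert_ids in enumerate(topk_ids):
        for expert_id in expert_ids:
            if not 0 <= expert_id < n_experts:
                raise ValueError(f"expert id {expert_id} out of range")
            dest_rank = expert_id // experts_per_rank
            send_lists[dest_rank].append((token_index, expert_id))
    return send_lists


# Two tokens, top_k = 2, 8 experts over 2 nodes x 2 GPUs (2 experts per rank).
example = plan_dispatch(
    [[0, 5], [3, 4]], n_experts=8, n_gpus_per_node=2, n_nodes=2
)
print(example)
```

In this example token 0 goes to ranks 0 and 2, and token 1 goes to ranks 1 and 2, so rank 3 receives nothing. Each inner list is bounded by the number of tokens times `top_k`, which is presumably why the real method takes a `max_token_per_rank` parameter to size its buffers.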