IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo function

combine_async_kernel

combine_async_kernel[input_type: DType, num_threads: Int, input_tokens_layout: TensorLayout, src_info_layout: TensorLayout, n_sms: Int, top_k: Int, n_experts: Int, n_ranks: Int, msg_bytes: Int, max_tokens_per_rank: Int, p2p_world_size: Int, use_shmem: Bool = True](input_tokens: TileTensor[input_type, input_tokens_layout, ImmutExternalOrigin], src_info: TileTensor[DType.int32, src_info_layout, ImmutExternalOrigin], send_buf_p: UnsafePointer[UInt8, MutExternalOrigin], recv_buf_ptrs: InlineArray[UnsafePointer[UInt8, MutExternalOrigin], p2p_world_size], recv_count_ptrs: InlineArray[UnsafePointer[UInt64, MutExternalOrigin], p2p_world_size], ep_counters: EPLocalSyncCounters[n_experts], my_rank: Int32)

Sends tokens back to their original rank based on the src_info tensor. This kernel uses the non-blocking SHMEM API and returns immediately after initiating the communication. The communication is considered complete only after combine_wait_kernel has been called.
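
The initiate-then-wait contract described above can be illustrated with a small host-side analogy. The sketch below is plain Python, not Mojo and not the SHMEM API: `combine_async`, `combine_wait`, and the `(source_rank, slot)` shape of `src_info` are hypothetical names and assumptions chosen for illustration only. It shows one point, namely that the receive buffers should not be read until the wait step has run.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-ins: these are NOT the Mojo kernels or the SHMEM API.
_pool = ThreadPoolExecutor(max_workers=1)


def combine_async(tokens, src_info, recv_bufs) -> Future:
    """Start returning each token to its source rank and return immediately.

    Assumption: src_info[i] is a (source_rank, slot) pair for tokens[i].
    """
    def _send():
        for token, (rank, slot) in zip(tokens, src_info):
            recv_bufs[rank][slot] = token

    # submit() hands the work to a background thread and does not block.
    return _pool.submit(_send)


def combine_wait(handle: Future) -> None:
    """Block until the transfer has finished; only then are recv_bufs valid."""
    handle.result()


tokens = ["t0", "t1", "t2"]
src_info = [(1, 0), (0, 0), (1, 1)]
recv_bufs = {0: [None], 1: [None, None]}

handle = combine_async(tokens, src_info, recv_bufs)
# Unrelated work could overlap with the transfer here.
combine_wait(handle)
print(recv_bufs)  # {0: ['t1'], 1: ['t0', 't2']}
```

In this analogy the `Future` plays the role of the in-flight communication. Reading `recv_bufs` before `combine_wait` returns would race with the transfer, which mirrors why combine_wait_kernel must run before the combined tokens are consumed.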

Parameters:

  • input_type (DType): The type of the input tokens.
  • num_threads (Int): The number of threads in the block.
  • input_tokens_layout (TensorLayout): The layout of the input tokens.
  • src_info_layout (TensorLayout): The layout of the source token info.
  • n_sms (Int): The total number of SMs in the device.
  • top_k (Int): The number of selected experts per token.
  • n_experts (Int): The total number of experts in the model.
  • n_ranks (Int): The total number of devices participating in the communication.
  • msg_bytes (Int): The total number of bytes to send for each token.
  • max_tokens_per_rank (Int): The maximum number of tokens per rank.
  • p2p_world_size (Int): The size of a high-speed GPU interconnect group.
  • use_shmem (Bool): Whether to use the SHMEM API for the communication.

Args: