
Mojo module

scatter

Multi-GPU scatter+broadcast kernel implementation.

Distributes distinct data chunks from a root GPU to multiple device groups. Each group (a data-parallel, or DP, replica) receives a different chunk, and every device within a group (its tensor-parallel, or TP, devices) receives the same chunk.

Example with DP=4, TP=2, 8 GPUs:

  • Chunk 0 -> GPU 0 and GPU 1 (Replica A)
  • Chunk 1 -> GPU 2 and GPU 3 (Replica B)
  • Chunk 2 -> GPU 4 and GPU 5 (Replica C)
  • Chunk 3 -> GPU 6 and GPU 7 (Replica D)
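
The mapping in this example reduces to integer division of the GPU index by the TP size. The following is a minimal Python sketch of that arithmetic, assuming GPUs are numbered contiguously within each replica as in the list above. `chunk_for_gpu` is a hypothetical helper used only for illustration and is not part of this module's API.

```python
def chunk_for_gpu(gpu_id: int, tp_size: int) -> int:
    """Return the index of the chunk that a GPU receives.

    Assumes GPUs are numbered contiguously within each replica,
    as in the DP=4, TP=2 example (an assumption of this sketch).
    """
    if tp_size <= 0:
        raise ValueError("tp_size must be positive")
    if gpu_id < 0:
        raise ValueError("gpu_id must be non-negative")
    return gpu_id // tp_size


dp_size, tp_size = 4, 2

# Build the GPU -> chunk table for all dp_size * tp_size devices.
mapping = {gpu: chunk_for_gpu(gpu, tp_size) for gpu in range(dp_size * tp_size)}

for gpu, chunk in mapping.items():
    print(f"GPU {gpu} <- chunk {chunk}")
```

With DP=4 and TP=2, GPUs 0 and 1 map to chunk 0, GPUs 2 and 3 to chunk 1, and so on, matching the replica layout listed above.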

Uses a pull-based approach: each GPU reads its own chunk from the root GPU via peer-to-peer (P2P) access.
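
To make the pull-based pattern concrete, here is a small host-side simulation in Python. It is only a sketch under stated assumptions: plain lists stand in for device buffers, a sequential loop stands in for GPUs that would run concurrently, and devices are assumed to be numbered contiguously within each replica. `pull_scatter` is a hypothetical name, not a function from this module, and the real kernel performs the reads on the GPU over P2P.

```python
def pull_scatter(root_chunks, tp_size):
    """Simulate a pull-based scatter+broadcast on the host.

    root_chunks: one chunk (a list) per DP replica, held by the root.
    tp_size: number of devices per replica.
    Returns one independent buffer per device.
    """
    if tp_size <= 0:
        raise ValueError("tp_size must be positive")

    num_devices = len(root_chunks) * tp_size
    device_buffers = []
    for device in range(num_devices):
        # Each device works out which chunk belongs to it and reads that
        # chunk from the root; the root never pushes data to anyone.
        replica = device // tp_size
        device_buffers.append(list(root_chunks[replica]))  # copy = the "pull"
    return device_buffers


# Four chunks on the root, one per replica (DP=4), two devices each (TP=2).
root_chunks = [[0, 1], [2, 3], [4, 5], [6, 7]]
buffers = pull_scatter(root_chunks, tp_size=2)

for device, buf in enumerate(buffers):
    print(f"GPU {device}: {buf}")
```

Because every device computes its own source chunk, the root does no per-destination work in this sketch; devices in the same replica simply read the same chunk.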

Functions