
Mojo module

allgather

Multi-GPU allgather implementation that gathers values from multiple GPUs into an output buffer.

This module provides an optimized implementation of allgather operations across multiple GPUs, supporting both peer-to-peer (P2P) and non-P2P communication patterns. The implementation automatically selects between approaches based on hardware capabilities:

P2P-based implementation (when P2P access is available):
- Uses direct GPU-to-GPU memory access for better performance.
- Optimized for NVLink and xGMI bandwidth utilization.
- Uses vectorized memory access.

Non-P2P fallback implementation:
- Copies data through device memory when direct GPU access isn't possible.
- Simple but functional approach for systems without P2P support.
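
The selection between the two paths can be pictured with a small host-side simulation. This is a plain-Python sketch and not the module's Mojo code: the names `can_use_p2p`, `allgather_p2p`, `allgather_staged`, and `allgather_sim` are hypothetical, the access matrix stands in for a real hardware query, and requiring P2P access between every pair of devices is an assumption about how "P2P access is available" is decided.

```python
from typing import List, Tuple

Buffer = List[int]


def can_use_p2p(access: List[List[bool]]) -> bool:
    """Assumed rule: P2P is usable only if every device can reach every peer."""
    n = len(access)
    return all(access[i][j] for i in range(n) for j in range(n) if i != j)


def allgather_p2p(inputs: List[Buffer]) -> List[List[Buffer]]:
    # Each device reads every peer's input buffer directly.
    return [[list(peer) for peer in inputs] for _ in inputs]


def allgather_staged(inputs: List[Buffer]) -> List[List[Buffer]]:
    # Fallback: copy each input into a staging buffer first, then into
    # every destination, instead of reading peer memory directly.
    outputs: List[List[Buffer]] = [[] for _ in inputs]
    for src in inputs:
        staging = list(src)
        for dst in outputs:
            dst.append(list(staging))
    return outputs


def allgather_sim(
    inputs: List[Buffer], access: List[List[bool]]
) -> Tuple[str, List[List[Buffer]]]:
    """Pick a path from the (simulated) access matrix and run it."""
    if can_use_p2p(access):
        return "p2p", allgather_p2p(inputs)
    return "fallback", allgather_staged(inputs)
```

Both paths produce the same gathered result. Only the route the data takes differs, which is why the choice can be made automatically.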

`comptime` values

`allgather_tuning_table`

`comptime allgather_tuning_table = Table(List(DefaultCommTuningConfig(Int(-1), Int(-1), StringSlice("sm_90a"), Int(216)), DefaultCommTuningConfig(Int(-1), Int(-1), StringSlice("sm_100a"), Int(512)), DefaultCommTuningConfig(Int(-1), Int(-1), StringSlice("sm_103a"), Int(512)), DefaultCommTuningConfig(Int(-1), Int(-1), StringSlice("CDNA4"), Int(216)), DefaultCommTuningConfig(Int(-1), Int(-1), StringSlice("default"), Int(512)), __list_literal__=NoneType(None)), String("allgather_table"))`
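
To make the table's contents easier to read, the following plain-Python sketch mirrors its five entries and shows one plausible lookup. This page does not state what the fields mean or how the table is queried, so several things here are assumptions: the field names `first`, `second`, and `value` are placeholders for the unnamed positional values, `lookup_tuning` is a hypothetical helper, and falling back to the `"default"` entry for an unlisted architecture is a guess based on that entry's name.

```python
from typing import List, NamedTuple


class TuningEntry(NamedTuple):
    first: int   # placeholder name; -1 in every entry shown above
    second: int  # placeholder name; -1 in every entry shown above
    arch: str    # architecture string, e.g. "sm_90a" or "CDNA4"
    value: int   # placeholder name for the tuned integer (216 or 512)


# Same values as the entries in allgather_tuning_table above.
TABLE: List[TuningEntry] = [
    TuningEntry(-1, -1, "sm_90a", 216),
    TuningEntry(-1, -1, "sm_100a", 512),
    TuningEntry(-1, -1, "sm_103a", 512),
    TuningEntry(-1, -1, "CDNA4", 216),
    TuningEntry(-1, -1, "default", 512),
]


def lookup_tuning(table: List[TuningEntry], arch: str) -> TuningEntry:
    """Return the entry for `arch`, else the "default" entry (assumed rule)."""
    for entry in table:
        if entry.arch == arch:
            return entry
    for entry in table:
        if entry.arch == "default":
            return entry
    raise KeyError(arch)
```

Under these assumptions, `lookup_tuning(TABLE, "sm_90a").value` is 216, and an architecture that is not listed resolves to the `"default"` entry's 512.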

Functions

`allgather`: Per-device all-gather in which one instance per GPU builds its own outputs.
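
The per-device phrasing can be illustrated with a short plain-Python model. It is a sketch of the semantics only, not the Mojo function's signature: `allgather_instance` and `run_instances` are hypothetical names, and Python lists stand in for device buffers.

```python
from typing import List, Optional, Sequence

Buffer = List[int]


def allgather_instance(
    rank: int,
    inputs: Sequence[Buffer],
    outputs: List[Optional[List[Buffer]]],
) -> None:
    """Instance for device `rank`: gather every input into outputs[rank] only."""
    outputs[rank] = [list(buf) for buf in inputs]


def run_instances(
    inputs: Sequence[Buffer], order: Optional[Sequence[int]] = None
) -> List[Optional[List[Buffer]]]:
    """Run one instance per device; instances never touch each other's outputs."""
    outputs: List[Optional[List[Buffer]]] = [None] * len(inputs)
    for rank in (order if order is not None else range(len(inputs))):
        allgather_instance(rank, inputs, outputs)
    return outputs
```

Because each instance writes only its own slot, the order in which the instances run does not affect the result, and every device ends up with its own copy of all inputs.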
