Mojo module

allgather

Multi-GPU allgather implementation that gathers values from multiple GPUs into an output buffer.

This module implements allgather across multiple GPUs and supports both peer-to-peer (P2P) and non-P2P communication patterns. It automatically selects one of two approaches based on hardware capabilities:

  1. P2P-based implementation (when P2P access is available):

    • Uses direct GPU-to-GPU memory access for better performance.
    • Optimized for NVLink and xGMI bandwidth utilization.
    • Uses vectorized memory access.
  2. Non-P2P fallback implementation:

    • Copies data through device memory when direct GPU access isn't possible.
    • Simple but functional approach for systems without P2P support.
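The semantics of the two paths can be illustrated with a small host-side simulation. This is only a conceptual sketch in Python, not the module's Mojo API: the function name `allgather_sim`, the `p2p_available` flag, and the use of plain lists as stand-ins for per-GPU buffers are all assumptions made for illustration. It shows that both paths produce the same result (every device holds a copy of every device's input) and differ only in how the data is moved.

```python
from typing import List, Tuple

Buffer = List[int]  # hypothetical stand-in for one GPU's device buffer


def allgather_sim(
    inputs: List[Buffer], p2p_available: bool
) -> Tuple[str, List[List[Buffer]]]:
    """Simulate allgather over len(inputs) devices.

    Returns the path taken and, for each destination device, a list
    holding a copy of every source device's input buffer.
    """
    ngpus = len(inputs)
    outputs: List[List[Buffer]] = []

    if p2p_available:
        # P2P path (simulated): each destination reads every peer's
        # buffer directly and writes it into its own output slot.
        path = "p2p"
        for _dst in range(ngpus):
            outputs.append([list(inputs[src]) for src in range(ngpus)])
    else:
        # Fallback path (simulated): each source buffer is first copied
        # into a staging buffer, then from staging into the output slot.
        path = "fallback"
        for _dst in range(ngpus):
            gathered: List[Buffer] = []
            for src in range(ngpus):
                staging = list(inputs[src])    # explicit intermediate copy
                gathered.append(list(staging)) # copy staging into the output
            outputs.append(gathered)

    return path, outputs


if __name__ == "__main__":
    # Three simulated devices; input lengths may differ per device.
    data = [[1, 2], [3], [4, 5, 6]]
    for flag in (True, False):
        path, result = allgather_sim(data, p2p_available=flag)
        print(path, result[0])
```

In the real module the choice of path is made from the hardware's P2P capability rather than from a caller-supplied flag; the flag here exists only so both branches can be exercised.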

Functions

  • allgather: Performs all-gather across GPUs with variadic output.