Python package
comm
Communication primitives for distributed training.
Allreduce
class max.nn.comm.Allreduce(num_accelerators)
Layer that performs an allreduce operation with automatic implementation selection.
Automatically chooses between a peer-to-peer-optimized allreduce and naive device-to-device transfers based on accelerator connectivity.
Parameters:
- num_accelerators (int) – Number of accelerators participating in the allreduce operation.
Initialize the Allreduce layer with a specified number of accelerators.
Parameters:
- num_accelerators (int) – Number of accelerators to use for allreduce.
Raises:
- ValueError – If num_accelerators is less than 1.
devices
devices: list[Accelerator]
List of accelerators involved in the allreduce operation.
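For orientation, here is a minimal sketch of wiring an Allreduce layer into a graph. The call signature (per-device input tensors followed by the signal buffers from a Signals object, described below) and the graph-building details are assumptions, not a verbatim recipe from this page:

```python
from max.dtype import DType
from max.graph import DeviceRef, Graph, TensorType
from max.nn.comm import Allreduce, Signals

NUM_GPUS = 2
devices = [DeviceRef.GPU(id=i) for i in range(NUM_GPUS)]
signals = Signals(devices)

# One input tensor per participating accelerator.
tensor_types = [
    TensorType(DType.float32, shape=[1024], device=dev) for dev in devices
]

with Graph(
    "allreduce_example",
    input_types=[*tensor_types, *signals.input_types()],
) as graph:
    tensors = [inp.tensor for inp in graph.inputs[:NUM_GPUS]]
    buffers = [inp.buffer for inp in graph.inputs[NUM_GPUS:]]

    allreduce = Allreduce(num_accelerators=NUM_GPUS)
    # Assumed call signature: per-device tensors plus signal buffers,
    # returning one reduced tensor per device.
    outputs = allreduce(tensors, buffers)
    graph.output(*outputs)
```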
Signals
class max.nn.comm.Signals(devices)
Signal buffers used for peer-to-peer communication in allreduce.
These buffers are made available to device code by enabling peer-to-peer access; thread blocks then use them to implement synchronization barriers and to hold intermediate communication results.
Parameters:
- devices – Graph devices that these signals communicate between, one per GPU involved in the allreduce.
NUM_BYTES
NUM_BYTES = 537919488
The size, in bytes, of the signal buffers used for communication in allreduce (537,919,488 bytes = 513 MiB).
buffers()
buffers()
Allocates and returns buffers used for communication in allreduce.
Enables peer-to-peer access between all GPUs (idempotent) and synchronizes so that the buffers are ready for use when this method returns.
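Continuing the graph sketch above, here is a rough sketch of the runtime side. The InferenceSession wiring, the tensor preparation, and the convention of appending signal buffers after the regular inputs are assumptions:

```python
import numpy as np

from max.driver import Accelerator, Tensor
from max.engine import InferenceSession

accelerators = [Accelerator(id=i) for i in range(NUM_GPUS)]

# Allocate one signal buffer per GPU; per the docs above, peer-to-peer
# access is enabled and synchronized before this call returns.
signal_buffers = signals.buffers()

session = InferenceSession(devices=accelerators)
model = session.load(graph)

# One input tensor per GPU, matching the tensor types declared above.
input_tensors = [
    Tensor.from_numpy(np.ones(1024, dtype=np.float32)).to(acc)
    for acc in accelerators
]

# Signal buffers follow the regular inputs, mirroring the input_types
# ordering used when building the graph (assumed convention).
outputs = model.execute(*input_tensors, *signal_buffers)
```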
devices
List of graph devices that these signals communicate between.
input_types()
input_types()
Gets graph input types corresponding to these signal buffers.
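A small sketch of inspecting these types. Presumably there is one buffer type per device, each paired with a buffer that buffers() allocates at execution time (an assumption based on this page):

```python
from max.graph import DeviceRef
from max.nn.comm import Signals

devices = [DeviceRef.GPU(id=0), DeviceRef.GPU(id=1)]
signals = Signals(devices)

# These types are appended to a graph's input_types so the compiled
# model can accept the matching buffers from Signals.buffers().
for input_type in signals.input_types():
    print(input_type)
```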