comm

Allreduce

class max.nn.comm.Allreduce(num_accelerators)

Layer that performs an allreduce operation with automatic implementation selection.

Automatically chooses between a peer-to-peer optimized allreduce and a naive device-to-device transfer based on accelerator connectivity.

Parameters:

num_accelerators (int) – Number of accelerators participating in the allreduce operation.

Initialize the Allreduce layer with a specified number of accelerators.

Parameters:

num_accelerators (int) – Number of accelerators to use for allreduce.

Raises:

ValueError – If num_accelerators is less than 1.

devices

devices: list[Accelerator]

List of accelerators involved in the allreduce operation.
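A minimal construction sketch. This page documents only the constructor, its validation, and the devices attribute, so the example sticks to those; the comment about how devices is populated is an assumption:

```python
from max.nn.comm import Allreduce

# One Allreduce layer spanning two accelerators. The `devices` attribute
# presumably holds the participating accelerators (assumption).
allreduce = Allreduce(num_accelerators=2)

# num_accelerators must be at least 1, otherwise ValueError is raised.
try:
    Allreduce(num_accelerators=0)
except ValueError as err:
    print(f"rejected: {err}")
```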

Signals

class max.nn.comm.Signals(devices)

Signal buffers used for peer-to-peer communication in allreduce.

Device code uses these buffers once peer-to-peer access is enabled. Thread blocks then use the buffers to implement synchronization barriers and to hold intermediate communication results.

Parameters:

  • devices (list[DeviceRef]) – The graph devices that these signal buffers communicate between.

NUM_BYTES

NUM_BYTES = 537919488

The size in bytes of the signal buffers used for communication in allreduce (513 MiB).

buffers()

buffers()

Allocates and returns buffers used for communication in allreduce.

Synchronizes so that the buffers are ready for use when this method returns.

Return type:

list[Tensor]
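A short sketch of allocating the signal buffers, assuming DeviceRef.GPU(...) device references from max.graph (the device references are not shown on this page):

```python
from max.graph import DeviceRef
from max.nn.comm import Signals

# Signal buffers for an allreduce between two GPUs.
signals = Signals(devices=[DeviceRef.GPU(0), DeviceRef.GPU(1)])

# Allocates the buffers and synchronizes so they are ready for use
# when this call returns.
signal_buffers = signals.buffers()
print(len(signal_buffers))  # one Tensor per device
```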

devices

devices: list[DeviceRef]

List of graph devices that these signals communicate between.

input_types()

input_types()

Gets graph input types corresponding to these signal buffers.

Return type:

list[BufferType]
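As a hedged end-to-end sketch, the buffer types can be appended to a graph's input types alongside the per-device tensor inputs. The Graph and TensorType usage, shapes, and dtype below are illustrative assumptions, not taken from this page:

```python
from max.dtype import DType
from max.graph import DeviceRef, Graph, TensorType
from max.nn.comm import Signals

devices = [DeviceRef.GPU(0), DeviceRef.GPU(1)]
signals = Signals(devices=devices)

# One input tensor per device (hypothetical shape and dtype), followed
# by one signal buffer type per device.
tensor_types = [
    TensorType(DType.float32, shape=[1024], device=dev) for dev in devices
]

with Graph(
    "allreduce_example",
    input_types=[*tensor_types, *signals.input_types()],
) as graph:
    ...  # build the graph body here
```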
