
comm

Allreduce

class max.nn.comm.Allreduce(num_accelerators)

Layer that performs an allreduce operation with automatic implementation selection.

Automatically chooses between a peer-to-peer optimized allreduce and a naive device-to-device transfer, based on accelerator connectivity.

Parameters:

num_accelerators (int) – Number of accelerators participating in the allreduce operation.

Raises:

ValueError – If num_accelerators is less than 1.

devices

devices: list[Accelerator]

List of accelerators involved in the allreduce operation.
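
A minimal construction sketch, based only on the constructor and the devices attribute documented here; the accelerator count is illustrative:

```python
from max.nn.comm import Allreduce

# Build an allreduce layer spanning four accelerators;
# num_accelerators < 1 raises ValueError.
allreduce = Allreduce(num_accelerators=4)

# The participating accelerators are exposed on the layer.
print(allreduce.devices)
```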

Signals

class max.nn.comm.Signals(devices)

Signal buffers used for peer-to-peer communication in allreduce.

Device code uses these buffers after enabling peer-to-peer access. Thread blocks then use the buffers to implement synchronization barriers and to hold intermediate communication results.

Parameters:

devices (list[DeviceRef]) – Graph devices that these signals communicate between.

NUM_BYTES

NUM_BYTES = 537919488

The size, in bytes, of the signal buffers used for communication in allreduce (513 MiB).

buffers()

buffers()

Allocates and returns buffers used for communication in allreduce.

Return type:

list[Tensor]
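
A minimal sketch of allocating the signal buffers, assuming the DeviceRef.GPU(...) constructor from max.graph; the two-GPU setup is illustrative:

```python
from max.graph import DeviceRef
from max.nn.comm import Signals

# Signal buffers for peer-to-peer communication between two GPUs.
signals = Signals(devices=[DeviceRef.GPU(0), DeviceRef.GPU(1)])

# Allocate one communication buffer per device.
buffers = signals.buffers()  # list[Tensor], one per DeviceRef
```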

devices

devices: list[DeviceRef]

List of graph devices that these signals communicate between.

input_types()

input_types()

Gets graph input types corresponding to these signal buffers.

Return type:

list[BufferType]
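
A hedged sketch of passing these types as graph inputs, assuming Graph, TensorType, DType, and DeviceRef from max.graph; the tensor shape and graph name are illustrative:

```python
from max.graph import DeviceRef, DType, Graph, TensorType
from max.nn.comm import Signals

signals = Signals(devices=[DeviceRef.GPU(0), DeviceRef.GPU(1)])

# Declare the signal buffers as extra graph inputs alongside the
# regular tensor inputs.
input_types = [
    TensorType(DType.float32, shape=[1024], device=DeviceRef.GPU(0)),
    *signals.input_types(),
]
graph = Graph("allreduce_example", input_types=input_types)
```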
