comm

Allreduce

class max.nn.comm.Allreduce(num_accelerators)

Layer that performs an allreduce operation with automatic implementation selection.

Automatically chooses between a peer-to-peer optimized allreduce and a naive device-to-device transfer based on accelerator connectivity.

Parameters:

num_accelerators (int) – Number of accelerators participating in the allreduce operation.

Initialize the Allreduce layer with a specified number of accelerators.

Parameters:

num_accelerators (int) – Number of accelerators to use for allreduce.

Raises:

ValueError – If num_accelerators is less than 1.

devices

devices: list[Accelerator]

List of accelerators involved in the allreduce operation.
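A minimal construction sketch. This page documents only the constructor, its validation, and the devices attribute, so the example sticks to those; the comment about how devices is populated is an assumption:

```python
from max.nn.comm import Allreduce

# One Allreduce layer spanning two accelerators. The `devices` attribute
# presumably holds the participating accelerators (assumption).
allreduce = Allreduce(num_accelerators=2)

# num_accelerators must be at least 1, otherwise ValueError is raised.
try:
    Allreduce(num_accelerators=0)
except ValueError as err:
    print(f"rejected: {err}")
```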

Signals

class max.nn.comm.Signals(devices)

Signal buffers used for peer-to-peer communication in allreduce.

Device code uses these buffers once peer-to-peer access is enabled. Thread blocks then use the buffers to implement synchronization barriers and to hold intermediate communication results.

Parameters:

  • devices (list[DeviceRef]) – The graph devices that these signal buffers communicate between.

NUM_BYTES

NUM_BYTES = 537919488

The size in bytes of the signal buffers used for communication in allreduce (513 MiB).

buffers()

buffers()

Allocates and returns buffers used for communication in allreduce.

Synchronizes so that the buffers are ready for use when this method returns.

Return type:

list[Tensor]
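A short sketch of allocating the signal buffers, assuming DeviceRef.GPU(...) device references from max.graph (the device references are not shown on this page):

```python
from max.graph import DeviceRef
from max.nn.comm import Signals

# Signal buffers for an allreduce between two GPUs.
signals = Signals(devices=[DeviceRef.GPU(0), DeviceRef.GPU(1)])

# Allocates the buffers and synchronizes so they are ready for use
# when this call returns.
signal_buffers = signals.buffers()
print(len(signal_buffers))  # one Tensor per device
```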

devices

devices: list[DeviceRef]

List of graph devices that these signals communicate between.

input_types()

input_types()

Gets graph input types corresponding to these signal buffers.

Return type:

list[BufferType]
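As a hedged end-to-end sketch, the buffer types can be appended to a graph's input types alongside the per-device tensor inputs. The Graph and TensorType usage, shapes, and dtype below are illustrative assumptions, not taken from this page:

```python
from max.dtype import DType
from max.graph import DeviceRef, Graph, TensorType
from max.nn.comm import Signals

devices = [DeviceRef.GPU(0), DeviceRef.GPU(1)]
signals = Signals(devices=devices)

# One input tensor per device (hypothetical shape and dtype), followed
# by one signal buffer type per device.
tensor_types = [
    TensorType(DType.float32, shape=[1024], device=dev) for dev in devices
]

with Graph(
    "allreduce_example",
    input_types=[*tensor_types, *signals.input_types()],
) as graph:
    ...  # build the graph body here
```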
