Python package

comm

Communication primitives for distributed training.

Allreduce

class max.nn.comm.Allreduce(num_accelerators)

Layer that performs an allreduce operation with automatic implementation selection.

Automatically chooses between peer-to-peer optimized allreduce and naive device-to-device transfer based on accelerator connectivity.

Parameters:

num_accelerators (int) – Number of accelerators participating in the allreduce operation

Initialize the Allreduce layer with a specified number of accelerators.

Parameters:

num_accelerators (int) – Number of accelerators to use for allreduce

Raises:

ValueError – If num_accelerators is less than 1
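The allreduce collective itself is independent of any one framework: every participant contributes a tensor, and every participant receives the elementwise sum of all contributions. A minimal pure-Python sketch of that semantics (the names `allreduce_sum` and `device_inputs` are illustrative only, not part of the `max.nn.comm` API):

```python
# Illustrative sketch of allreduce semantics, not the MAX implementation:
# each participating device contributes a tensor, and every device
# receives the elementwise sum of all contributions.

def allreduce_sum(device_inputs: list[list[float]]) -> list[list[float]]:
    """Return one reduced tensor per device (all replicas are identical)."""
    if not device_inputs:
        raise ValueError("need at least one participant")
    # Elementwise sum across all participants.
    summed = [sum(vals) for vals in zip(*device_inputs)]
    # Every device ends up holding the same reduced result.
    return [list(summed) for _ in device_inputs]

# Four "devices", each holding a 3-element tensor.
inputs = [[1.0, 2.0, 3.0]] * 4
outputs = allreduce_sum(inputs)
# outputs → [[4.0, 8.0, 12.0]] on every device
```

The real layer hides the transport details: given the connectivity of the `num_accelerators` devices, it picks either the peer-to-peer optimized path or the naive device-to-device transfer path, but the result on each device is the same reduction shown above.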

devices

devices: list[Accelerator]

List of accelerators involved in the allreduce operation.

Signals

class max.nn.comm.Signals(devices)

Signal buffers used for peer-to-peer communication in allreduce.

Device code accesses these buffers after peer-to-peer access is enabled. Thread blocks then use the buffers to implement synchronization barriers and to hold intermediate communication results.

Parameters:

devices (list[DeviceRef]) – Devices whose signal buffers participate in the allreduce.
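The barrier role that the signal buffers play on device can be pictured on the host: each participant signals its arrival, and no participant proceeds past the synchronization point until every peer has arrived. A hedged pure-Python analogy using `threading.Barrier` (this is an analogy for the synchronization pattern, not the GPU implementation):

```python
import threading

# Host-side analogy for the device-side barrier that signal buffers
# implement: each participant "signals" arrival, then waits until all
# peers have signaled before proceeding.

NUM_GPUS = 4
barrier = threading.Barrier(NUM_GPUS)
arrivals = []   # stands in for writes to a signal buffer
results = []

def worker(rank: int) -> None:
    arrivals.append(rank)   # "write to signal buffer"
    barrier.wait()          # block until all peers have signaled
    # Past the barrier, every participant observes all peers' writes.
    results.append(len(arrivals))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_GPUS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# After the barrier, each worker saw all NUM_GPUS arrivals.
```

On the GPU, the same pattern is implemented with atomic writes into the peer-visible signal buffers rather than host threads, and the buffers additionally stage intermediate communication results.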

NUM_BYTES

NUM_BYTES = 537919488

The size in bytes of the signal buffers used for communication in allreduce (537,919,488 bytes, i.e. 513 MiB).

buffers()

buffers()

Allocates and returns buffers used for communication in allreduce.

Enables peer-to-peer access between all GPUs (idempotent) and synchronizes so that buffers are ready for use when this method returns.

Return type:

list[Buffer]

devices

devices: list[DeviceRef]

List of graph devices that these signals communicate between.

input_types()

input_types()

Gets graph input types corresponding to these signal buffers.

Return type:

list[BufferType]
