
Python module

driver

Exposes APIs for interacting with hardware, such as allocating tensors on a GPU and moving tensors between the CPU and GPU. It provides interfaces for memory management, device properties, and hardware monitoring. Through these APIs, you can control data placement, track resource utilization, and configure device settings for optimal performance.

For example, the following code uses an accelerator if one is available and falls back to the CPU otherwise:

from max import driver

device = driver.CPU() if driver.accelerator_count() == 0 else driver.Accelerator()
print(f"Using {device} device")

Accelerator

class max.driver.Accelerator(self, id: int = -1, device_memory_limit: int = -1)

Creates an accelerator device with the specified ID and memory limit.

Provides access to GPU or other hardware accelerators in the system.

Repeated instantiations with a previously used device ID still refer to the first instance created for that ID. This matters most when you pass a different memory limit: only the value provided (implicitly or explicitly) in the first instantiation takes effect.

from max import driver
device = driver.Accelerator()
# Or specify GPU id
device = driver.Accelerator(id=0)  # First GPU
device = driver.Accelerator(id=1)  # Second GPU
# Get device id
device_id = device.id
# Optionally specify a memory limit in bytes
MB = 1024 * 1024
device = driver.Accelerator(id=0, device_memory_limit=256 * MB)
device2 = driver.Accelerator(id=0, device_memory_limit=512 * MB)
# device2 refers to the first instance, so the 256 MB limit stays in effect

Parameters:

  • id (int, optional) – The device ID to use. Defaults to -1, which selects the first available accelerator.
  • device_memory_limit (int, optional) – The maximum amount of memory in bytes that can be allocated on the device. Defaults to 99% of free memory.

Returns:

A new Accelerator device object.

Return type:

Accelerator
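
The "first instantiation wins" rule described above can be pictured with a small registry. This is only an illustrative sketch in plain Python: `FakeDevice`, `get_accelerator`, and `_registry` are hypothetical names, not part of `max.driver`, and nothing here reflects how MAX actually implements the behavior.

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a device object; not a max.driver class."""
    id: int
    device_memory_limit: int


# Maps a device ID to the first instance created for it.
_registry = {}


def get_accelerator(id=0, device_memory_limit=-1):
    """Return the cached device for `id`, creating it only on first use."""
    if id not in _registry:
        _registry[id] = FakeDevice(id, device_memory_limit)
    # Later calls ignore `device_memory_limit` and reuse the first instance.
    return _registry[id]


MB = 1024 * 1024
first = get_accelerator(id=0, device_memory_limit=256 * MB)
second = get_accelerator(id=0, device_memory_limit=512 * MB)
print(second is first)                       # True
print(second.device_memory_limit // MB)      # 256
```

If a different limit is needed, the sketch suggests choosing it before the first instantiation for that ID, since later values are not applied.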

Buffer

class max.driver.Buffer(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], device: max.driver.Device | None = None, pinned: bool = False)

class max.driver.Buffer(self, dtype: max.dtype.DType, shape: collections.abc.Sequence[int], stream: max.driver.DeviceStream, pinned: bool = False)

class max.driver.Buffer(self, shape: ndarray[writable=False], device: max.driver.Device)

Device-resident buffer representation.

Allocates memory onto a given device with the provided shape and dtype. Buffers can be sliced to provide strided views of the underlying memory, but any buffers input into model execution must be contiguous.

Supports numpy-style slicing but does not currently support setting items across multiple indices.

from max import driver
from max.dtype import DType

# Create a buffer on CPU
cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.float32)

# Create a buffer on GPU
gpu = driver.Accelerator()
gpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.float32, device=gpu)

Parameters:

  • dtype (DType) – Data type of buffer elements.
  • shape (Sequence[int]) – Tuple of positive, non-zero integers denoting the buffer shape.
  • device (Device, optional) – Device to allocate buffer onto. Defaults to the CPU.
  • pinned (bool, optional) – If True, memory is page-locked (pinned). Defaults to False.
  • stream (DeviceStream, optional) – Stream to associate the buffer with.

contiguous()

contiguous()

Creates a contiguous copy of the parent buffer.

Parameters:

self (Buffer)

Return type:

Buffer

copy()

copy(self, stream: max.driver.DeviceStream) → max.driver.Buffer

copy(self, device: max.driver.Device | None = None) → max.driver.Buffer

Overloaded function.

  1. copy(self, stream: max.driver.DeviceStream) -> max.driver.Buffer

    Creates a deep copy on the device associated with the stream.

    Args:
    stream (DeviceStream): The stream to associate the new buffer with.
    Returns:
    Buffer: A new buffer that is a copy of this buffer.
  2. copy(self, device: max.driver.Device | None = None) -> max.driver.Buffer

    Creates a deep copy on an optionally given device.

    If device is None (default), a copy is created on the same device.

    from max import driver
    from max.dtype import DType

    cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())
    cpu_copy = cpu_buffer.copy()

    # Copy to GPU
    gpu = driver.Accelerator()
    gpu_copy = cpu_buffer.copy(device=gpu)

    Args:
    device (Device, optional): The device to create the copy on. Defaults to None (same device).
    Returns:
    Buffer: A new buffer that is a copy of this buffer.

device

property device

Device on which the buffer is resident.

dtype

property dtype

DType of the buffer's elements.

element_size

property element_size

Return the size of the element type in bytes.

from_dlpack()

from_dlpack(*, copy=None)

Create a buffer from an object implementing the dlpack protocol.

This usually does not result in a copy, and the producer of the object retains ownership of the underlying memory.

Return type:

Buffer

from_numpy()

from_numpy(arr)

Creates a buffer from a provided numpy array on the host device.

The underlying data is not copied unless the array is noncontiguous. If it is, a contiguous copy will be returned.

Parameters:

arr (ndarray[tuple[Any, ...], dtype[Any]])

Return type:

Buffer

inplace_copy_from()

inplace_copy_from(src)

Copy the contents of another buffer into this one.

The two buffers may be on different devices. Both buffers must be contiguous and have the same size.

Parameters:

src (Buffer) – The buffer whose contents are copied into this one.

Return type:

None

is_contiguous

property is_contiguous

Whether or not buffer is contiguously allocated in memory. Returns false if the buffer is a non-contiguous slice.

Some layouts that are technically contiguous, such as buffers with negative steps, are currently treated as non-contiguous by the engine.
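
As a rough illustration of what a contiguity check involves, the sketch below compares a view's strides against the row-major strides implied by its shape. It is plain Python with hypothetical helper names (`row_major_strides`, `is_row_major_contiguous`), strides are counted in elements rather than bytes, and it is an assumption about the general idea, not the check MAX performs.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major array."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return list(reversed(strides))


def is_row_major_contiguous(shape, strides):
    """True only when the strides match the dense row-major layout exactly.

    Negative steps never match, which mirrors the conservative behavior
    described above. Size-1 dimensions are not special-cased in this sketch.
    """
    return list(strides) == row_major_strides(shape)


print(row_major_strides([2, 3]))                  # [3, 1]
print(is_row_major_contiguous([2, 3], [3, 1]))    # True
# Taking every second column of a [2, 4] array gives shape [2, 2], strides [4, 2].
print(is_row_major_contiguous([2, 2], [4, 2]))    # False
# A reversed 1-D view has a negative step.
print(is_row_major_contiguous([3], [-1]))         # False
```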

is_host

property is_host

Whether or not buffer is host-resident. Returns false for GPU buffers, true for CPU buffers.

from max import driver
from max.dtype import DType

cpu_buffer = driver.Buffer(shape=[2, 3], dtype=DType.bfloat16, device=driver.CPU())

print(cpu_buffer.is_host)

item()

item(self) → Any

Returns the scalar value at a given location. Currently implemented only for zero-rank buffers. The return type is converted to a Python built-in type.

mmap()

mmap(filename, dtype, shape, mode='copyonwrite', offset=0)

Parameters:

  • filename (PathLike[str] | str)
  • dtype (DType)
  • shape (ShapeType | int)
  • mode (np._MemMapModeKind)
  • offset (int)

Return type:

Buffer

num_elements

property num_elements

Returns the number of elements in this buffer.

Rank-0 buffers have 1 element by convention.
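
The rank-0 convention falls out of treating the element count as the product of the dimensions, since an empty product is 1. The helper names below are hypothetical and the snippet uses only the standard library; it sketches the arithmetic rather than the library's implementation.

```python
import math


def num_elements(shape):
    """Product of all dimensions; the empty product for a rank-0 shape is 1."""
    return math.prod(shape)


def size_in_bytes(shape, element_size):
    """Total payload size, given the per-element size in bytes."""
    return num_elements(shape) * element_size


print(num_elements([2, 3]))        # 6
print(num_elements([]))            # 1
print(size_in_bytes([2, 3], 4))    # 24, e.g. a 2x3 buffer of 4-byte floats
```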

pinned

property pinned

Whether or not the underlying memory is pinned (page-locked).

rank

property rank

Buffer rank.

scalar

scalar = <nanobind.nb_func object>

shape

property shape

Shape of buffer.

stream

property stream

Stream to which the buffer is bound.

to()

to(self, device: max.driver.Device) → max.driver.Buffer

to(self, stream: max.driver.DeviceStream) → max.driver.Buffer

to(self, devices: collections.abc.Sequence[max.driver.Device]) → list[max.driver.Buffer]

to(self, streams: collections.abc.Sequence[max.driver.DeviceStream]) → list[max.driver.Buffer]

Overloaded function.

  1. to(self, device: max.driver.Device) -> max.driver.Buffer

    Return a buffer that’s guaranteed to be on the given device.

    The buffer is only copied if the requested device is different from the device upon which the buffer is already resident.

  2. to(self, stream: max.driver.DeviceStream) -> max.driver.Buffer

    Return a buffer that’s guaranteed to be on the given device and associated with the given stream.

    The buffer is only copied if the requested device is different from the device upon which the buffer is already resident. If the destination stream is on the same device, then a new reference to the same buffer is returned.

  3. to(self, devices: collections.abc.Sequence[max.driver.Device]) -> list[max.driver.Buffer]

    Return a list of buffers that are guaranteed to be on the given devices.

    The buffers are only copied if the requested devices are different from the device upon which the buffer is already resident.

  4. to(self, streams: collections.abc.Sequence[max.driver.DeviceStream]) -> list[max.driver.Buffer]

    Return a list of buffers that are guaranteed to be on the given streams.

    The buffers are only copied if the requested streams are different from the stream upon which the buffer is already resident.

to_numpy()

to_numpy()

Converts the buffer to a numpy array.

If the buffer is not on the host, a copy will be issued.

Parameters:

self (Buffer)

Return type:

ndarray[tuple[Any, …], dtype[Any]]

view()

view(dtype, shape=None)

Return a new buffer with the given type and shape that shares the underlying memory.

If the shape is not given, it will be deduced if possible, or a ValueError is raised.

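Parameters:

  • dtype – Element type of the returned buffer.
  • shape (optional) – Shape of the returned buffer. Deduced when omitted.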

Return type:

Buffer
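
One plausible way to deduce a shape when only the dtype changes is to rescale the last dimension by the ratio of element sizes, as NumPy's `ndarray.view` does. The sketch below assumes that rule; the source does not state how `view()` deduces shapes, and `deduce_view_shape` is a hypothetical helper, not a `max.driver` function.

```python
def deduce_view_shape(shape, old_itemsize, new_itemsize):
    """Reinterpret the last dimension in units of the new element size.

    Assumed rule (borrowed from NumPy): the byte length of the last
    dimension must be a multiple of the new element size.
    """
    shape = list(shape)
    if old_itemsize == new_itemsize:
        return shape
    if not shape:
        raise ValueError("cannot deduce a new shape for a rank-0 buffer")
    last_dim_bytes = shape[-1] * old_itemsize
    if last_dim_bytes % new_itemsize != 0:
        raise ValueError("last dimension does not divide evenly into the new element size")
    return shape[:-1] + [last_dim_bytes // new_itemsize]


print(deduce_view_shape([2, 4], 4, 1))   # [2, 16]: 4-byte elements viewed as bytes
print(deduce_view_shape([2, 4], 4, 8))   # [2, 2]: 4-byte elements viewed as 8-byte
try:
    deduce_view_shape([2, 3], 2, 4)      # 6 bytes per row is not a multiple of 4
except ValueError as err:
    print("ValueError:", err)
```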

zeros

zeros = <nanobind.nb_func object>

CPU

class max.driver.CPU(self, id: int = -1)

Creates a CPU device.

from max import driver
# Create default CPU device
device = driver.CPU()
# Device id is always 0 for CPU devices
device_id = device.id

Parameters:

id (int, optional) – The device ID to use. Defaults to -1.

Returns:

A new CPU device object.

Return type:

CPU

DLPackArray

class max.driver.DLPackArray(*args, **kwargs)

Device

class max.driver.Device

api

property api

Returns the API used to program the device.

Possible values are:

  • cpu for host devices.
  • cuda for NVIDIA GPUs.
  • hip for AMD GPUs.

from max import driver

device = driver.CPU()
device.api

architecture_name

property architecture_name

Returns the architecture name of the device.

Examples of possible values:

  • gfx90a, gfx942 for AMD GPUs.
  • sm_80, sm_86 for NVIDIA GPUs.
  • CPU devices raise an exception.

from max import driver

device = driver.Accelerator()
device.architecture_name

can_access()

can_access(self, other: max.driver.Device) → bool

Checks if this device can directly access memory of another device.

from max import driver

gpu0 = driver.Accelerator(id=0)
gpu1 = driver.Accelerator(id=1)

if gpu0.can_access(gpu1):
    print("GPU0 can directly access GPU1 memory.")

Parameters:

other (Device) – The other device to check peer access against.

Returns:

True if peer access is possible, False otherwise.

Return type:

bool

cpu

cpu = <nanobind.nb_func object>

default_stream

property default_stream

Returns the default stream for this device.

The default stream is initialized when the device object is created.

Returns:

The default execution stream for this device.

Return type:

DeviceStream

id

property id

Returns a zero-based device id. For a CPU device this is always 0. For GPU accelerators this is the id of the device relative to this host. Together with the label, the id uniquely identifies a device, for example gpu:0 or gpu:1.

from max import driver

device = driver.Accelerator()
device_id = device.id

Returns:

The device ID.

Return type:

int

is_compatible

property is_compatible

Returns whether this device is compatible with MAX.

Returns:

True if the device is compatible with MAX, False otherwise.

Return type:

bool

is_host

property is_host

Whether this device is the CPU (host) device.

from max import driver

device = driver.CPU()
device.is_host

label

property label

Returns the device label.

Possible values are:

  • cpu for host devices.
  • gpu for accelerators.

from max import driver

device = driver.CPU()
device.label

stats

property stats

Returns utilization data for the device.

from max import driver

device = driver.CPU()
stats = device.stats

Returns:

A dictionary containing device utilization statistics.

Return type:

dict

synchronize()

synchronize(self) → None

Ensures all operations on this device complete before returning.

Raises:

ValueError – If any enqueued operations had an internal error.

DeviceSpec

class max.driver.DeviceSpec(id, device_type='cpu')

Specification for a device, containing its ID and type.

This class provides a way to specify device parameters like ID and type (CPU/GPU) for creating Device instances.

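Parameters:

  • id (int) – Provided id for this device.
  • device_type (Literal['cpu', 'gpu']) – Type of specified device. Defaults to 'cpu'.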

accelerator()

static accelerator(id=0)

Creates an accelerator (GPU) device specification.

Parameters:

id (int)

cpu()

static cpu(id=-1)

Creates a CPU device specification.

Parameters:

id (int)

device_type

device_type: Literal['cpu', 'gpu'] = 'cpu'

Type of specified device.

id

id: int

Provided id for this device.

DeviceStream

class max.driver.DeviceStream(self, device: max.driver.Device)

Provides access to a stream of execution on a device.

A stream represents a sequence of operations that will be executed in order. Multiple streams on the same device can execute concurrently.

from max import driver
# Create a default accelerator device
device = driver.Accelerator()
# Get the default stream for the device
stream = device.default_stream
# Create a new stream of execution on the device
new_stream = driver.DeviceStream(device)

Creates a new stream of execution associated with the device.

Parameters:

device (Device) – The device to create the stream on.

Returns:

A new stream of execution.

Return type:

DeviceStream

device

property device

The device this stream is executing on.

synchronize()

synchronize(self) → None

Ensures all operations on this stream complete before returning.

Raises:

ValueError – If any enqueued operations had an internal error.

wait_for()

wait_for(self, stream: max.driver.DeviceStream) → None

wait_for(self, device: max.driver.Device) → None

Overloaded function.

  1. wait_for(self, stream: max.driver.DeviceStream) -> None

    Ensures all operations on the other stream complete before future work submitted to this stream is scheduled.

    Args:
    stream (DeviceStream): The stream to wait for.
  2. wait_for(self, device: max.driver.Device) -> None

    Ensures all operations on device’s default stream complete before future work submitted to this stream is scheduled.

    Args:
    device (Device): The device whose default stream to wait for.

accelerator_api()

max.driver.accelerator_api()

Returns the API used to program the accelerator.

Return type:

str

accelerator_architecture_name()

max.driver.accelerator_architecture_name()

Returns the architecture name of the accelerator device.

Return type:

str

calculate_virtual_device_count()

max.driver.calculate_virtual_device_count(*device_spec_lists)

Calculate the minimum virtual device count needed for the given device specs.

Parameters:

*device_spec_lists (list[DeviceSpec]) – One or more lists of DeviceSpec objects (for example, main devices and draft devices).

Returns:

The minimum number of virtual devices needed (the maximum GPU ID plus 1), or 1 if no GPUs are specified.

Return type:

int
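
The documented rule (maximum GPU ID plus 1, or 1 when no GPUs appear) can be sketched in a few lines. `Spec` and `virtual_device_count` below are hypothetical stand-ins written in plain Python so the snippet runs without MAX installed; they are not the library's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for DeviceSpec: an ID plus 'cpu' or 'gpu'."""
    id: int
    device_type: str = "cpu"


def virtual_device_count(*spec_lists):
    """Return max GPU ID + 1 across all lists, or 1 if no GPU specs are present."""
    gpu_ids = [
        spec.id
        for specs in spec_lists
        for spec in specs
        if spec.device_type == "gpu"
    ]
    return max(gpu_ids) + 1 if gpu_ids else 1


main_devices = [Spec(0, "gpu"), Spec(1, "gpu")]
draft_devices = [Spec(3, "gpu")]
print(virtual_device_count(main_devices, draft_devices))  # 4
print(virtual_device_count([Spec(0)]))                    # 1 (CPU only)
```

Note that the count depends on the highest ID rather than on how many GPUs are listed, so a single spec for GPU 3 still needs four virtual devices.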

calculate_virtual_device_count_from_cli()

max.driver.calculate_virtual_device_count_from_cli(*device_inputs)

Calculate virtual device count from raw CLI inputs (before parsing).

This helper works with the raw device input strings or lists before they’re parsed into DeviceSpec objects. Used when virtual device mode needs to be enabled before device validation occurs.

Parameters:

*device_inputs (str | list[int]) – One or more raw device inputs, either strings like “gpu:0,1,2” or lists of integers like [0, 1, 2].

Returns:

The minimum number of virtual devices needed (the maximum GPU ID plus 1), or 1 if no GPUs are specified.

Return type:

int
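
A minimal sketch of how the two documented input forms could be reduced to a device count follows. The function names are hypothetical, and the handling of anything other than "gpu:<ids>" strings and integer lists (for example, treating "cpu" as contributing no GPU IDs) is an assumption, not documented behavior.

```python
def gpu_ids_from_input(device_input):
    """Extract GPU IDs from a raw input such as "gpu:0,1,2" or [0, 1, 2]."""
    if isinstance(device_input, str):
        prefix, _, ids = device_input.partition(":")
        # Assumption: strings without a "gpu:" prefix name no GPUs.
        if prefix.strip().lower() != "gpu" or not ids:
            return []
        return [int(part) for part in ids.split(",") if part.strip()]
    # Assumption: a list of integers is a list of GPU IDs.
    return [int(i) for i in device_input]


def virtual_device_count_from_cli(*device_inputs):
    """Return max GPU ID + 1 across all raw inputs, or 1 if none name a GPU."""
    ids = [i for item in device_inputs for i in gpu_ids_from_input(item)]
    return max(ids) + 1 if ids else 1


print(virtual_device_count_from_cli("gpu:0,1,2"))   # 3
print(virtual_device_count_from_cli([0, 3]))        # 4
print(virtual_device_count_from_cli("cpu"))         # 1
print(virtual_device_count_from_cli("gpu:0", [5]))  # 6
```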

devices_exist()

max.driver.devices_exist(devices)

Checks whether the given devices exist.

Parameters:

devices (list[DeviceSpec])

Return type:

bool

load_devices()

max.driver.load_devices(device_specs)

Initialize and return a list of devices, given a list of device specs.

Parameters:

device_specs (Sequence[DeviceSpec])

Return type:

list[Device]

load_max_buffer()

max.driver.load_max_buffer(path)

Experimental method for loading serialized MAX buffers.

MAX buffers can be exported by creating a graph and calling Value.print() with the BINARY_MAX_CHECKPOINT option.

Parameters:

path (PathLike[str]) – Path to buffer (should end with .max)

Returns:

A Buffer created from the path. The shape and dtype are read from the file.

Raises:

ValueError – If the file format is not the MAX checkpoint format.

Return type:

Buffer

scan_available_devices()

max.driver.scan_available_devices()

Returns specs for all accelerators if any are available; otherwise returns a CPU spec.

Return type:

list[DeviceSpec]

accelerator_count()

max.driver.accelerator_count() → int

Returns the number of accelerator devices available.
