Python class

CompletionFlag

`CompletionFlag`

class max.driver.CompletionFlag(self, device: max.driver.Device)

Bases: object

An 8-byte completion flag in pinned host memory mapped into a device’s address space.

Lets a CPU thread signal a GPU stream (or vice versa) by writing a 64-bit value to a single location that’s visible to both. Pair with DeviceStream.wait_for_host_value (added in a follow-on PR) or the mo.wait_host_value graph op to gate downstream GPU work on a host-produced result without a second stream or a blocking host callback.

Currently requires a CUDA-backed Device; constructing against any other backend raises RuntimeError.

```python
from max.driver import Accelerator, CompletionFlag

accel = Accelerator()
flag = CompletionFlag(accel)
assert flag.load() == 0  # initialized to zero

# Pass flag.device_ptr to a graph op or stream API that waits on the value.
```

Allocates a fresh device-mapped pinned u64 bound to device.

Parameters: device – A CUDA-backed device. Other backends raise RuntimeError.
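
Because the constructor raises RuntimeError on non-CUDA backends, code that may run on mixed hardware can guard the call. The following is a minimal sketch, not part of the MAX API: `try_make_flag` is a hypothetical helper, and `_StubFlag` is a stand-in class so that the snippet runs without a GPU. In real use you would presumably pass `max.driver.CompletionFlag` and a `Device` instead.

```python
from typing import Any, Callable, Optional


def try_make_flag(flag_cls: Callable[[Any], Any], device: Any) -> Optional[Any]:
    """Hypothetical helper: return flag_cls(device), or None if unsupported."""
    try:
        return flag_cls(device)
    except RuntimeError:
        # Per the docs, constructing against a non-CUDA backend raises RuntimeError.
        return None


class _StubFlag:
    """Stand-in for CompletionFlag, used only so this sketch runs anywhere."""

    def __init__(self, device: str) -> None:
        if device != "cuda":
            raise RuntimeError("CompletionFlag requires a CUDA-backed device")
        self.device = device


cuda_flag = try_make_flag(_StubFlag, "cuda")  # a _StubFlag instance
cpu_flag = try_make_flag(_StubFlag, "cpu")    # None, construction was rejected
```

Returning `None` lets the caller fall back to another synchronization path; re-raising with a clearer message is an equally reasonable choice.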

`device_ptr`

property device_ptr

Device-visible 64-bit address of the 8-byte slot.

Suitable for passing to graph ops or stream APIs that wait on a memory value.

`load()`

load(self) → int

Acquire-ordered load of the current flag value.

Pairs with a release-ordered store on the producer side.

Returns: Current 64-bit flag value.
Return type: int
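
On the host side, `load()` can be polled to wait for a producer to publish a value. The sketch below illustrates one way to do that with a timeout. `wait_for_value` is a hypothetical helper, and `_StubFlag` is a pure-Python stand-in that only mirrors the `load()` and `signal()` names so the example runs without a CUDA device.

```python
import threading
import time


class _StubFlag:
    """Pure-Python stand-in for CompletionFlag (same method names, no GPU)."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def signal(self, value: int) -> None:
        with self._lock:
            self._value = value


def wait_for_value(flag, expected: int, timeout_s: float = 1.0,
                   poll_s: float = 0.001) -> bool:
    """Hypothetical helper: poll flag.load() until it equals expected.

    Returns True if the value was observed, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if flag.load() == expected:
            return True
        time.sleep(poll_s)
    return flag.load() == expected  # one final check at the deadline


flag = _StubFlag()
producer = threading.Timer(0.05, flag.signal, args=(7,))  # signals after 50 ms
producer.start()
reached = wait_for_value(flag, expected=7)
producer.join()
timed_out = wait_for_value(flag, expected=99, timeout_s=0.02)
```

A sleep between polls keeps the waiting thread from monopolizing a core; the interval is an arbitrary choice for this sketch.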

`reset()`

reset(self) → None

Clears the flag back to 0 with a relaxed atomic store.

Safe to call before any consumer has observed the address.

`signal()`

signal(self, value: int) → None

Release-ordered store of value to the flag.

Pairs with the GPU-side cuStreamWaitValue64 (or a host-side acquire load).

Primary intended use is priming the flag at setup time so the first captured-graph replay’s mo.wait_host_value passes immediately, before any async kickoff has run. Direct Python signalling on the hot path is usually a mistake – prefer the async-host-func trampoline which signals from its AsyncRT worker.

Parameters: value – The 64-bit value to store.
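
The release/acquire pairing means a producer should write its payload first and signal second, so that a consumer that observes the flag value also sees the payload. The sketch below models that ordering between two host threads. It rests on assumptions: `_StubFlag` is a pure-Python stand-in (a lock plays the role of the memory ordering), and `READY` is a hypothetical sentinel value. It does not exercise the GPU-side wait.

```python
import threading
import time


class _StubFlag:
    """Pure-Python stand-in for CompletionFlag; the lock models the ordering."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def signal(self, value: int) -> None:
        with self._lock:
            self._value = value

    def reset(self) -> None:
        self.signal(0)


READY = 1  # hypothetical sentinel chosen for this sketch
flag = _StubFlag()
result = {}
seen = []


def producer() -> None:
    result["answer"] = 42  # 1. write the payload first
    flag.signal(READY)     # 2. then publish it with the store to the flag


def consumer() -> None:
    while flag.load() != READY:  # the load pairs with the store above
        time.sleep(0.001)
    seen.append(result["answer"])  # payload is visible once READY is observed


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

flag.reset()  # clear the flag once no consumer is waiting on it
```

Swapping the two steps in `producer` would let the consumer observe `READY` before the payload exists, which is the bug the ordering guarantee is meant to prevent.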
