Python class

CompletionFlag

class max.driver.CompletionFlag(self, device: max.driver.Device)

Bases: object

An 8-byte completion flag in pinned host memory mapped into a device’s address space.

The flag lets a CPU thread signal a GPU stream (or vice versa) by writing a 64-bit value to a single location that is visible to both. Pair it with DeviceStream.wait_for_host_value (added in a follow-on PR) or the mo.wait_host_value graph op to gate downstream GPU work on a host-produced result, without needing a second stream or a blocking host callback.

Currently requires a CUDA-backed Device; constructing against any other backend raises RuntimeError.

from max.driver import Accelerator, CompletionFlag

accel = Accelerator()
flag = CompletionFlag(accel)
assert flag.load() == 0  # initialized to zero

# Pass flag.device_ptr to a graph op or stream API that waits on a memory value.

The constructor allocates a fresh device-mapped, pinned 64-bit unsigned integer (u64) bound to device.

Parameters:

device – A CUDA-backed device. Other backends raise RuntimeError.
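Because construction raises RuntimeError on non-CUDA backends, calling code may want to treat the flag as optional. The following is a minimal sketch of such a guard; try_create_flag is a hypothetical helper, not part of max.driver, and it takes the flag class as an argument so that it can be exercised without a GPU.

```python
from typing import Any, Callable, Optional


def try_create_flag(flag_factory: Callable[[Any], Any], device: Any) -> Optional[Any]:
    """Return flag_factory(device), or None if construction raises RuntimeError.

    Hypothetical helper: flag_factory is expected to be a class such as
    CompletionFlag, whose constructor raises RuntimeError on an
    unsupported backend.
    """
    try:
        return flag_factory(device)
    except RuntimeError:
        # Documented failure mode for non-CUDA devices; fall back to "no flag".
        return None
```

With MAX installed, this would presumably be called as try_create_flag(CompletionFlag, accel), with the caller choosing a different synchronization path when the result is None.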

device_ptr

property device_ptr

Device-visible 64-bit address of the 8-byte slot.

Suitable for passing to graph ops or stream APIs that wait on a memory value.

load()

load(self) → int

Acquire-ordered load of the current flag value.

Pairs with a release-ordered store on the producer side.

Returns:

Current 64-bit flag value.

Return type:

int
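When the host is the consumer, load() can be polled until the producer's value appears. The sketch below shows one way to bound that wait; wait_for_value and its parameters are hypothetical names introduced here, and the helper accepts any zero-argument callable returning an int, so flag.load is only one possible argument.

```python
import time
from typing import Callable


def wait_for_value(
    load: Callable[[], int],
    expected: int,
    timeout_s: float = 1.0,
    poll_interval_s: float = 0.001,
) -> bool:
    """Poll load() until it returns expected; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if load() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly between polls instead of spinning at full speed.
        time.sleep(poll_interval_s)
```

A call such as wait_for_value(flag.load, 1) would then return True once a producer has stored 1, or False if nothing arrives within the timeout. Polling from Python is only an illustration of the host-side pairing; it is not a substitute for the device-side waits described above.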

reset()

reset(self) → None

Clears the flag back to 0 with a relaxed atomic store.

Safe to call before any consumer has observed the address.

signal()

signal(self, value: int) → None

Release-ordered store of value to the flag.

Pairs with the GPU-side cuStreamWaitValue64 (or a host-side acquire load).

The primary intended use is priming the flag at setup time, so that the mo.wait_host_value in the first captured-graph replay passes immediately, before any async kickoff has run. Signalling directly from Python on the hot path is usually a mistake; prefer the async-host-func trampoline, which signals from its AsyncRT worker.

Parameters:

value – The 64-bit value to store.
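The priming pattern can be illustrated with a host-only model. Everything in this sketch is an assumption made for illustration: FlagModel is a hypothetical stand-in that mirrors only the signal() and load() names, the wait is modelled as a "greater than or equal" comparison, and a Python thread plays the role of the async worker. It does not touch a GPU and says nothing about how MAX implements the real wait.

```python
import threading


class FlagModel:
    """Hypothetical host-only stand-in; not max.driver.CompletionFlag."""

    def __init__(self) -> None:
        self._value = 0  # starts at zero, like the documented flag
        self._lock = threading.Lock()

    def signal(self, value: int) -> None:
        with self._lock:
            self._value = value

    def load(self) -> int:
        with self._lock:
            return self._value


def wait_would_pass(flag: FlagModel, target: int) -> bool:
    """Model of a wait: passes once the flag has reached target (assumed >=)."""
    return flag.load() >= target


def run_replays(flag: FlagModel, count: int) -> list[int]:
    """Run up to count replays; each one needs the flag to reach step + 1."""
    completed = []
    for step in range(count):
        if not wait_would_pass(flag, step + 1):
            break  # a real wait would stall here until someone signals
        completed.append(step)
        # Stand-in for the async kickoff: a worker signals the next value.
        worker = threading.Thread(target=flag.signal, args=(step + 2,))
        worker.start()
        worker.join()  # joined immediately to keep the sketch deterministic
    return completed


primed = FlagModel()
primed.signal(1)  # prime at setup so replay 0 passes before any kickoff
primed_result = run_replays(primed, 3)

unprimed_result = run_replays(FlagModel(), 3)  # never primed: replay 0 stalls
```

In this model the primed flag lets all three replays complete, while the unprimed flag stops at the first wait because no worker has run yet to signal it.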