Mojo function

cp_async_bulk_wait_group

cp_async_bulk_wait_group[n: SIMD[int32, 1], read: Bool = True]()

Waits for completion of asynchronous bulk memory transfer groups.

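This function causes the executing thread to wait until at most n of its most recently committed bulk async-groups are still pending. When it returns, all older bulk async-groups committed by that thread are guaranteed to be complete. It corresponds to the PTX cp.async.bulk.wait_group instruction and is used to synchronize on bulk asynchronous memory transfers on NVIDIA GPUs.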
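The counting rule can be illustrated with a small host-side model. The sketch below is plain Python rather than Mojo, and the BulkGroupTracker class is a hypothetical helper, not part of the Mojo standard library or the GPU runtime. It takes the conservative reading of the guarantee: after wait_group(n) returns, every group older than the n most recent is complete, and those n groups are treated as still pending.

```python
from collections import deque


class BulkGroupTracker:
    """Hypothetical model of one thread's bulk async-group bookkeeping.

    Not a real Mojo or GPU API; it only mirrors the ordering guarantee
    of cp_async_bulk_wait_group[n]().
    """

    def __init__(self):
        self.pending = deque()  # ids of committed, unfinished groups, oldest first
        self.next_id = 0

    def commit(self):
        """Record a newly committed bulk async-group and return its id."""
        gid = self.next_id
        self.next_id += 1
        self.pending.append(gid)
        return gid

    def wait_group(self, n):
        """Block until at most n of the most recent groups remain pending.

        Returns the ids of the older groups that had to complete.
        """
        if n < 0:
            raise ValueError("n must be non-negative")
        completed = []
        while len(self.pending) > n:
            completed.append(self.pending.popleft())
        return completed


tracker = BulkGroupTracker()
for _ in range(4):
    tracker.commit()                 # groups 0, 1, 2, 3 are in flight
print(tracker.wait_group(2))         # [0, 1]: only groups 2 and 3 may remain pending
print(tracker.wait_group(0))         # [2, 3]: everything committed so far is done
```

Real hardware may finish the newest groups early; the model only tracks what the wait guarantees.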

Note: This functionality is only available on NVIDIA GPUs. Using this function when targeting a non-NVIDIA GPU results in a compile-time error.

Example:

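from gpu.sync import cp_async_bulk_wait_group

fn copy_kernel():  # hypothetical kernel name; must be compiled for an NVIDIA GPU
    # ... issue bulk async copies and commit them into bulk async-groups ...

    # Wait until at most 2 of the most recent groups are pending
    cp_async_bulk_wait_group[2]()

    # Wait for all prior groups to complete
    cp_async_bulk_wait_group[0]()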

Parameters:

n (SIMD): The maximum number of the most recent bulk async-groups allowed to remain pending; all older groups must be complete before the function returns. When n=0, waits for all prior bulk async-groups to complete.
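read (Bool): If True, uses the .read variant of the PTX instruction, which only waits until the bulk async operations in the affected groups have finished reading from their source locations, so the source memory can safely be reused or overwritten. If False, waits for the operations to complete fully, including their writes to the destination. Defaults to True.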