Mojo struct

LaunchAttributeID

@register_passable(trivial) struct LaunchAttributeID

Identifies the type of a GPU kernel launch attribute. The IDs mirror the CUDA driver's CUlaunchAttributeID values.

Aliases

  • IGNORE = LaunchAttributeID(SIMD(0)): Ignored entry, for convenient composition.
  • ACCESS_POLICY_WINDOW = LaunchAttributeID(SIMD(1)): Valid for streams, graph nodes, launches.
  • COOPERATIVE = LaunchAttributeID(SIMD(2)): Valid for graph nodes, launches.
  • SYNCHRONIZATION_POLICY = LaunchAttributeID(SIMD(3)): Valid for streams.
  • CLUSTER_DIMENSION = LaunchAttributeID(SIMD(4)): Valid for graph nodes, launches.
  • CLUSTER_SCHEDULING_POLICY_PREFERENCE = LaunchAttributeID(SIMD(5)): Valid for graph nodes, launches.
  • PROGRAMMATIC_STREAM_SERIALIZATION = LaunchAttributeID(SIMD(6)): Valid for launches. Setting CUlaunchAttributeValue::programmaticStreamSerializationAllowed to non-0 signals that the kernel will use programmatic means to resolve its stream dependency, so that the CUDA runtime should opportunistically allow the grid's execution to overlap with the previous kernel in the stream, if that kernel requests the overlap. The dependent launches can choose to wait on the dependency using the programmatic sync.
  • PROGRAMMATIC_EVENT = LaunchAttributeID(SIMD(7)): Valid for launches. Set CUlaunchAttributeValue::programmaticEvent to record the event. The event recorded through this launch attribute is guaranteed to only trigger after all blocks in the associated kernel trigger the event. A block can trigger the event through PTX launchdep.release or CUDA builtin function cudaTriggerProgrammaticLaunchCompletion(). A trigger can also be inserted at the beginning of each block's execution if triggerAtBlockStart is set to non-0. The dependent launches can choose to wait on the dependency using the programmatic sync (cudaGridDependencySynchronize() or equivalent PTX instructions). Note that dependents (including the CPU thread calling cuEventSynchronize()) are not guaranteed to observe the release precisely when it is released. For example, cuEventSynchronize() may only observe the event trigger long after the associated kernel has completed. This recording type is primarily meant for establishing programmatic dependency between device tasks. Note also this type of dependency allows, but does not guarantee, concurrent execution of tasks. The event supplied must not be an interprocess or interop event. The event must disable timing (i.e. must be created with the CU_EVENT_DISABLE_TIMING flag set).
  • PRIORITY = LaunchAttributeID(SIMD(8)): Valid for streams, graph nodes, launches.
  • MEM_SYNC_DOMAIN_MAP = LaunchAttributeID(SIMD(9)): Valid for streams, graph nodes, launches.
  • MEM_SYNC_DOMAIN = LaunchAttributeID(SIMD(10)): Valid for streams, graph nodes, launches.
  • LAUNCH_COMPLETION_EVENT = LaunchAttributeID(SIMD(12)): Valid for launches. Set CUlaunchAttributeValue::launchCompletionEvent to record the event. Nominally, the event is triggered once all blocks of the kernel have begun execution. Currently this is a best effort. If a kernel B has a launch completion dependency on a kernel A, B may wait until A is complete. Alternatively, blocks of B may begin before all blocks of A have begun, for example if B can claim execution resources unavailable to A (e.g. they run on different GPUs) or if B is a higher priority than A. Exercise caution if such an ordering inversion could lead to deadlock. A launch completion event is nominally similar to a programmatic event with triggerAtBlockStart set except that it is not visible to cudaGridDependencySynchronize() and can be used with compute capability less than 9.0. The event supplied must not be an interprocess or interop event. The event must disable timing (i.e. must be created with the CU_EVENT_DISABLE_TIMING flag set).
  • DEVICE_UPDATABLE_KERNEL_NODE = LaunchAttributeID(SIMD(13)): Valid for graph nodes, launches. This attribute is graphs-only, and passing it to a launch in a non-capturing stream will result in an error. CUlaunchAttributeValue::deviceUpdatableKernelNode::deviceUpdatable can only be set to 0 or 1. Setting the field to 1 indicates that the corresponding kernel node should be device-updatable. On success, a handle will be returned via CUlaunchAttributeValue::deviceUpdatableKernelNode::devNode which can be passed to the various device-side update functions to update the node's kernel parameters from within another kernel. For more information on the types of device updates that can be made, as well as the relevant limitations thereof, see cudaGraphKernelNodeUpdatesApply. Nodes which are device-updatable have additional restrictions compared to regular kernel nodes. Firstly, device-updatable nodes cannot be removed from their graph via cuGraphDestroyNode. Additionally, once opted-in to this functionality, a node cannot opt out, and any attempt to set the deviceUpdatable attribute to 0 will result in an error. Device-updatable kernel nodes also cannot have their attributes copied to/from another kernel node via cuGraphKernelNodeCopyAttributes. Graphs containing one or more device-updatable nodes also do not allow multiple instantiation, and neither the graph nor its instantiated version can be passed to cuGraphExecUpdate. If a graph contains device-updatable nodes and updates those nodes from the device from within the graph, the graph must be uploaded with cuGraphUpload before it is launched. For such a graph, if host-side executable graph updates are made to the device-updatable nodes, the graph must be uploaded before it is launched again.
  • PREFERRED_SHARED_MEMORY_CARVEOUT = LaunchAttributeID(SIMD(14)): Valid for launches. On devices where the L1 cache and shared memory use the same hardware resources, setting CUlaunchAttributeValue::sharedMemCarveout to a percentage between 0 and 100 signals the CUDA driver to set the shared memory carveout preference, in percent of the total shared memory for that kernel launch. This attribute takes precedence over CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT. This is only a hint, and the CUDA driver can choose a different configuration if required for the launch.
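
The aliases above can be used directly as values. Below is a minimal sketch of checking an incoming ID against them, assuming the import path gpu.host.launch_attribute; describe_attribute is a hypothetical helper, not part of this API.

from gpu.host.launch_attribute import LaunchAttributeID


fn describe_attribute(id: LaunchAttributeID):
    # Compare against the aliases listed above.
    if id == LaunchAttributeID.COOPERATIVE:
        print("cooperative kernel launch")
    elif id == LaunchAttributeID.CLUSTER_DIMENSION:
        print("thread block cluster dimensions")
    else:
        # LaunchAttributeID is Writable, so it can be printed directly.
        print("other attribute:", id)


fn main():
    describe_attribute(LaunchAttributeID.COOPERATIVE)
    describe_attribute(LaunchAttributeID.PRIORITY)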

Implemented traits

AnyType, Copyable, ExplicitlyCopyable, Movable, UnknownDestructibility, Writable

Methods

__init__

__init__(*, other: Self) -> Self

Explicitly construct a deep copy of the provided value.

Args:

  • other (Self): The value to copy.
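
For example, a minimal sketch of using the keyword-only copy constructor, assuming the import path gpu.host.launch_attribute:

from gpu.host.launch_attribute import LaunchAttributeID


fn main():
    var original = LaunchAttributeID.MEM_SYNC_DOMAIN
    # `other` is keyword-only, matching the signature above.
    var copy = LaunchAttributeID(other=original)
    print(copy == original)  # True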

__eq__

__eq__(self, other: Self) -> Bool

Returns True if the two attribute IDs hold the same value, and False otherwise.

__ne__

__ne__(self, other: Self) -> Bool

Returns True if the two attribute IDs hold different values, and False otherwise.

__is__

__is__(self, other: Self) -> Bool

Returns True if self and other are the same attribute ID (used by the is operator).

__isnot__

__isnot__(self, other: Self) -> Bool

Returns True if self and other are different attribute IDs (used by the is not operator).
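
A minimal sketch of the is and is not operators on attribute IDs, assuming the import path gpu.host.launch_attribute:

from gpu.host.launch_attribute import LaunchAttributeID


fn main():
    var attr = LaunchAttributeID.CLUSTER_DIMENSION
    # `is` and `is not` dispatch to __is__ and __isnot__.
    print(attr is LaunchAttributeID.CLUSTER_DIMENSION)  # True
    print(attr is not LaunchAttributeID.COOPERATIVE)    # True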

__str__

__str__(self) -> String

Returns the string representation of this attribute ID.

write_to

write_to[W: Writer](self, mut writer: W)

Writes the string representation of this attribute ID to the given writer.
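
A minimal sketch of producing text from an attribute ID, assuming the import path gpu.host.launch_attribute and that String conforms to Writer in the standard library:

from gpu.host.launch_attribute import LaunchAttributeID


fn main():
    var attr = LaunchAttributeID.PROGRAMMATIC_EVENT

    # __str__ returns the ID as a String.
    print(attr.__str__())

    # write_to streams the ID's text into any Writer; here a String buffer.
    var buffer = String()
    attr.write_to(buffer)
    print(buffer)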