Mojo function

enqueue_function_from_cubin

enqueue_function_from_cubin[*Ts: DevicePassable](ctx: DeviceContext, data: List[Byte], module_name: String, function_name: String, *args: *Ts, *, grid_dim: Dim, block_dim: Dim, shared_mem_bytes: OptionalReg[Int] = None, var attributes: List[LaunchAttribute] = List[LaunchAttribute]())

Loads and enqueues an NVIDIA GPU kernel from cubin data.

This function provides a simple way to load pre-compiled GPU code and launch it in a single call. It handles loading the function, enqueueing it, and releasing the function handle.

Example:

from gpu.host import DeviceContext
from gpu.host.device_context import enqueue_function_from_cubin
from pathlib import Path

# Load the pre-compiled kernel binary from disk.
var cubin_data = Path("kernel.cubin").read_bytes()
var num_blocks = 4
with DeviceContext() as ctx:
    enqueue_function_from_cubin(
        ctx,
        cubin_data,
        "my_module",
        "my_kernel",
        arg1,  # Placeholder kernel arguments; each must be DevicePassable.
        arg2,
        grid_dim=(num_blocks, 1, 1),
        block_dim=(256, 1, 1),
    )
    # Block until the enqueued kernel has finished executing.
    ctx.synchronize()
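The cubin passed to this function is ordinary pre-compiled NVIDIA device code. A file like `kernel.cubin` in the example above can be produced ahead of time with the CUDA toolchain; the invocation below is an illustrative sketch (the source file name, output name, and target architecture are assumptions and must match your kernel and GPU):

```shell
# Compile a CUDA source file directly to a cubin for a specific GPU
# architecture (sm_80, i.e. Ampere, here); adjust -arch for your device.
nvcc --cubin -arch=sm_80 my_kernel.cu -o kernel.cubin
```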

Parameters:

  • *Ts (DevicePassable): The argument types; each must implement the DevicePassable trait.

Args:

  • ctx (DeviceContext): The device context to use.
  • data (List[Byte]): The binary data (e.g., the contents of a .cubin file) containing the compiled kernel.
  • module_name (String): The name of the module.
  • function_name (String): The name of the kernel function entry point.
  • *args (*Ts): The arguments to pass to the kernel.
  • grid_dim (Dim): Grid dimensions for the kernel launch.
  • block_dim (Dim): Block dimensions for the kernel launch.
  • shared_mem_bytes (OptionalReg[Int]): The amount of dynamic shared memory, in bytes, to allocate per block. Defaults to None.
  • attributes (List[LaunchAttribute]): Optional list of launch attributes. Defaults to an empty list.
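The optional keyword arguments can be combined with the required ones. Below is a minimal sketch of a launch that requests dynamic shared memory; the module name, kernel name, cubin path, and sizes are assumptions for illustration, not part of this API's documentation:

```mojo
from gpu.host import DeviceContext
from gpu.host.device_context import enqueue_function_from_cubin
from pathlib import Path

fn launch_with_shared_mem() raises:
    var data = Path("reduce.cubin").read_bytes()
    with DeviceContext() as ctx:
        enqueue_function_from_cubin(
            ctx,
            data,
            "reduce_module",
            "block_reduce",
            grid_dim=(8, 1, 1),
            block_dim=(256, 1, 1),
            # Request 256 floats of dynamic shared memory per block,
            # matching an `extern __shared__ float smem[];` declaration
            # in the kernel's CUDA source.
            shared_mem_bytes=256 * 4,
        )
        ctx.synchronize()
```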

Raises:

An error if loading the module or launching the kernel fails.