Mojo struct
DeviceContext
@register_passable
struct DeviceContext
Represents a single stream of execution on a particular accelerator (GPU). A DeviceContext
serves as the low-level interface to the accelerator inside a MAX custom operation and provides methods for allocating buffers on the device, copying data between host and device, and for compiling and running functions (also known as kernels) on the device.
The device context can be used as a context manager. For example:
from gpu.host import DeviceContext
from gpu import thread_idx

fn kernel():
    print("hello from thread:", thread_idx.x, thread_idx.y, thread_idx.z)

with DeviceContext() as ctx:
    ctx.enqueue_function[kernel](grid_dim=1, block_dim=(2, 2, 2))
    ctx.synchronize()
A custom operation receives an opaque MojoCallContextPtr, which provides a get_device_context() method to retrieve the device context:
from runtime.asyncrt import MojoCallContextPtr

@register("custom_op")
struct CustomOp:
    @staticmethod
    fn execute(ctx_ptr: MojoCallContextPtr) raises:
        var ctx = ctx_ptr.get_device_context()
        ctx.enqueue_function[kernel](grid_dim=1, block_dim=(2, 2, 2))
        ctx.synchronize()
Aliases
- device_info = from_name[::StringLiteral](): gpu.info.Info object for the default accelerator.
- device_api = from_name[::StringLiteral]().api: Device API for the default accelerator (for example, "cuda" or "hip").
Implemented traits
AnyType, CollectionElement, Copyable, Movable, UnknownDestructibility
Methods
__init__
__init__(out self, device_id: Int = 0, *, api: String = String(from_name[::StringLiteral]()), buffer_cache_size: UInt = UInt(0))
Constructs a DeviceContext for the specified device.
Args:
- device_id (Int): ID of the accelerator device. If not specified, uses the default accelerator.
- api (String): Device API, for example, "cuda" for an NVIDIA GPU, or "gpu" for the currently available accelerator.
- buffer_cache_size (UInt): Amount of space to pre-allocate for device buffers, in bytes.
__copyinit__
__copyinit__(existing: Self) -> Self
Copies the DeviceContext.
__del__
__del__(owned self)
copy
copy(self) -> Self
Explicitly construct a copy of self.
Returns:
A copy of this value.
__enter__
__enter__(owned self) -> Self
name
name(self) -> String
Returns the device name, an ASCII string identifying this device, defined by the native device API.
api
api(self) -> String
Returns the name of the API used to program the device.
Possible values are:
- "cpu": Generic host device (CPU).
- "cuda": NVIDIA GPUs.
- "hip": AMD GPUs.
malloc_host
malloc_host[type: AnyType](self, size: Int) -> UnsafePointer[type]
Allocates a block of pinned memory on the host.
Pinned memory is guaranteed to remain resident in the host's RAM and is never paged or swapped out to disk. Memory allocated normally (for example, using UnsafePointer.alloc()) is pageable: individual pages of memory can be moved to secondary storage (disk/SSD) when main memory fills up.
Using pinned memory allows devices to make fast transfers between host memory and device memory, because they can use direct memory access (DMA) to transfer data without relying on the CPU.
Allocating too much pinned memory can cause performance issues, since it reduces the amount of memory available for other processes.
Parameters:
- type (AnyType): The data type to be stored in the allocated memory.
Args:
- size (Int): The number of elements of type to allocate memory for.
Returns:
A pointer to the newly-allocated memory.
free_host
free_host[type: AnyType](self, ptr: UnsafePointer[type])
Frees a previously-allocated block of pinned memory.
Parameters:
- type (AnyType): The data type stored in the allocated memory.
Args:
- ptr (UnsafePointer[type]): Pointer to the data block to free.
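As a rough illustration of how malloc_host and free_host might pair with an asynchronous copy, the sketch below stages data in pinned memory before sending it to the device. It uses only methods listed on this page; the length value and the use of def main() are assumptions made for the example, and it has not been run against a specific MAX release.

```mojo
from gpu.host import DeviceContext


def main():
    alias length = 1024  # hypothetical element count, chosen for illustration

    with DeviceContext() as ctx:
        # Pinned host allocation; the element type is Float32 so the pointer
        # matches the UnsafePointer[SIMD[type, 1]] expected by the copy call.
        var host_ptr = ctx.malloc_host[Float32](length)
        for i in range(length):
            host_ptr[i] = Float32(i)

        # Device buffer with the same element count.
        var dev_buf = ctx.enqueue_create_buffer[DType.float32](length)

        # The copy is asynchronous, so wait before releasing the host memory.
        ctx.enqueue_copy_to_device(dev_buf, host_ptr)
        ctx.synchronize()

        ctx.free_host(host_ptr)
```

The synchronize() call before free_host() is there because the copy is only enqueued; freeing the pinned block earlier could release memory the transfer still reads from.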
enqueue_create_buffer
enqueue_create_buffer[type: DType](self, size: Int) -> DeviceBuffer[type]
Enqueues a buffer creation using the DeviceBuffer constructor.
For GPU devices, the space is allocated in the device's global memory.
Parameters:
- type (DType): The data type to be stored in the allocated memory.
Args:
- size (Int): The number of elements of type to allocate memory for.
Returns:
The allocated buffer.
create_buffer_sync
create_buffer_sync[type: DType](self, size: Int) -> DeviceBuffer[type]
Creates a buffer synchronously using the DeviceBuffer constructor.
Parameters:
- type (DType): The data type to be stored in the allocated memory.
Args:
- size (Int): The number of elements of type to allocate memory for.
Returns:
The allocated buffer.
enqueue_create_host_buffer
enqueue_create_host_buffer[type: DType](self, size: Int) -> DeviceBuffer[type]
Enqueues the creation of a host memory DeviceBuffer.
compile_function
compile_function[func_type: AnyTrivialRegType, //, func: $0, *, dump_asm: Variant[Bool, Path, fn() capturing -> Path] = __init__[::CollectionElement](False), dump_llvm: Variant[Bool, Path, fn() capturing -> Path] = __init__[::CollectionElement](False)](self, *, func_attribute: OptionalReg[FuncAttribute] = OptionalReg[FuncAttribute]({:i1 0, 1}), out result: DeviceFunction[func, target=from_name[::StringLiteral]().target[::Int]()])
Compiles the provided function for execution on this device.
Parameters:
- func_type (AnyTrivialRegType): Type of the function.
- func ($0): The function to compile.
- dump_asm (Variant[Bool, Path, fn() capturing -> Path]): To dump the compiled assembly, pass True, or a file path to dump to, or a function returning a file path.
- dump_llvm (Variant[Bool, Path, fn() capturing -> Path]): To dump the generated LLVM code, pass True, or a file path to dump to, or a function returning a file path.
Args:
- func_attribute (OptionalReg[FuncAttribute]): An attribute to use when compiling the code (such as maximum shared memory size).
Returns:
The compiled function.
enqueue_function
enqueue_function[func_type: AnyTrivialRegType, //, func: $0, *Ts: AnyType, *, dump_asm: Variant[Bool, Path, fn() capturing -> Path] = __init__[::CollectionElement](False), dump_llvm: Variant[Bool, Path, fn() capturing -> Path] = __init__[::CollectionElement](False)](self, *args: *Ts, *, grid_dim: Dim, block_dim: Dim, cluster_dim: OptionalReg[Dim] = OptionalReg[Dim]({:i1 0, 1}), shared_mem_bytes: OptionalReg[Int] = OptionalReg[Int]({:i1 0, 1}), owned attributes: List[LaunchAttribute] = List(), owned constant_memory: List[ConstantMemoryMapping] = List(), func_attribute: OptionalReg[FuncAttribute] = OptionalReg[FuncAttribute]({:i1 0, 1}))
Compiles and enqueues a kernel for execution on this device.
Parameters:
- func_type (AnyTrivialRegType): The type of the function to launch.
- func ($0): The function to launch.
- *Ts (AnyType): The types of the arguments being passed to the function.
- dump_asm (Variant[Bool, Path, fn() capturing -> Path]): Pass True or a Path to dump the assembly.
- dump_llvm (Variant[Bool, Path, fn() capturing -> Path]): Pass True or a Path to dump the LLVM IR.
enqueue_function[*Ts: AnyType](self, f: DeviceFunction[func, target=target, _ptxas_info_verbose=_ptxas_info_verbose], *args: *Ts, *, grid_dim: Dim, block_dim: Dim, cluster_dim: OptionalReg[Dim] = OptionalReg[Dim]({:i1 0, 1}), shared_mem_bytes: OptionalReg[Int] = OptionalReg[Int]({:i1 0, 1}), owned attributes: List[LaunchAttribute] = List(), owned constant_memory: List[ConstantMemoryMapping] = List())
Enqueues a compiled function for execution on this device.
Parameters:
- *Ts (AnyType): Argument types.
Args:
- f (DeviceFunction[func, target=target, _ptxas_info_verbose=_ptxas_info_verbose]): The compiled function to execute.
- *args (*Ts): Arguments to pass to the function.
- grid_dim (Dim): Dimensions of the compute grid, made up of thread blocks.
- block_dim (Dim): Dimensions of each thread block in the grid.
- cluster_dim (OptionalReg[Dim]): Dimensions of clusters (if the thread blocks are grouped into clusters).
- shared_mem_bytes (OptionalReg[Int]): Amount of shared memory per thread block.
- attributes (List[LaunchAttribute]): Launch attributes.
- constant_memory (List[ConstantMemoryMapping]): Constant memory mapping.
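The two enqueue_function overloads suggest a compile-once, launch-many pattern: compile_function returns a DeviceFunction, which the second overload accepts as its first argument. The following is a minimal sketch of that pattern, assuming a trivial kernel with no arguments; the kernel body and launch dimensions are placeholders rather than anything prescribed by the API.

```mojo
from gpu.host import DeviceContext
from gpu import thread_idx


fn kernel():
    # Placeholder kernel used only for illustration.
    print("thread:", thread_idx.x)


def main():
    with DeviceContext() as ctx:
        # Compile once and keep the resulting DeviceFunction.
        var compiled = ctx.compile_function[kernel]()

        # Launch the same compiled function twice without recompiling.
        ctx.enqueue_function(compiled, grid_dim=1, block_dim=4)
        ctx.enqueue_function(compiled, grid_dim=1, block_dim=4)

        # Both launches are asynchronous; wait for them to finish.
        ctx.synchronize()
```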
execution_time
execution_time[: origin.set, //, func: fn(DeviceContext) raises capturing -> None](self, num_iters: Int) -> Int
execution_time_iter
execution_time_iter[: origin.set, //, func: fn(DeviceContext, Int) raises capturing -> None](self, num_iters: Int) -> Int
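Neither timing method is described on this page, so the sketch below is an assumption based only on the signatures: execution_time appears to take a capturing function that receives the DeviceContext, run it num_iters times, and return an Int. The unit of the returned value is not stated in the source and is deliberately left uninterpreted here. The names empty_kernel and bench are hypothetical.

```mojo
from gpu.host import DeviceContext


fn empty_kernel():
    # Hypothetical do-nothing kernel, used only to have something to time.
    pass


def main():
    with DeviceContext() as ctx:

        @parameter
        fn bench(bench_ctx: DeviceContext) raises:
            # Work to be measured: a single launch of the placeholder kernel.
            bench_ctx.enqueue_function[empty_kernel](grid_dim=1, block_dim=1)

        # Assumed usage: run bench for 10 iterations and report the result.
        var elapsed = ctx.execution_time[bench](10)
        print("execution_time returned:", elapsed)
```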
enqueue_copy_to_device
enqueue_copy_to_device[type: DType](self, dst_buf: DeviceBuffer[type], src_ptr: UnsafePointer[SIMD[type, 1]])
Enqueues an async copy from the host to the provided device buffer. The number of bytes copied is determined by the size of the device buffer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_buf (DeviceBuffer[type]): Device buffer to copy to.
- src_ptr (UnsafePointer[SIMD[type, 1]]): Host pointer to copy from.
enqueue_copy_from_device
enqueue_copy_from_device[type: DType](self, dst_ptr: UnsafePointer[SIMD[type, 1]], src_buf: DeviceBuffer[type])
Enqueues an async copy from the device to the host. The number of bytes copied is determined by the size of the device buffer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_ptr (UnsafePointer[SIMD[type, 1]]): Host pointer to copy to.
- src_buf (DeviceBuffer[type]): Device buffer to copy from.
enqueue_copy_from_device[type: DType](self, dst_ptr: UnsafePointer[SIMD[type, 1]], src_ptr: UnsafePointer[SIMD[type, 1]], size: Int)
Enqueues an async copy of size elements from the device pointer to the host pointer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_ptr (UnsafePointer[SIMD[type, 1]]): Host pointer to copy to.
- src_ptr (UnsafePointer[SIMD[type, 1]]): Device pointer to copy from.
- size (Int): Number of elements (of the specified DType) to copy.
enqueue_copy_device_to_device
enqueue_copy_device_to_device[type: DType](self, dst_buf: DeviceBuffer[type], src_buf: DeviceBuffer[type])
Enqueues an async copy from one device buffer to another. The amount of data transferred is determined by the size of the destination buffer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_buf (DeviceBuffer[type]): Device buffer to copy to.
- src_buf (DeviceBuffer[type]): Device buffer to copy from. Must be at least as large as dst_buf.
enqueue_copy_device_to_device[type: DType](self, dst_ptr: UnsafePointer[SIMD[type, 1]], src_ptr: UnsafePointer[SIMD[type, 1]], size: Int)
Enqueues an async copy of size elements from one device pointer to another device pointer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_ptr (UnsafePointer[SIMD[type, 1]]): Device pointer to copy to.
- src_ptr (UnsafePointer[SIMD[type, 1]]): Device pointer to copy from.
- size (Int): Number of elements (of the specified DType) to copy.
copy_to_device_sync
copy_to_device_sync[type: DType](self, dst_buf: DeviceBuffer[type], src_ptr: UnsafePointer[SIMD[type, 1]])
Copies data from the host to the provided device buffer. The number of bytes copied is determined by the size of the device buffer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_buf (DeviceBuffer[type]): Device buffer to copy to.
- src_ptr (UnsafePointer[SIMD[type, 1]]): Host pointer to copy from.
copy_from_device_sync
copy_from_device_sync[type: DType](self, dst_ptr: UnsafePointer[SIMD[type, 1]], src_buf: DeviceBuffer[type])
Copies data from the device to the host. The number of bytes copied is determined by the size of the device buffer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_ptr (UnsafePointer[SIMD[type, 1]]): Host pointer to copy to.
- src_buf (DeviceBuffer[type]): Device buffer to copy from.
copy_device_to_device_sync
copy_device_to_device_sync[type: DType](self, dst_buf: DeviceBuffer[type], src_buf: DeviceBuffer[type])
Copies data from one device buffer to another. The amount of data transferred is determined by the size of the destination buffer.
Parameters:
- type (DType): Type of the data being copied.
Args:
- dst_buf (DeviceBuffer[type]): Device buffer to copy to.
- src_buf (DeviceBuffer[type]): Device buffer to copy from. Must be at least as large as dst_buf.
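To show how the synchronous copy methods fit together, here is a small host-to-device-to-host round trip. It is a sketch under assumptions: the length value is arbitrary, the host memory is ordinary pageable memory from UnsafePointer.alloc(), and the code has not been validated against a particular release.

```mojo
from gpu.host import DeviceContext
from memory import UnsafePointer


def main():
    alias length = 16  # hypothetical element count

    with DeviceContext() as ctx:
        # Ordinary (pageable) host allocations for the source and result.
        var src = UnsafePointer[Float32].alloc(length)
        var dst = UnsafePointer[Float32].alloc(length)
        for i in range(length):
            src[i] = Float32(i)

        # Device buffer sized to match; its size determines how much is copied.
        var buf = ctx.create_buffer_sync[DType.float32](length)

        # Each call blocks until the copy has finished, so no synchronize()
        # is needed between them.
        ctx.copy_to_device_sync(buf, src)
        ctx.copy_from_device_sync(dst, buf)

        print("last element after round trip:", dst[length - 1])

        src.free()
        dst.free()
```

Because the number of bytes copied is taken from the device buffer, both host allocations need to hold at least as many elements as the buffer.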
enqueue_memset
enqueue_memset[type: DType](self, dst: DeviceBuffer[type], val: SIMD[type, 1])
Enqueues an async memset operation, setting all of the elements in the destination device buffer to the specified value.
Parameters:
- type (DType): Type of the data stored in the buffer.
Args:
- dst (DeviceBuffer[type]): Destination buffer.
- val (SIMD[type, 1]): Value to set all elements of dst to.
memset_sync
memset_sync[type: DType](self, dst: DeviceBuffer[type], val: SIMD[type, 1])
Synchronously sets all of the elements in the destination device buffer to the specified value.
Parameters:
- type (DType): Type of the data stored in the buffer.
Args:
- dst (DeviceBuffer[type]): The destination buffer.
- val (SIMD[type, 1]): Value to set all elements of dst to.
memset
memset[type: DType](self, dst: DeviceBuffer[type], val: SIMD[type, 1])
Enqueues an async memset operation, setting all of the elements in the destination device buffer to the specified value.
Parameters:
- type (DType): Type of the data stored in the buffer.
Args:
- dst (DeviceBuffer[type]): Destination buffer.
- val (SIMD[type, 1]): Value to set all elements of dst to.
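A memset is often used to initialize a freshly created buffer. The sketch below fills a buffer with a constant using enqueue_memset and then inspects it from the host through map_to_host. The buffer length and fill value are placeholders, and the explicit synchronize() is a conservative assumption, since this page does not say whether map_to_host waits for pending work.

```mojo
from gpu.host import DeviceContext


def main():
    alias length = 8  # hypothetical buffer length

    with DeviceContext() as ctx:
        var buf = ctx.enqueue_create_buffer[DType.float32](length)

        # Asynchronously set every element of the buffer to 1.0.
        ctx.enqueue_memset(buf, Float32(1))

        # Conservative: make sure the memset has completed before reading.
        ctx.synchronize()

        # Temporarily view the device buffer from the host.
        with ctx.map_to_host(buf) as host:
            print("first:", host[0], "last:", host[length - 1])
```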
synchronize
synchronize(self)
Blocks until all asynchronous calls on the stream associated with this device context have completed.
This should never be necessary when writing a custom operation.
get_driver_version
get_driver_version(self) -> Int
Returns the driver version associated with this device.
get_attribute
get_attribute(self, attr: DeviceAttribute) -> Int
Returns the specified attribute for this device.
Args:
- attr (DeviceAttribute): The device attribute to query.
Returns:
The value for attr on this device.
is_compatible
is_compatible(self)
Returns True if this device is compatible with MAX.
id
id(self) -> SIMD[int64, 1]
Returns the ID associated with this device.
get_memory_info
get_memory_info(self) -> Tuple[UInt, UInt]
Returns the free and total memory size for this device.
Returns:
A tuple of (free memory, total memory) in bytes.
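The query methods above can be combined into a short device report. This is a minimal sketch that assumes a default accelerator is present and uses only name(), api(), get_driver_version(), and get_memory_info() as listed on this page.

```mojo
from gpu.host import DeviceContext


def main():
    with DeviceContext() as ctx:
        print("device:", ctx.name())
        print("api:", ctx.api())
        print("driver version:", ctx.get_driver_version())

        # get_memory_info() returns (free, total), both in bytes.
        var mem = ctx.get_memory_info()
        print("free bytes:", mem[0])
        print("total bytes:", mem[1])
```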
can_access
can_access(self, peer: Self) -> Bool
Returns True if this device can access the identified peer device.
Args:
- peer (Self): The peer device.
enable_peer_access
enable_peer_access(self, peer: Self)
Enables access to the peer device.
Args:
- peer (Self): The peer device.
number_of_devices
static number_of_devices(*, api: String = String(from_name[::StringLiteral]())) -> Int
Returns the number of devices available that support the specified API.
Args:
- api (String): Requested device API (for example, "cuda" or "hip").
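On a multi-device system, number_of_devices can drive a loop that opens one context per device. The sketch below assumes the default api argument is appropriate for the machine and that device IDs run from 0 to the returned count minus one, which this page does not state explicitly.

```mojo
from gpu.host import DeviceContext


def main():
    # Count devices for the default API (assumed to match the local hardware).
    var count = DeviceContext.number_of_devices()
    print("devices found:", count)

    for i in range(count):
        # Assumption: valid device IDs are 0 .. count - 1.
        var ctx = DeviceContext(i)
        print("device", i, "name:", ctx.name())
```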
map_to_host
map_to_host[type: DType](self, buf: DeviceBuffer[type]) -> _HostMappedBuffer[type]
Allows for temporary access to the device buffer by the host from within a with statement.
var in_dev = ctx.enqueue_create_buffer[DType.float32](length)
var out_dev = ctx.enqueue_create_buffer[DType.float32](length)

# Initialize the input and output with known values.
with ctx.map_to_host(in_dev) as in_host, ctx.map_to_host(out_dev) as out_host:
    for i in range(length):
        in_host[i] = i
        out_host[i] = 255
Values modified inside the with statement are updated on the device when the with statement exits.