Mojo package
gpu
GPU programming primitives: thread blocks, async memory, barriers, and sync.
These low-level constructs let you write GPU code in a traditional programming style: you partition work across threads, which are mapped onto 1-, 2-, or 3-dimensional blocks. Thread blocks can in turn be grouped into a grid of thread blocks.
A kernel is a function that runs on the GPU in parallel across many threads.
Currently, the DeviceContext struct provides the interface for compiling and
launching GPU kernels inside MAX custom operations.
The gpu.host package includes APIs to manage
interaction between the host (that is, the CPU) and device (that is, the GPU
or accelerator).
The gpu package exports aliases you can use to access information about the
grid and the current thread, including block dimensions, block index in the grid,
and thread index. Import these directly from gpu:
from gpu import block_dim, block_idx, thread_idx, global_idx

For an example of launching a GPU kernel from a MAX custom operation, see the vector addition example in the MAX repo.
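To make the relationship between the block dimensions, block index, and thread index concrete, here is a minimal host-side sketch in plain Python. It does not use the Mojo gpu API and does not run on a GPU; it only models the arithmetic a one-dimensional kernel typically performs, where the global index is the block index times the block dimension plus the thread index. The function names `ceildiv` and `simulate_vector_add` and the local variable `grid_dim_x` are hypothetical names chosen for this illustration.

```python
def ceildiv(a, b):
    """Ceiling division: the number of size-b blocks needed to cover a items."""
    return -(-a // b)


def simulate_vector_add(lhs, rhs, block_dim_x):
    """Sequentially model a 1-D grid of thread blocks adding two vectors.

    Each (block_idx_x, thread_idx_x) pair stands in for one GPU thread.
    On a real device these iterations would run in parallel.
    """
    n = len(lhs)
    # Hypothetical local name: number of blocks in the grid along x.
    grid_dim_x = ceildiv(n, block_dim_x)
    out = [0.0] * n

    for block_idx_x in range(grid_dim_x):
        for thread_idx_x in range(block_dim_x):
            # Global index of this thread across the whole grid.
            global_idx_x = block_idx_x * block_dim_x + thread_idx_x
            # The last block may extend past the data, so guard the access.
            if global_idx_x < n:
                out[global_idx_x] = lhs[global_idx_x] + rhs[global_idx_x]

    return out, grid_dim_x
```

With five elements and a block size of two, the grid needs three blocks, and the bounds check keeps the one surplus thread in the last block from touching memory past the end of the vectors.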
Packages
- compute: GPU compute operations package, covering MMA and tensor core operations.
- host: Implements the gpu host package.
- memory: GPU memory operations package.
- primitives: GPU primitives package, covering warp, block, cluster, and grid-level operations.
- sync: GPU synchronization primitives package.
Modules
- globals: Provides GPU-specific global constants and configuration values.
- intrinsics: Provides low-level GPU intrinsic operations and memory access primitives.
- profiler: Provides GPU profiling functionality.