
Mojo package

gpu

GPU programming primitives: thread blocks, async memory, barriers, and sync.

These low-level constructs let you write GPU code in a traditional programming style: you partition work across threads, which are organized into 1-, 2-, or 3-dimensional blocks. The thread blocks are in turn grouped into a grid of thread blocks.
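As a rough illustration of this partitioning, the sketch below computes how many one-dimensional blocks are needed to cover a given number of elements. The element count and block size are arbitrary values chosen for the example, and it assumes `ceildiv` is available from the standard `math` module.

```mojo
from math import ceildiv


def main():
    # Hypothetical problem size and block size, chosen for illustration.
    var num_elements = 1000
    var threads_per_block = 256

    # Round up so the final, partially filled block is still launched.
    var num_blocks = ceildiv(num_elements, threads_per_block)

    # 3 blocks would cover only 768 elements, so 4 blocks are needed.
    print("blocks in grid:", num_blocks)
```

Because the grid then holds 1024 threads for 1000 elements, a kernel launched this way would normally compare its global index against the element count before touching memory.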

A kernel is a function that runs on the GPU in parallel across many threads. Currently, the DeviceContext struct provides the interface for compiling and launching GPU kernels inside MAX custom operations.

The gpu.host package includes APIs to manage interaction between the host (that is, the CPU) and device (that is, the GPU or accelerator).

The gpu package exports aliases you can use to access information about the grid and the current thread, including block dimensions, block index in the grid, and thread index. Import these directly from gpu:

from gpu import block_dim, block_idx, thread_idx, global_idx
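A minimal sketch of how these aliases might be used inside a kernel follows. It is a standalone program rather than a MAX custom operation. It assumes a supported GPU is present and that `DeviceContext` exposes `enqueue_function` and `synchronize`, which matches some Mojo releases but may differ in yours. The kernel name `print_indices` is hypothetical.

```mojo
from gpu import block_dim, block_idx, thread_idx, global_idx
from gpu.host import DeviceContext


fn print_indices():
    # Each GPU thread reports where it sits in the grid.
    # For a 1-D launch, global_idx.x is expected to equal
    # block_idx.x * block_dim.x + thread_idx.x.
    print(
        "block:", block_idx.x,
        "block size:", block_dim.x,
        "thread:", thread_idx.x,
        "global:", global_idx.x,
    )


def main():
    # Assumption: DeviceContext can be constructed directly in a
    # standalone program when a compatible GPU is available.
    var ctx = DeviceContext()

    # Launch a grid of 2 blocks with 4 threads each (8 threads total).
    ctx.enqueue_function[print_indices](grid_dim=2, block_dim=4)

    # Block the host until the kernel has finished running.
    ctx.synchronize()
```

The order in which the threads print is not guaranteed, since blocks and threads run in parallel.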

For an example of launching a GPU kernel from a MAX custom operation, see the vector addition example in the MAX repo.

Packages

  • compute: GPU compute operations package - MMA and tensor core operations.
  • host: APIs for managing interaction between the host (CPU) and the device (GPU or accelerator).
  • memory: GPU memory operations package.
  • primitives: GPU primitives package - warp, block, cluster, and grid-level operations.
  • sync: GPU synchronization primitives package.

Modules

  • globals: Provides GPU-specific global constants and configuration values.
  • intrinsics: Provides low-level GPU intrinsic operations and memory access primitives.
  • profiler: Provides GPU profiling functionality.
