Mojo package
gpu
The GPU package provides low-level programming constructs for working with GPUs. These low-level constructs allow you to write code that runs on the GPU in the traditional GPU programming style: partitioning work across threads that are mapped onto 1-, 2-, or 3-dimensional blocks. The thread blocks can in turn be grouped into a grid of thread blocks.
A kernel is a function that runs on the GPU in parallel across many threads.
Currently, the DeviceContext struct provides the interface for compiling and launching GPU kernels inside MAX custom operations.
The gpu.host package includes APIs to manage interaction between the host (that is, the CPU) and the device (that is, the GPU or accelerator).
See the gpu.id module for a list of aliases you can use to access information about the grid and the current thread, including block dimensions, block index within the grid, and thread index.
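For example, a simple elementwise kernel can combine these aliases to compute each thread's global index. The following is a minimal sketch; the kernel name, signature, and bounds-check style are illustrative, not part of this package's API:

```mojo
from gpu.id import block_dim, block_idx, thread_idx
from memory import UnsafePointer

# Hypothetical kernel: scales each element of a buffer in place.
fn scale_kernel(data: UnsafePointer[Float32], factor: Float32, size: Int):
    # Global index of this thread within a 1-dimensional grid.
    var i = Int(block_idx.x * block_dim.x + thread_idx.x)
    # Guard against threads launched past the end of the data.
    if i < size:
        data[i] = data[i] * factor
```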
The sync module provides functions for synchronizing threads.
For an example of launching a GPU kernel from a MAX custom operation, see the vector addition example in the MAX repo.
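As a rough host-side sketch, a kernel like the hypothetical scale_kernel above can be compiled and launched through a DeviceContext. Exact method signatures may differ between releases, and the grid and block sizes here are arbitrary:

```mojo
from gpu.host import DeviceContext

def main():
    # Create a context for the default GPU device.
    var ctx = DeviceContext()
    alias size = 1024

    # Allocate a buffer in device memory (left uninitialized in this sketch).
    var buf = ctx.enqueue_create_buffer[DType.float32](size)

    # Launch the kernel: 4 blocks of 256 threads covers 1024 elements.
    ctx.enqueue_function[scale_kernel](
        buf.unsafe_ptr(), Float32(2.0), size,
        grid_dim=4, block_dim=256,
    )

    # Block until all enqueued GPU work has completed.
    ctx.synchronize()
```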
Packages
- host: Implements the gpu host package.
Modules
- all_reduce
- globals: This module includes NVIDIA GPU global constants.
- id: This module includes grid-related aliases and functions. Most of these are generic; a few are specific to NVIDIA GPUs.
- intrinsics: This module includes NVIDIA GPU intrinsic operations.
- memory: This module includes NVIDIA GPU memory operations.
- mma: This module includes utilities for working with the warp-matrix-matrix-multiplication (wmma) instructions.
- mma_util: This module provides abstractions for doing matrix-multiply-accumulate (mma) using tensor cores. PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-fragment-mma-1688 AMD documentation: https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-matrix-cores-readme/
- profiler: This module includes a simple GPU profiler.
- random: Implements a basic RNG using the Philox algorithm.
- semaphore: Implementation of a CTA-wide semaphore for inter-CTA synchronization.
- shuffle: This module includes intrinsics for NVIDIA GPU shuffle instructions.
- sync: This module includes intrinsics for NVIDIA GPU sync instructions.
- tensor_ops