Mojo package

gpu

The GPU package provides low-level programming constructs for working with GPUs. These low-level constructs let you write code that runs on the GPU in the traditional GPU programming style: work is partitioned across threads, which are mapped onto 1-, 2-, or 3-dimensional thread blocks, and the thread blocks are in turn grouped into a grid.
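
For example, a minimal elementwise kernel might partition its work across threads like this. This is only a sketch: the scale_kernel name and its behavior (doubling each element of a buffer) are made up for illustration, and it uses the indexing aliases from the gpu.id module described below.

```mojo
from gpu.id import block_dim, block_idx, thread_idx
from memory import UnsafePointer

# Sketch of an elementwise kernel: each thread handles one element.
fn scale_kernel(data: UnsafePointer[Float32], size: Int):
    # Global index of this thread: the block's offset within the grid
    # plus the thread's offset within the block.
    var i = Int(block_idx.x * block_dim.x + thread_idx.x)
    # Guard against threads past the end of the data when size is not
    # a multiple of the block size.
    if i < size:
        data[i] = data[i] * 2.0
```

The same pattern extends to 2- and 3-dimensional blocks and grids through the y and z fields of the same aliases.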

A kernel is a function that runs on the GPU in parallel across many threads. Currently, the DeviceContext struct provides the interface for compiling and launching GPU kernels inside MAX custom operations.

The gpu.host package includes APIs to manage interaction between the host (that is, the CPU) and device (that is, the GPU or accelerator).
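
For example, host-side code that allocates a device buffer and launches the scale_kernel sketched above might look roughly like the following. This is a sketch modeled on the vector addition example referenced below; the DeviceContext methods shown (enqueue_create_buffer, enqueue_function, synchronize) and the buffer's unsafe_ptr() accessor should be checked against the current gpu.host API reference.

```mojo
from gpu.host import DeviceContext
from gpu.id import block_dim, block_idx, thread_idx
from memory import UnsafePointer

alias length = 1024
alias block_size = 256

fn scale_kernel(data: UnsafePointer[Float32], size: Int):
    var i = Int(block_idx.x * block_dim.x + thread_idx.x)
    if i < size:
        data[i] = data[i] * 2.0

def main():
    # A DeviceContext manages a single GPU: buffer allocation, kernel
    # compilation, and kernel launches are all enqueued through it.
    var ctx = DeviceContext()

    # Allocate a buffer in device memory. (Initializing it and copying
    # results back to the host are omitted here; see the vector
    # addition example in the MAX repo for the full round trip.)
    var buf = ctx.enqueue_create_buffer[DType.float32](length)

    # Compile scale_kernel for the device and launch it over a
    # 1-dimensional grid of thread blocks.
    ctx.enqueue_function[scale_kernel](
        buf.unsafe_ptr(),
        length,
        grid_dim=length // block_size,
        block_dim=block_size,
    )

    # The calls above are asynchronous; block until they finish.
    ctx.synchronize()
```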

See the gpu.id module for a list of aliases you can use to access information about the grid and the current thread, including the block dimensions, the block's index within the grid, and the thread's index within its block. These are the aliases used in the kernel sketches above.

The sync module provides functions for synchronizing threads.
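
For example, a kernel with two phases might call barrier() from the sync module so that every thread in a block finishes its writes before any thread in that block reads a neighbor's value. This is only a sketch: the neighbor_sum_kernel name and access pattern are made up for illustration, and barrier() synchronizes only the threads within a single block.

```mojo
from gpu.id import block_dim, block_idx, thread_idx
from gpu.sync import barrier
from memory import UnsafePointer

# Illustrative two-phase kernel: phase 2 reads values that other
# threads in the same block wrote during phase 1.
fn neighbor_sum_kernel(
    output: UnsafePointer[Float32],
    data: UnsafePointer[Float32],
    size: Int,
):
    var i = Int(block_idx.x * block_dim.x + thread_idx.x)

    # Phase 1: each thread writes its own element.
    if i < size:
        data[i] = Float32(i)

    # Wait until every thread in the block has finished phase 1, so
    # the value read in phase 2 is guaranteed to have been written.
    barrier()

    # Phase 2: each thread reads its right-hand neighbor, which must
    # belong to the same block for barrier() to order the write
    # before this read.
    if i + 1 < size and thread_idx.x + 1 < block_dim.x:
        output[i] = data[i] + data[i + 1]
```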

For an example of launching a GPU kernel from a MAX custom operation, see the vector addition example in the MAX repo.

Packages

  • host: APIs for managing interaction between the host (CPU) and the device (GPU or accelerator), including the DeviceContext struct used to compile and launch kernels.

Modules

  • all_reduce: Multi-GPU allreduce implementation for efficient tensor reduction across GPUs.
  • cluster: This module provides low-level NVIDIA GPU cluster synchronization primitives for SM90+ architectures.
  • globals: This module provides GPU-specific global constants and configuration values.
  • grid_controls: This module provides low-level Grid Dependent Control primitives for NVIDIA GPUs. These instructions are used to control the execution of dependent grids.
  • id: This module provides GPU thread and block indexing functionality.
  • intrinsics: This module provides low-level GPU intrinsic operations and memory access primitives.
  • memory: This module provides GPU memory operations and utilities.
  • mma: This module includes utilities for working with warp-matrix-matrix-multiplication (wmma) instructions.
  • mma_util: Matrix multiply accumulate (MMA) utilities for GPU tensor cores.
  • profiler: This module provides GPU profiling functionality.
  • random: Random number generation for GPU kernels.
  • semaphore: This module provides a device-wide semaphore implementation for NVIDIA GPUs.
  • sync: This module provides GPU synchronization primitives and barriers.
  • tensor_ops: This module provides tensor core operations and utilities for GPU computation.
  • warp: GPU warp-level operations and utilities.