Mojo package

gpu

Provides low-level programming constructs for working with GPUs.

These low-level constructs let you write GPU code in the traditional programming style: you partition work across threads, which are organized into 1-, 2-, or 3-dimensional blocks. The thread blocks are in turn grouped into a grid of thread blocks.

A kernel is a function that runs on the GPU in parallel across many threads. Currently, the DeviceContext struct provides the interface for compiling and launching GPU kernels inside MAX custom operations.

The gpu.host package includes APIs to manage interaction between the host (that is, the CPU) and device (that is, the GPU or accelerator).

The gpu package exports aliases you can use to access information about the grid and the current thread, including block dimensions, block index in the grid, and thread index. Import these directly from gpu:

from gpu import block_dim, block_idx, thread_idx, global_idx

Note: The gpu.id module is deprecated but still supported for backward compatibility. New code should import these symbols directly from the gpu package as shown above.
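The relationship between the block dimensions, block index, and thread index can be modeled on the CPU. The following is a plain Python sketch, not Mojo GPU code. It assumes a 1-dimensional launch and that a thread's global index equals block_idx * block_dim + thread_idx. The names launch_1d, add_kernel, and blocks_needed are hypothetical helpers used only for illustration and are not part of the gpu package.

```python
def blocks_needed(n, block_dim):
    """Number of blocks required to cover n elements (ceiling division)."""
    return (n + block_dim - 1) // block_dim


def launch_1d(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D grid: run the kernel once per (block, thread) pair.

    A real GPU runs these invocations in parallel; the loop order here is arbitrary.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Element-wise add: each emulated thread handles at most one element."""
    # Assumed mapping from (block, thread) to a global element index.
    i = block_idx * block_dim + thread_idx
    # Guard: the last block may contain threads that fall past the end of the data.
    if i < len(out):
        out[i] = a[i] + b[i]


n = 10
block_dim = 4
grid_dim = blocks_needed(n, block_dim)  # 3 blocks cover 10 elements

a = list(range(n))
b = [10 * x for x in range(n)]
out = [0] * n
launch_1d(add_kernel, grid_dim, block_dim, a, b, out)
```

With 10 elements and 4 threads per block, the grid has 3 blocks and 12 emulated threads, so the bounds check in add_kernel is what keeps the last two threads from writing out of range.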

For an example of launching a GPU kernel from a MAX custom operation, see the vector addition example in the MAX repo.

Packages

  • compute: GPU compute operations package - MMA and tensor core operations.
  • host: APIs for managing interaction between the host (CPU) and the device (GPU or accelerator).
  • memory: GPU memory operations package.
  • primitives: GPU primitives package - warp, block, cluster, and grid-level operations.
  • sync: GPU synchronization primitives package.

Modules

  • block: Compatibility wrapper for gpu.block module.
  • cluster: GPU cluster operations (deprecated - use gpu.primitives.cluster or gpu).
  • globals: This module provides GPU-specific global constants and configuration values.
  • grid_controls: GPU grid dependency control (deprecated - use gpu.primitives.grid_controls or gpu).
  • id: GPU thread and block indexing (deprecated - use gpu package directly).
  • intrinsics: Provides low-level GPU intrinsic operations and memory access primitives.
  • mma: Matrix multiply-accumulate operations (deprecated - use gpu.compute.mma).
  • mma_operand_descriptor: MMA operand descriptor trait (deprecated - use gpu.compute.mma_operand_descriptor).
  • mma_sm100: SM100 (Blackwell) matrix multiply operations (deprecated - use gpu.compute.arch.mma_nvidia_sm100).
  • mma_util: Matrix multiply utilities (deprecated - use gpu.compute.mma_util).
  • profiler: This module provides GPU profiling functionality.
  • random: Random number generation for GPU kernels.
  • semaphore: GPU semaphore operations (deprecated - use gpu.sync.semaphore).
  • tcgen05: Tensor core generation 05 operations (deprecated - use gpu.compute.tcgen05).
  • warp: GPU warp-level operations (deprecated - use gpu.primitives.warp).
