Mojo package
gpu
Provides low-level programming constructs for working with GPUs.
These low-level constructs let you write code that runs on the GPU in the traditional GPU programming style: you partition work across threads, which are grouped into 1-, 2-, or 3-dimensional blocks. The thread blocks are in turn grouped into a grid of thread blocks.
A kernel is a function that runs on the GPU in parallel across many threads.
Currently, the DeviceContext struct provides the interface for compiling and launching GPU kernels inside MAX custom operations.
The gpu.host package includes APIs to manage
interaction between the host (that is, the CPU) and device (that is, the GPU
or accelerator).
The gpu package exports aliases you can use to access information about the
grid and the current thread, including block dimensions, block index in the grid,
and thread index. Import these directly from gpu:
from gpu import block_dim, block_idx, thread_idx, global_idx
Note: The gpu.id module is deprecated but still supported
for backward compatibility. New code should import these symbols directly from the
gpu package as shown above.
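As a minimal sketch of how these aliases relate to each other, the kernel below prints each thread's block index, thread index, and global index, along with the value computed by hand as block_idx.x * block_dim.x + thread_idx.x. The kernel name show_indices is hypothetical. The launch code assumes that DeviceContext can be default-constructed and that it offers an enqueue_function method taking grid_dim and block_dim keyword arguments, as in the published Mojo GPU tutorials; this method has changed across releases, so check the DeviceContext reference for your version.

```mojo
from gpu import block_dim, block_idx, global_idx, thread_idx
from gpu.host import DeviceContext


# Hypothetical kernel name, used only for illustration.
fn show_indices():
    # In one dimension, a thread's global position is its block's offset
    # in the grid plus its own offset within the block.
    var computed = block_idx.x * block_dim.x + thread_idx.x
    print(
        "block", block_idx.x,
        "thread", thread_idx.x,
        "global", global_idx.x,
        "computed", computed,
    )


def main():
    # Assumption: a default-constructed DeviceContext selects the default GPU.
    var ctx = DeviceContext()
    # Assumption: enqueue_function accepts the kernel as a parameter and the
    # launch shape as keyword arguments (2 blocks of 4 threads here).
    ctx.enqueue_function[show_indices](grid_dim=2, block_dim=4)
    # Wait for the kernel to finish so its output is flushed.
    ctx.synchronize()
```

Threads run in parallel, so the order of the printed lines is not guaranteed. On each line, the global and computed values should match.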
For an example of launching a GPU kernel from a MAX custom operation, see the vector addition example in the MAX repo.
Packages
- compute: GPU compute operations package - MMA and tensor core operations.
- host: Implements the gpu host package.
- memory: GPU memory operations package.
- primitives: GPU primitives package - warp, block, cluster, and grid-level operations.
- sync: GPU synchronization primitives package.
Modules
- block: Compatibility wrapper for gpu.block module.
- cluster: GPU cluster operations (deprecated - use gpu.primitives.cluster or gpu).
- globals: This module provides GPU-specific global constants and configuration values.
- grid_controls: GPU grid dependency control (deprecated - use gpu.primitives.grid_controls or gpu).
- id: GPU thread and block indexing (deprecated - use the gpu package directly).
- intrinsics: Provides low-level GPU intrinsic operations and memory access primitives.
- mma: Matrix multiply-accumulate operations (deprecated - use gpu.compute.mma).
- mma_operand_descriptor: MMA operand descriptor trait (deprecated - use gpu.compute.mma_operand_descriptor).
- mma_sm100: SM100 (Blackwell) matrix multiply operations (deprecated - use gpu.compute.arch.mma_nvidia_sm100).
- mma_util: Matrix multiply utilities (deprecated - use gpu.compute.mma_util).
- profiler: This module provides GPU profiling functionality.
- random: Random number generation for GPU kernels.
- semaphore: GPU semaphore operations (deprecated - use gpu.sync.semaphore).
- tcgen05: Tensor core generation 05 operations (deprecated - use gpu.compute.tcgen05).
- warp: GPU warp-level operations (deprecated - use the gpu.primitives.warp module).
