Mojo module

globals

This module provides GPU-specific global constants and configuration values.

The module defines hardware-specific constants like warp size and thread block limits that are used throughout the GPU programming interface. It handles both NVIDIA and AMD GPU architectures, automatically detecting and configuring the appropriate values based on the available hardware.

The constants are resolved at compile time based on the target GPU architecture and are used to optimize code generation and ensure hardware compatibility.

Aliases

`MAX_THREADS_PER_BLOCK_METADATA`

alias MAX_THREADS_PER_BLOCK_METADATA = _resolve_max_threads_per_block_metadata()

This is a metadata tag used in conjunction with `__llvm_metadata` to give the compiler a hint about the maximum number of threads per block that a kernel uses.

`WARP_SIZE`

alias WARP_SIZE = _resolve_warp_size()

The number of threads that execute in lockstep within a warp on the GPU.

This constant represents the hardware warp size, which is the number of threads that execute instructions synchronously as a unit. The value is architecture-dependent:

32 threads per warp on NVIDIA GPUs
32 threads per warp on AMD RDNA GPUs
64 threads per warp on AMD CDNA GPUs
0 if no GPU is detected

The warp size is a fundamental parameter that affects:

Thread scheduling and execution
Memory access coalescing
Synchronization primitives
Overall performance optimization
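
As a small illustration of how the warp size enters thread scheduling, the sketch below splits a linear thread index into a warp index and a lane index. The helper names `warp_id` and `lane_id` are hypothetical and not part of this module. The warp size is passed as a plain argument so that the example runs on the host; a kernel would presumably use `WARP_SIZE` in its place. It assumes a nonzero warp size, so the "no GPU detected" value of 0 would need a guard.

```mojo
# Hypothetical helpers, not part of the globals module.
# Assumes warp_size > 0 (the no-GPU value of 0 would divide by zero).

fn warp_id(thread_index: Int, warp_size: Int) -> Int:
    # Index of the warp that contains this thread within its block.
    return thread_index // warp_size


fn lane_id(thread_index: Int, warp_size: Int) -> Int:
    # Position of this thread inside its warp.
    return thread_index % warp_size


fn main():
    # Thread 70 with 32-thread warps (NVIDIA, AMD RDNA): warp 2, lane 6.
    print(warp_id(70, 32), lane_id(70, 32))
    # Thread 70 with 64-thread warps (AMD CDNA): warp 1, lane 6.
    print(warp_id(70, 64), lane_id(70, 64))
```

The same thread index lands in a different warp depending on the architecture, which is why code that hardcodes 32 can silently break on 64-wide hardware.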

`WARPGROUP_SIZE`

alias WARPGROUP_SIZE = _resolve_warpgroup_size()

The number of threads in a warpgroup on NVIDIA GPUs.

On NVIDIA GPUs starting with Hopper, a warpgroup consists of 4 consecutive warps, i.e. 128 threads. The ID of the first warp in a warpgroup must be a multiple of 4.

Warpgroups are used by `wgmma` instructions on Hopper and by `tcgen05.ld` on Blackwell.
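
The warpgroup arithmetic described above can be sketched as follows. This is a minimal host-side illustration, assuming 32-thread warps and 4 warps per warpgroup; the helper names `warpgroup_id` and `first_warp_of_group` are hypothetical and not part of this module, and a kernel would presumably use `WARP_SIZE` and `WARPGROUP_SIZE` instead of the literal defaults.

```mojo
# Hypothetical helpers, not part of the globals module.
# Defaults assume 32 threads per warp and 4 warps per warpgroup (128 threads).

fn warpgroup_id(
    thread_index: Int, warp_size: Int = 32, warps_per_group: Int = 4
) -> Int:
    # Index of the warpgroup that contains this thread within its block.
    return thread_index // (warp_size * warps_per_group)


fn first_warp_of_group(group: Int, warps_per_group: Int = 4) -> Int:
    # ID of the first warp in the group; always a multiple of warps_per_group.
    return group * warps_per_group


fn main():
    var group = warpgroup_id(300)
    var first_warp = first_warp_of_group(group)
    # Thread 300 sits in warp 9, which belongs to warpgroup 2 (warps 8 to 11).
    print(group, first_warp, first_warp % 4 == 0)
```

Deriving the first warp ID from the group index, rather than from an arbitrary warp, keeps the multiple-of-4 alignment requirement satisfied by construction.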
