Mojo module
warp
GPU warp-level operations and utilities.
This module provides warp-level operations for NVIDIA and AMD GPUs, including:
- Shuffle operations to exchange values between threads in a warp:
  - shuffle_idx: Copy value from source lane to other lanes
  - shuffle_up: Copy from lower lane IDs
  - shuffle_down: Copy from higher lane IDs
  - shuffle_xor: Exchange values in butterfly pattern
- Warp-wide reductions:
  - sum: Compute sum across warp
  - max: Find maximum value across warp
  - min: Find minimum value across warp
  - broadcast: Broadcast value to all lanes
The module handles both NVIDIA and AMD GPU architectures through architecture-specific implementations of the core operations. It supports various data types including integers, floats, and half-precision floats, with SIMD vectorization.
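To make the lane semantics concrete, the following is a minimal host-side sketch in plain Python, not Mojo and not this module's API. It models a warp as a list with one entry per lane. The Python signatures, the 32-lane width, and the rule that a lane with no in-range source keeps its own value (borrowed from CUDA's shuffle behavior) are all assumptions for illustration; the real functions run on the GPU and may differ, particularly on AMD hardware.

```python
WARP_SIZE = 32  # assumed warp width; some AMD GPUs use 64-lane wavefronts


def shuffle_idx(vals, src_lane):
    """Every lane reads the value held by src_lane."""
    return [vals[src_lane] for _ in vals]


def shuffle_up(vals, offset):
    """Lane i reads lane i - offset (a lower lane ID).

    Assumption: lanes with no in-range source keep their own value.
    """
    return [vals[i - offset] if i - offset >= 0 else vals[i]
            for i in range(len(vals))]


def shuffle_down(vals, offset):
    """Lane i reads lane i + offset (a higher lane ID).

    Assumption: lanes with no in-range source keep their own value.
    """
    n = len(vals)
    return [vals[i + offset] if i + offset < n else vals[i]
            for i in range(n)]


def shuffle_xor(vals, mask):
    """Lane i reads lane i ^ mask, giving a butterfly exchange.

    Requires len(vals) to be a power of two and mask < len(vals).
    """
    return [vals[i ^ mask] for i in range(len(vals))]


def warp_sum(vals):
    """Tree reduction built from shuffle_down; lane 0 ends with the total."""
    offset = len(vals) // 2
    while offset > 0:
        shifted = shuffle_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


lanes = list(range(WARP_SIZE))  # lane i holds the value i
total = warp_sum(lanes)
```

In this model the reduction takes log2(32) = 5 shuffle steps, which is why shuffle-based reductions are attractive: lanes exchange values directly without going through shared memory.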
Structs
- ReductionMethod: Enumerates the supported reduction methods.
Functions
- broadcast: Broadcasts a SIMD value from lane 0 to all lanes in the warp.
- lane_group_max: Reduces a SIMD value to its maximum within a lane group using warp-level operations.
- lane_group_max_and_broadcast: Reduces and broadcasts the maximum value within a lane group using warp-level operations.
- lane_group_min: Reduces a SIMD value to its minimum within a lane group using warp-level operations.
- lane_group_reduce: Performs a generic warp-level reduction operation using shuffle operations.
- lane_group_sum: Computes the sum of values across a group of lanes using warp-level operations.
- lane_group_sum_and_broadcast: Computes the sum across a lane group and broadcasts the result to all lanes.
- max: Computes the maximum value across all lanes in a warp.
- min: Computes the minimum value across all lanes in a warp.
- reduce: Performs a generic warp-wide reduction operation using shuffle operations.
- shuffle_down: Copies values from threads with higher lane IDs in the warp.
- shuffle_idx: Copies a value from a source lane to other lanes in a warp.
- shuffle_up: Copies values from threads with lower lane IDs in the warp.
- shuffle_xor: Exchanges values between threads in a warp using a butterfly pattern.
- sum: Computes the sum of values across all lanes in a warp.
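The difference between a plain lane-group reduction and an "and_broadcast" variant can be illustrated with a small host-side model in plain Python. This is a sketch under stated assumptions, not the module's implementation: tree_reduce and butterfly_reduce are hypothetical names, lane groups are assumed to be contiguous runs of num_lanes lanes with num_lanes a power of two, and this page does not say which shuffle pattern each real function uses.

```python
import operator


def tree_reduce(vals, op, num_lanes):
    """Hypothetical model: shuffle-down tree within each lane group.

    Only the first lane of each group is guaranteed to hold the result.
    """
    out = list(vals)
    offset = num_lanes // 2
    while offset > 0:
        # Each lane combines with the lane `offset` above it, but only if
        # that lane is still inside the same group of `num_lanes` lanes.
        out = [
            op(out[i], out[i + offset])
            if (i % num_lanes) + offset < num_lanes
            else out[i]
            for i in range(len(out))
        ]
        offset //= 2
    return out


def butterfly_reduce(vals, op, num_lanes):
    """Hypothetical model: XOR butterfly within each lane group.

    Every lane of each group ends with the result, so no separate
    broadcast step is needed.
    """
    out = list(vals)
    offset = num_lanes // 2
    while offset > 0:
        # i ^ offset stays inside the group because groups are aligned
        # to multiples of num_lanes and offset < num_lanes.
        out = [op(out[i], out[i ^ offset]) for i in range(len(out))]
        offset //= 2
    return out


lanes = list(range(8))  # two groups of four lanes: [0..3] and [4..7]
tree = tree_reduce(lanes, operator.add, 4)
bfly = butterfly_reduce(lanes, operator.add, 4)
bfly_max = butterfly_reduce(lanes, max, 4)
```

Both models take the same number of steps. The tree form leaves a valid result only in the first lane of each group, while the butterfly form leaves it in every lane, which matches what the "and_broadcast" descriptions above promise.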