
Mojo module

warp

GPU warp-level operations and utilities.

This module provides warp-level operations for NVIDIA and AMD GPUs, including:

  • Shuffle operations to exchange values between threads in a warp:

    • shuffle_idx: Copy a value from a source lane to other lanes
    • shuffle_up: Copy a value from a lane with a lower lane ID
    • shuffle_down: Copy a value from a lane with a higher lane ID
    • shuffle_xor: Exchange values between lanes in a butterfly pattern
  • Warp-wide reductions and broadcast:

    • sum: Compute the sum across the warp
    • max: Find the maximum value across the warp
    • min: Find the minimum value across the warp
    • broadcast: Broadcast a value from lane 0 to all lanes

The module supports both NVIDIA and AMD GPU architectures through architecture-specific implementations of the core operations. The operations work on SIMD values of integer, floating-point, and half-precision floating-point types.
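To make the lane-exchange patterns of the four shuffle operations concrete, here is a minimal CPU-side sketch in Python. It is not Mojo code and does not call this module. It models a warp as a list with one value per lane. The warp size of 32 is an assumption, and so is the rule that a lane keeps its own value when its source lane is out of range; this page does not specify either.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp; other hardware may differ


def shuffle_idx(vals, src_lane):
    """Every lane receives the value held by src_lane."""
    return [vals[src_lane] for _ in vals]


def shuffle_up(vals, offset):
    """Lane i receives the value of lane i - offset (a lower lane ID).

    Assumption: lanes with no such source keep their own value.
    """
    return [vals[i - offset] if i - offset >= 0 else vals[i]
            for i in range(len(vals))]


def shuffle_down(vals, offset):
    """Lane i receives the value of lane i + offset (a higher lane ID).

    Assumption: lanes with no such source keep their own value.
    """
    n = len(vals)
    return [vals[i + offset] if i + offset < n else vals[i]
            for i in range(n)]


def shuffle_xor(vals, mask):
    """Lane i receives the value of lane i ^ mask (butterfly pairing)."""
    n = len(vals)
    return [vals[i ^ mask] if (i ^ mask) < n else vals[i]
            for i in range(n)]


if __name__ == "__main__":
    lanes = list(range(WARP_SIZE))  # each lane holds its own lane ID
    print(shuffle_idx(lanes, 5)[:4])
    print(shuffle_up(lanes, 1)[:4])
    print(shuffle_down(lanes, 1)[-4:])
    print(shuffle_xor(lanes, 1)[:4])
```

With each lane holding its own lane ID, the printed slices show which source lane each destination lane reads from. A mask of 1 in `shuffle_xor` swaps neighbouring pairs, which is the smallest step of the butterfly pattern.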

Structs

Functions

  • broadcast: Broadcasts a SIMD value from lane 0 to all lanes in the warp.
  • lane_group_max: Reduces a SIMD value to its maximum within a lane group using warp-level operations.
  • lane_group_max_and_broadcast: Reduces and broadcasts the maximum value within a lane group using warp-level operations.
  • lane_group_min: Reduces a SIMD value to its minimum within a lane group using warp-level operations.
  • lane_group_reduce: Performs a generic reduction within a lane group using shuffle operations.
  • lane_group_sum: Computes the sum of values across a group of lanes using warp-level operations.
  • lane_group_sum_and_broadcast: Computes the sum across a lane group and broadcasts the result to all lanes.
  • max: Computes the maximum value across all lanes in a warp.
  • min: Computes the minimum value across all lanes in a warp.
  • reduce: Performs a generic warp-wide reduction operation using shuffle operations.
  • shuffle_down: Copies values from threads with higher lane IDs in the warp.
  • shuffle_idx: Copies a value from a source lane to other lanes in a warp.
  • shuffle_up: Copies values from threads with lower lane IDs in the warp.
  • shuffle_xor: Exchanges values between threads in a warp using a butterfly pattern.
  • sum: Computes the sum of values across all lanes in a warp.
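The reductions listed above are described as being built from shuffle operations. The following Python sketch simulates two common ways to do that on the CPU. It is an illustration under stated assumptions, not this module's implementation: the helper names `butterfly_reduce` and `tree_reduce` are hypothetical, the warp size of 32 is assumed, and the page does not say which scheme each function uses.

```python
import operator

WARP_SIZE = 32  # assumption: 32 lanes per warp


def shuffle_xor(vals, mask):
    """Lane i reads from lane i ^ mask (len(vals) must be a power of two)."""
    return [vals[lane ^ mask] for lane in range(len(vals))]


def shuffle_down(vals, offset):
    """Lane i reads from lane i + offset; out-of-range lanes keep their value."""
    n = len(vals)
    return [vals[lane + offset] if lane + offset < n else vals[lane]
            for lane in range(n)]


def butterfly_reduce(vals, op, num_lanes=WARP_SIZE):
    """Hypothetical helper: xor butterfly over groups of num_lanes lanes.

    After log2(num_lanes) steps every lane in a group holds that group's
    result, so the reduction and the broadcast happen together.
    """
    offset = num_lanes // 2
    while offset > 0:
        partner = shuffle_xor(vals, offset)
        vals = [op(a, b) for a, b in zip(vals, partner)]
        offset //= 2
    return vals


def tree_reduce(vals, op, num_lanes=WARP_SIZE):
    """Hypothetical helper: shuffle_down tree over num_lanes lanes.

    Only the first lane of the group is guaranteed to hold the full
    result; a separate broadcast would be needed to share it.
    """
    offset = num_lanes // 2
    while offset > 0:
        shifted = shuffle_down(vals, offset)
        vals = [op(a, b) for a, b in zip(vals, shifted)]
        offset //= 2
    return vals


if __name__ == "__main__":
    lanes = list(range(WARP_SIZE))
    print(butterfly_reduce(lanes, operator.add)[0])       # warp-wide sum
    print(butterfly_reduce(lanes, max)[0])                # warp-wide max
    print(butterfly_reduce(lanes, operator.add, 8)[:10])  # per-group sums
    print(tree_reduce(lanes, operator.add)[0])            # sum in lane 0
```

The butterfly variant costs the same number of shuffle steps as the tree variant but leaves the result in every lane, which is why it is a natural fit for the "reduce and broadcast" style of operation. The tree variant fits cases where only one lane per group needs the result.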