Mojo function

shuffle_down

shuffle_down[type: DType, simd_width: Int, //](val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]

Copies values from threads with higher lane IDs in the warp.

Performs a shuffle operation where each thread receives a value from a thread with a higher lane ID, offset by the specified amount. This overload takes no mask argument and uses the full warp mask, so every lane participates.

For example, with offset=1:

  • Thread 0 gets value from thread 1
  • Thread 1 gets value from thread 2
  • Thread N gets value from thread N+1
  • The last thread gets an undefined value (for a general offset, the last offset threads do)
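
The lane mapping above can be modeled on the host. The following is a minimal Python sketch of the semantics, not Mojo GPU code: the helper name model_shuffle_down, the 32-lane warp width, and the use of None to stand in for an undefined result are all assumptions made for illustration.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp; the real width depends on the hardware


def model_shuffle_down(lane_values, offset):
    """Host-side model: lane i receives the value held by lane i + offset.

    None stands in for the undefined result of reading past the last lane.
    """
    if offset <= 0:
        raise ValueError("offset must be positive")
    n = len(lane_values)
    return [lane_values[i + offset] if i + offset < n else None for i in range(n)]


values = list(range(WARP_SIZE))  # lane i holds the value i
shifted = model_shuffle_down(values, 1)
```

In this model, lanes 0 through 30 receive their upper neighbor's value, and lane 31 has no source lane, so its result is the undefined placeholder.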

Parameters:

  • type (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • val (SIMD[type, simd_width]): The SIMD value to be shuffled down the warp.
  • offset (SIMD[uint32, 1]): The number of lanes to shift values down by. Must be positive.

Returns:

The SIMD value from the thread offset lanes higher in the warp. Returns undefined values for threads where lane_id + offset >= WARP_SIZE.
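
One common use of a downward shuffle is a tree reduction across the warp: halve the offset at each step and add the received value to the local one. The sketch below models that pattern in Python rather than Mojo. It assumes a 32-lane warp (a power of two), and model_shuffle_down and model_warp_sum are hypothetical helpers, with None marking undefined values so their propagation is visible.

```python
WARP_SIZE = 32  # assumption: power-of-two warp width


def model_shuffle_down(lane_values, offset):
    """Lane i receives the value of lane i + offset; None models 'undefined'."""
    n = len(lane_values)
    return [lane_values[i + offset] if i + offset < n else None for i in range(n)]


def model_warp_sum(lane_values):
    """Tree reduction: after the loop, lane 0 holds the sum of all lanes."""
    acc = list(lane_values)
    offset = len(acc) // 2
    while offset > 0:
        received = model_shuffle_down(acc, offset)
        # An undefined input makes the lane's running total undefined as well.
        acc = [None if a is None or r is None else a + r
               for a, r in zip(acc, received)]
        offset //= 2
    return acc


result = model_warp_sum(list(range(WARP_SIZE)))
```

In this model only lane 0 ends with a defined total. Every other lane has consumed at least one undefined value along the way, which is why such reductions read the result from lane 0 only.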

shuffle_down[type: DType, simd_width: Int, //](mask: UInt, val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]

Copies values from threads with higher lane IDs in the warp using a custom mask.

Performs a shuffle operation where each thread receives a value from a thread with a higher lane ID, offset by the specified amount. The mask argument controls which threads participate in the shuffle.

For example, with offset=1:

  • Thread 0 gets value from thread 1
  • Thread 1 gets value from thread 2
  • Thread N gets value from thread N+1
  • The last thread gets an undefined value (for a general offset, the last offset threads do)

Parameters:

  • type (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • mask (UInt): A bitmask controlling which threads participate in the shuffle. Only threads with their corresponding bit set will exchange values.
  • val (SIMD[type, simd_width]): The SIMD value to be shuffled down the warp.
  • offset (SIMD[uint32, 1]): The number of lanes to shift values down by. Must be positive.

Returns:

The SIMD value from the thread offset lanes higher in the warp. Returns undefined values for threads where lane_id + offset >= WARP_SIZE or where the corresponding mask bit is not set.
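
The effect of the mask can be sketched with a host-side Python model (not Mojo GPU code). The name model_shuffle_down_masked and the 32-lane warp width are assumptions, and None stands in for undefined results. The model also treats a read from a source lane whose mask bit is clear as undefined; that is a conservative assumption, since the description above only states the rule for the receiving lane.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp


def model_shuffle_down_masked(mask, lane_values, offset):
    """Model of the masked overload; None stands in for an undefined result."""
    n = len(lane_values)

    def active(lane):
        # Bit `lane` of the mask says whether that lane participates.
        return (mask >> lane) & 1 == 1

    out = []
    for lane in range(n):
        src = lane + offset
        # Assumption: both the receiving lane and its source must participate.
        if active(lane) and src < n and active(src):
            out.append(lane_values[src])
        else:
            out.append(None)
    return out


lower_half = 0x0000FFFF  # lanes 0..15 participate
shifted = model_shuffle_down_masked(lower_half, list(range(WARP_SIZE)), 1)
```

With this mask, lanes 0 through 14 receive their upper neighbor's value, lane 15 would read from the non-participating lane 16, and lanes 16 through 31 do not participate at all.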