Mojo function
shuffle_down
shuffle_down[type: DType, simd_width: Int, //](val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]
Copies values from threads with higher lane IDs in the warp.
Performs a shuffle operation where each thread receives a value from a thread with a higher lane ID, offset by the specified amount. This overload takes no mask argument and uses the full warp mask.
For example, with offset=1:
- Thread 0 gets value from thread 1
- Thread 1 gets value from thread 2
- Thread N gets value from thread N+1
- The last thread gets an undefined value (in general, the last offset threads do)
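The lane mapping above can be modelled on the CPU. The following is a minimal Python sketch, not Mojo GPU code: the name simulate_shuffle_down, the 32-lane WARP_SIZE, and the use of None to stand for an undefined value are all assumptions made for illustration.

```python
WARP_SIZE = 32  # assumed warp width; the real width depends on the target GPU


def simulate_shuffle_down(values, offset):
    """Model one warp: lane i receives the value held by lane i + offset.

    `values` holds one entry per lane. Lanes whose source would fall past
    the end of the warp get None, standing in for "undefined".
    """
    if len(values) != WARP_SIZE:
        raise ValueError("expected exactly one value per lane")
    result = []
    for lane_id in range(WARP_SIZE):
        source_lane = lane_id + offset
        if source_lane < WARP_SIZE:
            result.append(values[source_lane])
        else:
            result.append(None)  # undefined on real hardware
    return result


# Each lane starts with a value derived from its own lane ID.
lane_values = [lane * 10 for lane in range(WARP_SIZE)]
shifted = simulate_shuffle_down(lane_values, 1)
```

With offset=1, lane 0 ends up with lane 1's value (10), lane 30 with lane 31's value (310), and lane 31 with None.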
Parameters:
- type (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- val (SIMD[type, simd_width]): The SIMD value to be shuffled down the warp.
- offset (SIMD[uint32, 1]): The number of lanes to shift values down by. Must be positive.
Returns:
The SIMD value held by the thread whose lane ID is higher by offset. The result is undefined for threads where lane_id + offset >= WARP_SIZE.
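A common use of a shuffle-down primitive is a tree reduction across the warp, where the undefined upper lanes never feed into lane 0. The Python sketch below models that pattern on the CPU under stated assumptions: shuffle_down_model and warp_sum_model are hypothetical helpers, WARP_SIZE is assumed to be 32, and None stands for an undefined value that poisons any sum it touches. It is an illustration of the pattern, not the library's implementation.

```python
WARP_SIZE = 32  # assumed warp width


def shuffle_down_model(values, offset):
    """Lane i reads lane i + offset; out-of-range reads are None (undefined)."""
    return [
        values[lane + offset] if lane + offset < WARP_SIZE else None
        for lane in range(WARP_SIZE)
    ]


def warp_sum_model(values):
    """Tree reduction: halve the offset each step, adding the received value.

    Any lane that receives an undefined value becomes undefined itself,
    so only lanes whose whole dependency chain was in range stay valid.
    """
    acc = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        received = shuffle_down_model(acc, offset)
        acc = [
            None if own is None or other is None else own + other
            for own, other in zip(acc, received)
        ]
        offset //= 2
    return acc


reduced = warp_sum_model(list(range(WARP_SIZE)))
```

In this model only lane 0 holds a defined result at the end (496 for inputs 0 through 31); every other lane has consumed at least one undefined value along the way.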
shuffle_down[type: DType, simd_width: Int, //](mask: UInt, val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]
Copies values from threads with higher lane IDs in the warp using a custom mask.
Performs a shuffle operation where each thread receives a value from a thread with a higher lane ID, offset by the specified amount. The mask argument controls which threads participate in the shuffle.
For example, with offset=1:
- Thread 0 gets value from thread 1
- Thread 1 gets value from thread 2
- Thread N gets value from thread N+1
- The last thread gets an undefined value (in general, the last offset threads do)
Parameters:
- type (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- mask (UInt): A bitmask controlling which threads participate in the shuffle. Only threads with their corresponding bit set will exchange values.
- val (SIMD[type, simd_width]): The SIMD value to be shuffled down the warp.
- offset (SIMD[uint32, 1]): The number of lanes to shift values down by. Must be positive.
Returns:
The SIMD value held by the thread whose lane ID is higher by offset. The result is undefined for threads where lane_id + offset >= WARP_SIZE or where the corresponding mask bit is not set.
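The effect of the mask can be sketched with a CPU-side Python model. The helper masked_shuffle_down_model, the 32-lane WARP_SIZE, and None as a stand-in for an undefined value are assumptions for illustration. The model also treats a read from a source lane whose mask bit is clear as undefined; the text above does not state this, so that rule is an assumption too.

```python
WARP_SIZE = 32  # assumed warp width


def masked_shuffle_down_model(mask, values, offset):
    """Lane i reads lane i + offset, restricted to lanes enabled in `mask`.

    Bit i of `mask` enables lane i. A lane gets None (undefined) when its
    own bit is clear, when the source lane is past the end of the warp, or
    (by assumption) when the source lane's bit is clear.
    """

    def participates(lane):
        return (mask >> lane) & 1 == 1

    result = []
    for lane in range(WARP_SIZE):
        source_lane = lane + offset
        if (
            participates(lane)
            and source_lane < WARP_SIZE
            and participates(source_lane)
        ):
            result.append(values[source_lane])
        else:
            result.append(None)
    return result


lower_half_mask = 0x0000FFFF  # enable lanes 0..15 only
values = list(range(100, 100 + WARP_SIZE))  # lane i holds 100 + i
out = masked_shuffle_down_model(lower_half_mask, values, 1)
```

Here lanes 0 through 14 receive their upper neighbour's value, lane 15 would read from the disabled lane 16, and lanes 16 through 31 do not participate at all.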