Mojo function
shuffle_up
shuffle_up[type: DType, simd_width: Int, //](val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]
Copies values from threads with lower lane IDs in the warp.
Performs a shuffle operation where each thread receives a value from a thread with a lower lane ID, offset by the specified amount. This overload takes no mask argument and uses the full warp mask, so every thread in the warp participates.
For example, with offset=1:
- Thread N gets the value from thread N-1
- Thread 1 gets the value from thread 0
- Thread 0 gets an undefined value
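The lane mapping above can be sketched with a small CPU-side model. This is plain Python, not Mojo, and it does not call the real `shuffle_up`. The names `shuffle_up_model` and `WARP_SIZE` are hypothetical, the warp size of 32 is an assumption (it varies by hardware), and `None` stands in for the undefined value.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp; real hardware may differ


def shuffle_up_model(vals, offset):
    """Model of shuffle_up: lane i receives the value held by lane i - offset.

    Lanes with i - offset < 0 have no source lane. Their result is
    undefined on real hardware; this sketch represents it as None.
    """
    return [vals[i - offset] if i - offset >= 0 else None
            for i in range(len(vals))]


# Each lane starts with a distinct value so the shift is easy to see.
lane_values = [lane * 10 for lane in range(WARP_SIZE)]
shifted = shuffle_up_model(lane_values, 1)
print(shifted[:4])  # [None, 0, 10, 20]
```

Lane 0 has no lower neighbour, so its slot holds the placeholder; every other lane holds the value of the lane directly below it.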
Parameters:
- type (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- val (SIMD[type, simd_width]): The SIMD value to be shuffled up the warp.
- offset (SIMD[uint32, 1]): The number of lanes to shift values up by.
Returns:
The SIMD value held by the thread that is offset lanes lower in the warp. The result is undefined for threads where lane_id - offset < 0.
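Because lanes with lane_id - offset < 0 receive an undefined value, callers typically guard the result before using it. The sketch below illustrates that guard with a warp-level inclusive prefix sum, a common use of an up-shuffle. It is a plain Python model rather than Mojo code. `shuffle_up_model`, `inclusive_scan`, and `WARP_SIZE` are hypothetical names, the warp size of 32 is an assumption, and `None` stands in for the undefined value.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp; real hardware may differ


def shuffle_up_model(vals, offset):
    """Lane i receives the value of lane i - offset, or None if there is none."""
    return [vals[i - offset] if i >= offset else None
            for i in range(len(vals))]


def inclusive_scan(vals):
    """Inclusive prefix sum across lanes using repeated up-shuffles."""
    vals = list(vals)
    offset = 1
    while offset < len(vals):
        received = shuffle_up_model(vals, offset)
        # Guard: only lanes with lane >= offset have a defined received value.
        vals = [v + r if lane >= offset else v
                for lane, (v, r) in enumerate(zip(vals, received))]
        offset *= 2
    return vals


scan = inclusive_scan([1] * WARP_SIZE)
print(scan[:5], scan[-1])  # [1, 2, 3, 4, 5] 32
```

The `lane >= offset` check mirrors the condition in the Returns description: lanes below the offset keep their own value instead of consuming an undefined one.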
shuffle_up[type: DType, simd_width: Int, //](mask: UInt, val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]
Copies values from threads with lower lane IDs in the warp.
Performs a shuffle operation where each thread receives a value from a thread with a lower lane ID, offset by the specified amount. The operation is performed only for threads specified in the mask.
For example, with offset=1:
- Thread N gets value from thread N-1 if both threads are in the mask
- Thread 1 gets value from thread 0 if both threads are in the mask
- Thread 0 gets undefined value
- Threads not in the mask get undefined values
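The masked rules above can be modelled the same way on the CPU. This is a plain Python sketch, not Mojo, and it reflects only the behaviour described in this list. `shuffle_up_masked_model` and `WARP_SIZE` are hypothetical names, the warp size of 32 and the convention that bit i of the mask selects lane i are assumptions, and `None` stands in for the undefined value.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp; real hardware may differ


def shuffle_up_masked_model(mask, vals, offset):
    """Model of the masked shuffle_up described above.

    Assumes bit i of `mask` is set when lane i participates. A lane gets
    a defined value only if it and its source lane are both in the mask
    and the source lane index is not negative.
    """
    def active(lane):
        return ((mask >> lane) & 1) == 1

    out = []
    for lane in range(len(vals)):
        src = lane - offset
        if src >= 0 and active(lane) and active(src):
            out.append(vals[src])
        else:
            out.append(None)  # undefined on real hardware
    return out


mask = 0b0111  # lanes 0, 1 and 2 participate; all higher lanes do not
vals = [lane * 10 for lane in range(WARP_SIZE)]
result = shuffle_up_masked_model(mask, vals, 1)
print(result[:5])  # [None, 0, 10, None, None]
```

Lanes 1 and 2 receive values because both they and their source lanes are in the mask; lane 0 has no source lane, and lanes 3 and above are outside the mask.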
Parameters:
- type (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- mask (UInt): The warp mask specifying which threads participate in the shuffle.
- val (SIMD[type, simd_width]): The SIMD value to be shuffled up the warp.
- offset (SIMD[uint32, 1]): The number of lanes to shift values up by.
Returns:
The SIMD value held by the thread that is offset lanes lower in the warp. The result is undefined for threads where lane_id - offset < 0 and for threads not in the mask.