Mojo function
shuffle_idx
shuffle_idx[type: DType, simd_width: Int, //](val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]
Copies a value from a source lane to other lanes in a warp.
Broadcasts a value from a source thread in a warp to all participating threads
without using shared memory. This is a convenience wrapper that uses the full
warp mask by default.
Example:
```mojo
from gpu.warp import shuffle_idx
val = SIMD[DType.float32, 16](1.0)
# Broadcast value from lane 0 to all lanes
result = shuffle_idx(val, 0)
# Get value from lane 5
result = shuffle_idx(val, 5)
```
Parameters:
- type (DType): The data type of the SIMD elements (e.g. float32, int32, half).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- val (SIMD[type, simd_width]): The SIMD value to be broadcast from the source lane.
- offset (SIMD[uint32, 1]): The source lane ID to copy the value from.
Returns:
A SIMD vector where all lanes contain the value from the source lane specified by offset.
shuffle_idx[type: DType, simd_width: Int, //](mask: UInt, val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]
Copies a value from a source lane to other lanes in a warp with explicit mask control.
Broadcasts a value from a source thread in a warp to participating threads specified by
the mask. This provides fine-grained control over which threads participate in the shuffle
operation.
Example:
```mojo
from gpu.warp import shuffle_idx
# Only broadcast to first 16 lanes
var mask: UInt = 0xFFFF  # 16 ones; typed as UInt to match the `mask` argument
var val = SIMD[DType.float32, 32](1.0)
var result = shuffle_idx(mask, val, 5)
```
Parameters:
- type (DType): The data type of the SIMD elements (e.g. float32, int32, half).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- mask (UInt): A bit mask specifying which lanes participate in the shuffle (1 bit per lane).
- val (SIMD[type, simd_width]): The SIMD value to be broadcast from the source lane.
- offset (SIMD[uint32, 1]): The source lane ID to copy the value from.
Returns:
A SIMD vector where participating lanes (set in mask) contain the value from the source lane specified by offset. Non-participating lanes retain their original values.
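Since the mask carries one bit per lane, a mask covering the first N lanes can be computed rather than written out as a hex literal. The following is a minimal sketch: `lane_mask` is a hypothetical helper, not part of `gpu.warp`, and it assumes that lane i corresponds to bit i of the mask and that `num_lanes` is smaller than the bit width of `UInt`.

```mojo
# Hypothetical helper (not part of gpu.warp): builds a mask with the
# low `num_lanes` bits set, assuming lane i maps to bit i.
fn lane_mask(num_lanes: Int) -> UInt:
    # Shift a single 1 bit left by `num_lanes`, then subtract 1 to set
    # every bit below it. Only valid while num_lanes < bit width of UInt.
    return (UInt(1) << UInt(num_lanes)) - 1


def main():
    # Same value as the literal 0xFFFF used for "first 16 lanes".
    var mask = lane_mask(16)
    print(mask)
```

The resulting value could then be passed as the `mask` argument in place of a hand-written literal, for example `shuffle_idx(lane_mask(16), val, 5)` inside GPU code.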