Mojo function

shuffle_idx

shuffle_idx[type: DType, simd_width: Int, //](val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]

Copies a value from a source lane to other lanes in a warp.

Broadcasts a value from a source thread in a warp to all participating threads
without using shared memory. This is a convenience wrapper that uses the full
warp mask by default.

Example:

```mojo
from gpu.warp import shuffle_idx

val = SIMD[DType.float32, 16](1.0)

# Broadcast value from lane 0 to all lanes
result = shuffle_idx(val, 0)

# Get value from lane 5
result = shuffle_idx(val, 5)
```

Parameters:

  • type (DType): The data type of the SIMD elements (e.g. float32, int32, float16).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • val (SIMD[type, simd_width]): The SIMD value to be broadcast from the source lane.
  • offset (SIMD[uint32, 1]): The source lane ID to copy the value from.

Returns:

A SIMD vector where all lanes contain the value from the source lane specified by offset.
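
The example above fills every lane with the same constant, so the effect of the broadcast is not visible. The following is a minimal sketch, not an official example, in which each lane starts with a different value. It assumes a toolchain where `lane_id` is importable from `gpu` and kernels are launched with `DeviceContext.enqueue_function`; the kernel name `broadcast_kernel` and the launch shape (one block of 32 threads) are hypothetical choices made for illustration.

```mojo
from gpu import lane_id
from gpu.host import DeviceContext
from gpu.warp import shuffle_idx


fn broadcast_kernel():
    # Hypothetical kernel: each lane starts with a distinct value
    # (10x its own lane index), so the shuffle result is observable.
    var lane = lane_id()
    var mine = Float32(Int(lane) * 10)

    # Every lane reads the value currently held by lane 5.
    var from_lane_5 = shuffle_idx(mine, 5)

    # Print from a few lanes only, to keep the output short.
    if lane < 4:
        print("lane", lane, "had", mine, "and received", from_lane_5)


def main():
    # Assumed launch shape: a single block of 32 threads.
    var ctx = DeviceContext()
    ctx.enqueue_function[broadcast_kernel](grid_dim=1, block_dim=32)
    ctx.synchronize()
```

If the assumptions hold, each printing lane should report its own starting value alongside the same received value, namely the one lane 5 started with.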

shuffle_idx[type: DType, simd_width: Int, //](mask: UInt, val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]

Copies a value from a source lane to other lanes in a warp with explicit mask control.

Broadcasts a value from a source thread in a warp to participating threads specified by
the mask. This provides fine-grained control over which threads participate in the shuffle
operation.

Example:

```mojo
from gpu.warp import shuffle_idx

# Only broadcast to first 16 lanes
var mask: UInt = 0xFFFF  # 16 ones: lanes 0-15 participate
var val = SIMD[DType.float32, 32](1.0)
var result = shuffle_idx(mask, val, 5)
```

Parameters:

  • type (DType): The data type of the SIMD elements (e.g. float32, int32, float16).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • mask (UInt): A bit mask specifying which lanes participate in the shuffle (1 bit per lane).
  • val (SIMD[type, simd_width]): The SIMD value to be broadcast from the source lane.
  • offset (SIMD[uint32, 1]): The source lane ID to copy the value from.

Returns:

A SIMD vector where participating lanes (set in mask) contain the value from the source lane specified by offset. Non-participating lanes retain their original values.