Mojo function

shuffle_up

shuffle_up[type: DType, simd_width: Int, //](val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]

Copies values from threads with lower lane IDs in the warp.

Performs a shuffle operation in which each thread receives the value held by the thread whose lane ID is lower by the specified offset. This overload takes no mask argument and uses the full warp mask, so every lane in the warp participates.

For example, with offset=1:

  • Thread N gets value from thread N-1
  • Thread 1 gets value from thread 0
  • Thread 0 gets undefined value
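The lane mapping above can be modeled on the host in plain Python. This is only a sketch of the data movement, not the Mojo API: the helper name `simulate_shuffle_up`, the 8-lane warp, and the use of `None` to stand in for an undefined result are all assumptions made for illustration.

```python
def simulate_shuffle_up(values, offset):
    """Model shuffle_up over one warp: lane i reads from lane i - offset.

    values[i] is the value held by lane i. Lanes with i - offset < 0 have
    no source lane; the real result is undefined, modeled here as None.
    """
    return [values[i - offset] if i - offset >= 0 else None
            for i in range(len(values))]


# Hypothetical 8-lane warp (hardware warps are typically 32 or 64 lanes wide)
# in which lane i holds the value 10 * i.
warp = [10 * lane for lane in range(8)]
shifted = simulate_shuffle_up(warp, 1)
print(shifted)  # [None, 0, 10, 20, 30, 40, 50, 60]
```

Lane 0 prints as `None` only because the model has to put something there. In real code that lane's result should simply not be read.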

Parameters:

  • type (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • val (SIMD[type, simd_width]): The SIMD value to be shuffled up the warp.
  • offset (SIMD[uint32, 1]): The number of lanes to shift values up by.

Returns:

The SIMD value held by the thread whose lane ID is lower by offset. The result is undefined for threads where lane_id < offset, because no source lane exists for them.
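A typical use of an upward shuffle is a warp-level inclusive prefix sum, where the undefined results on the low lanes have to be ignored explicitly. The sketch below models that pattern in plain Python. It is not Mojo code: `simulate_shuffle_up`, `warp_inclusive_scan`, the 8-lane width, and `None` as a stand-in for "undefined" are assumptions for illustration.

```python
def simulate_shuffle_up(values, offset):
    """Lane i reads from lane i - offset; lanes without a source get None."""
    return [values[i - offset] if i >= offset else None
            for i in range(len(values))]


def warp_inclusive_scan(values):
    """Hillis-Steele style inclusive scan using shuffles with doubling offsets."""
    totals = list(values)
    offset = 1
    while offset < len(totals):
        received = simulate_shuffle_up(totals, offset)
        # Only lanes with lane_id >= offset received a defined value,
        # so the remaining lanes keep their running total unchanged.
        totals = [t + r if lane >= offset else t
                  for lane, (t, r) in enumerate(zip(totals, received))]
        offset *= 2
    return totals


scan = warp_inclusive_scan([1, 2, 3, 4, 5, 6, 7, 8])
print(scan)  # [1, 3, 6, 10, 15, 21, 28, 36]
```

The `lane >= offset` guard is the important part: it is the same condition under which the shuffle result is defined.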

shuffle_up[type: DType, simd_width: Int, //](mask: UInt, val: SIMD[type, simd_width], offset: SIMD[uint32, 1]) -> SIMD[type, simd_width]

Copies values from threads with lower lane IDs in the warp.

Performs a shuffle operation where each thread receives a value from a thread with a lower lane ID, offset by the specified amount. The operation is performed only for threads specified in the mask.

For example, with offset=1:

  • Thread N gets value from thread N-1 if both threads are in the mask
  • Thread 1 gets value from thread 0 if both threads are in the mask
  • Thread 0 gets undefined value
  • Threads not in the mask get undefined values
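The masked rules above can be sketched the same way in plain Python. This is a host-side model, not the Mojo API. It assumes that bit i of the mask corresponds to lane i, uses a hypothetical 8-lane warp, and represents undefined results as `None`. The helper name `simulate_masked_shuffle_up` is invented for this example.

```python
def simulate_masked_shuffle_up(mask, values, offset):
    """Lane i gets values[i - offset] only if lanes i and i - offset are both in mask."""
    def in_mask(lane):
        # Assumption: bit `lane` of the mask marks that lane as participating.
        return ((mask >> lane) & 1) == 1

    result = []
    for lane in range(len(values)):
        src = lane - offset
        if src >= 0 and in_mask(lane) and in_mask(src):
            result.append(values[src])
        else:
            result.append(None)  # undefined in the real operation
    return result


warp = [10 * lane for lane in range(8)]
mask = 0b00001111  # only lanes 0..3 participate
masked = simulate_masked_shuffle_up(mask, warp, 1)
print(masked)  # [None, 0, 10, 20, None, None, None, None]
```

Lane 4 has a valid source lane (lane 3) but is itself outside the mask, so its result is undefined along with lanes 5 through 7.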

Parameters:

  • type (DType): The data type of the SIMD elements (e.g. float32, int32).
  • simd_width (Int): The number of elements in each SIMD vector.

Args:

  • mask (UInt): The warp mask specifying which threads participate in the shuffle.
  • val (SIMD[type, simd_width]): The SIMD value to be shuffled up the warp.
  • offset (SIMD[uint32, 1]): The number of lanes to shift values up by.

Returns:

The SIMD value held by the thread whose lane ID is lower by offset. The result is undefined for threads where lane_id < offset and for threads that are not in the mask.