Intrinsics

Module

Defines intrinsics.

PrefetchCache

Prefetch cache type.

Aliases:

  • DATA = PrefetchCache(1) The data prefetching option.
  • INSTRUCTION = PrefetchCache(0) The instruction prefetching option.

Fields:

value

The cache to prefetch into: 0 for the instruction cache, 1 for the data cache. It should be in [0, 1].

Functions:

__init__

__init__(value: Int) -> Self

Construct a prefetch option.

Args:

  • value (Int): An integer value representing the prefetch cache option to be used. Should be a value in the range [0, 1].

Returns:

The prefetch cache type that was constructed.

PrefetchLocality

The prefetch locality.

The locality, rw, and cache type correspond to the inputs of the LLVM prefetch intrinsic (see the llvm.prefetch entry in the LLVM Language Reference).

Aliases:

  • HIGH = PrefetchLocality(3) Extremely high locality (keep in cache).
  • LOW = PrefetchLocality(1) Low locality.
  • MEDIUM = PrefetchLocality(2) Medium locality.
  • NONE = PrefetchLocality(0) No locality.

Fields:

value

The prefetch locality to use. It should be a value in [0, 3].

Functions:

__init__

__init__(value: Int) -> Self

Construct a prefetch locality option.

Args:

  • value (Int): An integer value representing the locality. Should be a value in the range [0, 3].

Returns:

The prefetch locality constructed.

PrefetchOptions

Collection of configuration parameters for a prefetch intrinsic call.

The op configuration follows an interface similar to that of the LLVM prefetch intrinsic, with a “locality” attribute that specifies the level of temporal locality in the application, that is, how soon the same data is expected to be accessed again. Possible locality values are: NONE, LOW, MEDIUM, and HIGH.

The op also takes a “cache tag” attribute giving hints on how the prefetched data will be used. Possible tags are: ReadICache, ReadDCache and WriteDCache.

Note: the actual behavior of the prefetch op and concrete interpretation of these attributes are target-dependent.

Fields:

cache

Indicates i-cache or d-cache prefetching.

locality

Indicates locality level.

rw

Indicates prefetching for read or write.

Functions:

__init__

__init__() -> Self

Constructs an instance of PrefetchOptions with default parameters.

Returns:

The prefetch configuration constructed.

for_read

for_read(self: Self) -> Self

Sets the prefetch purpose to read.

Returns:

The updated prefetch options.

for_write

for_write(self: Self) -> Self

Sets the prefetch purpose to write.

Returns:

The updated prefetch options.

high_locality

high_locality(self: Self) -> Self

Sets the prefetch locality to high.

Returns:

The updated prefetch options.

low_locality

low_locality(self: Self) -> Self

Sets the prefetch locality to low.

Returns:

The updated prefetch options.

medium_locality

medium_locality(self: Self) -> Self

Sets the prefetch locality to medium.

Returns:

The updated prefetch options.

no_locality

no_locality(self: Self) -> Self

Sets the prefetch locality to none.

Returns:

The updated prefetch options.

to_data_cache

to_data_cache(self: Self) -> Self

Sets the prefetch target to data cache.

Returns:

The updated prefetch options.

to_instruction_cache

to_instruction_cache(self: Self) -> Self

Sets the prefetch target to instruction cache.

Returns:

The updated prefetch options.

PrefetchRW

Prefetch read or write.

Aliases:

  • READ = PrefetchRW(0) Read prefetch.
  • WRITE = PrefetchRW(1) Write prefetch.

Fields:

value

The read-write prefetch option: 0 for read, 1 for write. It should be in [0, 1].

Functions:

__init__

__init__(value: Int) -> Self

Construct a prefetch read-write option.

Args:

  • value (Int): An integer value representing the prefetch read-write option to be used. Should be a value in the range [0, 1].

Returns:

The prefetch read-write option constructed.

compressed_store

compressed_store[size: Int, type: DType](value: SIMD[type, size], addr: DTypePointer[type], mask: SIMD[bool, size])

Compress the active lanes of value (those whose mask bit is true) and store them contiguously starting at addr, skipping masked-off lanes.

Parameters:

  • size (Int): Size of value, the value to store.
  • type (DType): DType of value, the value to store.

Args:

  • value (SIMD[type, size]): The vector containing data to store.
  • addr (DTypePointer[type]): The memory location to store the compressed data.
  • mask (SIMD[bool, size]): A binary vector which prevents memory access to certain lanes of value; only lanes whose mask bit is true are stored.
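The behavior described above can be modeled with a few lines of scalar C++, in the same spirit as the C++ equivalences given for gather and scatter. This is a sketch under stated assumptions, not the Mojo implementation: it assumes compressed_store follows the semantics of LLVM's llvm.masked.compressstore intrinsic, uses std::array as a stand-in for SIMD[type, size], and the function name compressed_store_ref (and the count it returns) is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model of compressed_store.
// Every lane whose mask bit is true is written to the next consecutive slot
// starting at addr; lanes whose mask bit is false are skipped and leave no
// gap in memory. The returned count is an illustrative addition.
template <typename T, std::size_t N>
std::size_t compressed_store_ref(const std::array<T, N>& value, T* addr,
                                 const std::array<bool, N>& mask) {
    assert(addr != nullptr);
    std::size_t written = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            addr[written++] = value[i];  // pack active lanes contiguously
        }
    }
    return written;
}
```

With value = {10, 20, 30, 40} and mask = {true, false, true, false}, only 10 and 30 are written, to addr[0] and addr[1]. This packing is what distinguishes compressed_store from masked_store, which keeps each lane at its own offset.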

external_call

external_call[callee: StringLiteral, type: AnyType]() -> type

Call an external function.

Parameters:

  • callee (StringLiteral): The name of the external function.
  • type (AnyType): The return type.

Returns:

The external call result.

external_call[callee: StringLiteral, type: AnyType, T0: AnyType](arg0: T0) -> type

Call an external function.

Parameters:

  • callee (StringLiteral): The name of the external function.
  • type (AnyType): The return type.
  • T0 (AnyType): The first argument type.

Args:

  • arg0 (T0): The first argument.

Returns:

The external call result.

external_call[callee: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType](arg0: T0, arg1: T1) -> type

Call an external function.

Parameters:

  • callee (StringLiteral): The name of the external function.
  • type (AnyType): The return type.
  • T0 (AnyType): The first argument type.
  • T1 (AnyType): The second argument type.

Args:

  • arg0 (T0): The first argument.
  • arg1 (T1): The second argument.

Returns:

The external call result.

external_call[callee: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType, T2: AnyType](arg0: T0, arg1: T1, arg2: T2) -> type

Call an external function.

Parameters:

  • callee (StringLiteral): The name of the external function.
  • type (AnyType): The return type.
  • T0 (AnyType): The first argument type.
  • T1 (AnyType): The second argument type.
  • T2 (AnyType): The third argument type.

Args:

  • arg0 (T0): The first argument.
  • arg1 (T1): The second argument.
  • arg2 (T2): The third argument.

Returns:

The external call result.

external_call[callee: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType, T2: AnyType, T3: AnyType](arg0: T0, arg1: T1, arg2: T2, arg3: T3) -> type

Call an external function.

Parameters:

  • callee (StringLiteral): The name of the external function.
  • type (AnyType): The return type.
  • T0 (AnyType): The first argument type.
  • T1 (AnyType): The second argument type.
  • T2 (AnyType): The third argument type.
  • T3 (AnyType): The fourth argument type.

Args:

  • arg0 (T0): The first argument.
  • arg1 (T1): The second argument.
  • arg2 (T2): The third argument.
  • arg3 (T3): The fourth argument.

Returns:

The external call result.

external_call[callee: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType, T2: AnyType, T3: AnyType, T4: AnyType](arg0: T0, arg1: T1, arg2: T2, arg3: T3, arg4: T4) -> type

Call an external function.

Parameters:

  • callee (StringLiteral): The name of the external function.
  • type (AnyType): The return type.
  • T0 (AnyType): The first argument type.
  • T1 (AnyType): The second argument type.
  • T2 (AnyType): The third argument type.
  • T3 (AnyType): The fourth argument type.
  • T4 (AnyType): The fifth argument type.

Args:

  • arg0 (T0): The first argument.
  • arg1 (T1): The second argument.
  • arg2 (T2): The third argument.
  • arg3 (T3): The fourth argument.
  • arg4 (T4): The fifth argument.

Returns:

The external call result.

gather

gather[size: Int, type: DType](base: SIMD[address, size], mask: SIMD[bool, size], passthrough: SIMD[type, size], alignment: Int) -> SIMD[type, size]

Read scalar values from a SIMD vector of memory locations, and gather them into one vector.

The gather function reads scalar values from a SIMD vector of memory locations and gathers them into one vector. The memory locations are provided in the vector of pointers base as addresses. The memory is accessed according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the passthrough operand.

In general, for a vector of pointers base, a mask mask, and a passthrough vector passthrough, a call of the form:

gather(base, mask, passthrough)

is equivalent to the following sequence of scalar loads in C++, where N is size:

for (int i = 0; i < N; i++)
  result[i] = mask[i] ? *base[i] : passthrough[i];

Parameters:

  • size (Int): Size of the return SIMD buffer.
  • type (DType): DType of the return SIMD buffer.

Args:

  • base (SIMD[address, size]): The vector containing memory addresses that gather will access.
  • mask (SIMD[bool, size]): A binary vector which prevents memory access to certain lanes of the base vector.
  • passthrough (SIMD[type, size]): In the result vector, the masked-off lanes are replaced with the corresponding lanes of the passthrough vector.
  • alignment (Int): The alignment of the source addresses. Must be 0 or a power of two constant integer value.

Returns:

A SIMD[type, size] containing the result of the gather operation.
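A runnable version of the C++ equivalence above, as a hedged sketch: std::array stands in for SIMD[type, size] and an array of pointers for SIMD[address, size], the name gather_ref is hypothetical, and the alignment argument is left out because it describes the addresses rather than changing which values are loaded.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model of gather.
// Lane i reads *base[i] when mask[i] is true; otherwise it takes
// passthrough[i] and base[i] is never dereferenced.
template <typename T, std::size_t N>
std::array<T, N> gather_ref(const std::array<const T*, N>& base,
                            const std::array<bool, N>& mask,
                            const std::array<T, N>& passthrough) {
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            assert(base[i] != nullptr);  // active lanes need a valid address
            result[i] = *base[i];
        } else {
            result[i] = passthrough[i];  // masked-off lane: no memory access
        }
    }
    return result;
}
```

Because masked-off lanes are never dereferenced, their pointers may be null or otherwise invalid, which is what lets gather operate on partially valid address vectors.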

llvm_intrinsic

llvm_intrinsic[intrin: StringLiteral, type: AnyType]() -> type

Call an LLVM intrinsic with no arguments.

Call an LLVM intrinsic with the name intrin and return type type.

Parameters:

  • intrin (StringLiteral): The name of the LLVM intrinsic.
  • type (AnyType): The return type of the intrinsic.

Returns:

The result of calling the LLVM intrinsic with no arguments.

llvm_intrinsic[intrin: StringLiteral, type: AnyType, T0: AnyType](arg0: T0) -> type

Call an LLVM intrinsic with one argument.

Call the intrinsic with the name intrin and return type type on argument arg0.

Parameters:

  • intrin (StringLiteral): The name of the LLVM intrinsic.
  • type (AnyType): The return type of the intrinsic.
  • T0 (AnyType): The type of the first argument to the intrinsic (arg0).

Args:

  • arg0 (T0): The argument to call the LLVM intrinsic with. The type of arg0 must be T0.

Returns:

The result of calling the LLVM intrinsic with arg0 as an argument.

llvm_intrinsic[intrin: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType](arg0: T0, arg1: T1) -> type

Call an LLVM intrinsic with two arguments.

Call the LLVM intrinsic with the name intrin and return type type on arguments arg0 and arg1.

Parameters:

  • intrin (StringLiteral): The name of the LLVM intrinsic.
  • type (AnyType): The return type of the intrinsic.
  • T0 (AnyType): The type of the first argument to the intrinsic (arg0).
  • T1 (AnyType): The type of the second argument to the intrinsic (arg1).

Args:

  • arg0 (T0): The first argument to call the LLVM intrinsic with. The type of arg0 must be T0.
  • arg1 (T1): The second argument to call the LLVM intrinsic with. The type of arg1 must be T1.

Returns:

The result of calling the LLVM intrinsic with arg0 and arg1 as arguments.

llvm_intrinsic[intrin: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType, T2: AnyType](arg0: T0, arg1: T1, arg2: T2) -> type

Call an LLVM intrinsic with three arguments.

Call the LLVM intrinsic with the name intrin and return type type on arguments arg0, arg1 and arg2.

Parameters:

  • intrin (StringLiteral): The name of the LLVM intrinsic.
  • type (AnyType): The return type of the intrinsic.
  • T0 (AnyType): The type of the first argument to the intrinsic (arg0).
  • T1 (AnyType): The type of the second argument to the intrinsic (arg1).
  • T2 (AnyType): The type of the third argument to the intrinsic (arg2).

Args:

  • arg0 (T0): The first argument to call the LLVM intrinsic with. The type of arg0 must be T0.
  • arg1 (T1): The second argument to call the LLVM intrinsic with. The type of arg1 must be T1.
  • arg2 (T2): The third argument to call the LLVM intrinsic with. The type of arg2 must be T2.

Returns:

The result of calling the LLVM intrinsic with arg0, arg1 and arg2 as arguments.

llvm_intrinsic[intrin: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType, T2: AnyType, T3: AnyType](arg0: T0, arg1: T1, arg2: T2, arg3: T3) -> type

Call an LLVM intrinsic with four arguments.

Call the LLVM intrinsic with the name intrin and return type type on arguments arg0, arg1, arg2 and arg3.

Parameters:

  • intrin (StringLiteral): The name of the LLVM intrinsic.
  • type (AnyType): The return type of the intrinsic.
  • T0 (AnyType): The type of the first argument to the intrinsic (arg0).
  • T1 (AnyType): The type of the second argument to the intrinsic (arg1).
  • T2 (AnyType): The type of the third argument to the intrinsic (arg2).
  • T3 (AnyType): The type of the fourth argument to the intrinsic (arg3).

Args:

  • arg0 (T0): The first argument to call the LLVM intrinsic with. The type of arg0 must be T0.
  • arg1 (T1): The second argument to call the LLVM intrinsic with. The type of arg1 must be T1.
  • arg2 (T2): The third argument to call the LLVM intrinsic with. The type of arg2 must be T2.
  • arg3 (T3): The fourth argument to call the LLVM intrinsic with. The type of arg3 must be T3.

Returns:

The result of calling the LLVM intrinsic with arg0, arg1, arg2 and arg3 as arguments.

llvm_intrinsic[intrin: StringLiteral, type: AnyType, T0: AnyType, T1: AnyType, T2: AnyType, T3: AnyType, T4: AnyType](arg0: T0, arg1: T1, arg2: T2, arg3: T3, arg4: T4) -> type

Call an LLVM intrinsic with five arguments.

Call the LLVM intrinsic with the name intrin and return type type on arguments arg0, arg1, arg2, arg3 and arg4.

Parameters:

  • intrin (StringLiteral): The name of the LLVM intrinsic.
  • type (AnyType): The return type of the intrinsic.
  • T0 (AnyType): The type of the first argument to the intrinsic (arg0).
  • T1 (AnyType): The type of the second argument to the intrinsic (arg1).
  • T2 (AnyType): The type of the third argument to the intrinsic (arg2).
  • T3 (AnyType): The type of the fourth argument to the intrinsic (arg3).
  • T4 (AnyType): The type of the fifth argument to the intrinsic (arg4).

Args:

  • arg0 (T0): The first argument to call the LLVM intrinsic with. The type of arg0 must be T0.
  • arg1 (T1): The second argument to call the LLVM intrinsic with. The type of arg1 must be T1.
  • arg2 (T2): The third argument to call the LLVM intrinsic with. The type of arg2 must be T2.
  • arg3 (T3): The fourth argument to call the LLVM intrinsic with. The type of arg3 must be T3.
  • arg4 (T4): The fifth argument to call the LLVM intrinsic with. The type of arg4 must be T4.

Returns:

The result of calling the LLVM intrinsic with arg0, arg1, arg2, arg3 and arg4 as arguments.

masked_load

masked_load[size: Int, type: DType](addr: DTypePointer[type], mask: SIMD[bool, size], passthrough: SIMD[type, size], alignment: Int) -> SIMD[type, size]

Load data from memory and return it, replacing masked-off lanes with the corresponding values from the passthrough vector.

Parameters:

  • size (Int): Size of the return SIMD buffer.
  • type (DType): DType of the return SIMD buffer.

Args:

  • addr (DTypePointer[type]): The base pointer for the load.
  • mask (SIMD[bool, size]): A binary vector which prevents memory access to certain lanes of the memory stored at addr.
  • passthrough (SIMD[type, size]): In the result vector, the masked-off lanes are replaced with the corresponding lanes of the passthrough vector.
  • alignment (Int): The alignment of the source addresses. Must be 0 or a power of two constant integer value. Default is 1.

Returns:

The loaded memory stored in a vector of type SIMD[type, size].
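The following scalar C++ model illustrates masked_load, assuming it reads lane i from the i-th element after addr, as LLVM's llvm.masked.load does. It is a sketch, not the Mojo implementation: std::array stands in for SIMD[type, size], the alignment argument is omitted, and masked_load_ref is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model of masked_load.
// Lane i reads addr[i] when mask[i] is true; masked-off lanes are filled
// from passthrough and addr[i] is not read for them.
template <typename T, std::size_t N>
std::array<T, N> masked_load_ref(const T* addr, const std::array<bool, N>& mask,
                                 const std::array<T, N>& passthrough) {
    assert(addr != nullptr);
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result[i] = mask[i] ? addr[i] : passthrough[i];
    }
    return result;
}
```

A common use is loading the tail of an array that is shorter than the vector width: loading 4 lanes from a 3-element array with mask {true, true, true, false} reads only the three valid elements and takes the fourth lane from passthrough.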

masked_store

masked_store[size: Int, type: DType](value: SIMD[type, size], addr: DTypePointer[type], mask: SIMD[bool, size], alignment: Int)

Store a value at a memory location, skipping masked lanes.

Parameters:

  • size (Int): Size of value, the data to store.
  • type (DType): DType of value, the data to store.

Args:

  • value (SIMD[type, size]): The vector containing data to store.
  • addr (DTypePointer[type]): The base pointer of the memory location to store data at.
  • mask (SIMD[bool, size]): A binary vector which prevents memory access to certain lanes of value.
  • alignment (Int): The alignment of the destination locations. Must be 0 or a power of two constant integer value.
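A scalar C++ sketch of masked_store under the same assumptions: std::array stands in for SIMD[type, size], lane i maps to the i-th element after addr (as in LLVM's llvm.masked.store), alignment is omitted, and masked_store_ref is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model of masked_store.
// Lane i of value is written to addr[i] only when mask[i] is true;
// the memory at masked-off positions is left untouched.
template <typename T, std::size_t N>
void masked_store_ref(const std::array<T, N>& value, T* addr,
                      const std::array<bool, N>& mask) {
    assert(addr != nullptr);
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            addr[i] = value[i];  // lane keeps its own offset
        }
    }
}
```

Unlike compressed_store, each active lane keeps its own offset, so masked-off positions in memory keep their previous contents rather than being filled by later lanes.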

prefetch

prefetch[type: DType, params: PrefetchOptions](addr: DTypePointer[type])

Prefetch an instruction or data into cache before it is used.

The prefetch function provides prefetching hints for the target to prefetch instructions or data into cache before they are used.

Parameters:

  • type (DType): The DType of value stored in addr.
  • params (PrefetchOptions): Configuration options for the prefetch intrinsic.

Args:

  • addr (DTypePointer[type]): The data pointer to prefetch.
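For comparison, GCC and Clang expose the same kind of hint in C and C++ as __builtin_prefetch(addr, rw, locality), where rw (0 = read, 1 = write) and locality (0 to 3) play the roles of PrefetchRW and PrefetchLocality; it has no cache-type argument and prefetches data. The sketch below prefetches a fixed distance ahead while summing an array. The function name and the distance of 16 elements are illustrative assumptions, not tuned values.

```cpp
#include <cassert>
#include <cstddef>

// Sums an array while hinting the hardware to fetch data a fixed distance
// ahead. The rw and locality arguments of __builtin_prefetch must be
// compile-time constants.
long sum_with_prefetch(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    constexpr std::size_t kDistance = 16;  // illustrative, not tuned
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDistance < n) {
            // Read prefetch (rw = 0) with high locality (locality = 3) into
            // the data cache, analogous to for_read(), high_locality() and
            // to_data_cache() on PrefetchOptions.
            __builtin_prefetch(data + i + kDistance, 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

Since prefetching is only a hint, removing the __builtin_prefetch call must not change the result; whether it improves performance depends on the target and the access pattern.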

scatter

scatter[size: Int, type: DType](value: SIMD[type, size], base: SIMD[address, size], mask: SIMD[bool, size], alignment: Int)

Scatter takes scalar values from a SIMD vector and scatters them into a vector of pointers.

The scatter operation takes scalar values from a SIMD vector and stores them to a SIMD vector of memory locations. The memory locations are provided in the vector of pointers base as addresses. Stores are performed according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.

The value operand is a vector value to be written to memory. The base operand is a vector of pointers, pointing to where the value elements should be stored. It has the same underlying type as the value operand. The mask operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.

The behavior of scatter is undefined if it stores into the same memory location more than once.

In general, for some vector %value, vector of pointers %base, and mask %mask, an instruction of the form:

%0 = pop.simd.scatter %value, %base[%mask] : !pop.simd<N, type>

is equivalent to the following sequence of scalar stores in C++:

for (int i = 0; i < N; i++)
  if (mask[i])
    *base[i] = value[i];

Parameters:

  • size (Int): Size of value, the SIMD vector to scatter.
  • type (DType): DType of value, the SIMD vector to scatter.

Args:

  • value (SIMD[type, size]): The vector containing the data to store.
  • base (SIMD[address, size]): The vector containing memory addresses that scatter will access.
  • mask (SIMD[bool, size]): A binary vector which prevents memory access to certain lanes of the base vector.
  • alignment (Int): The alignment of the destination addresses. Must be 0 or a power of two constant integer value.
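A runnable C++ model of the scatter loop above, sketched with std::array standing in for SIMD vectors and the hypothetical name scatter_ref. The alignment argument is omitted, and because the text above leaves overlapping stores undefined, the model asserts that active lanes have distinct addresses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model of scatter.
// Lane i of value is written through base[i] only when mask[i] is true.
// The O(N^2) distinctness check is for illustration only.
template <typename T, std::size_t N>
void scatter_ref(const std::array<T, N>& value, const std::array<T*, N>& base,
                 const std::array<bool, N>& mask) {
    for (std::size_t i = 0; i < N; ++i) {
        if (!mask[i]) continue;  // masked-off lane: no memory access
        assert(base[i] != nullptr);
        for (std::size_t j = 0; j < i; ++j) {
            assert(!mask[j] || base[j] != base[i]);  // no repeated targets
        }
        *base[i] = value[i];
    }
}
```

Masked-off lanes never touch their pointer, so, as with gather, those pointers may be invalid.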

strided_load

strided_load[size: Int, type: DType](addr: DTypePointer[type], stride: Int, mask: SIMD[bool, size]) -> SIMD[type, size]

Load values from addr according to a specific stride.

Parameters:

  • size (Int): Size of the result vector.
  • type (DType): DType of the result vector.

Args:

  • addr (DTypePointer[type]): The memory location to load data from.
  • stride (Int): The distance, in elements, between the memory locations loaded by consecutive lanes.
  • mask (SIMD[bool, size]): A binary vector which prevents memory access to certain lanes of the result vector.

Returns:

A vector containing the loaded data.

strided_load[size: Int, type: DType](addr: DTypePointer[type], stride: Int) -> SIMD[type, size]

Load values from addr according to a specific stride.

Parameters:

  • size (Int): Size of the result vector.
  • type (DType): DType of the result vector.

Args:

  • addr (DTypePointer[type]): The memory location to load data from.
  • stride (Int): The distance, in elements, between the memory locations loaded by consecutive lanes.

Returns:

A vector containing the loaded data.
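A scalar C++ sketch of the masked strided_load, assuming lane i is read from addr[i * stride] with stride counted in elements. The value given to masked-off lanes is not specified above; this model zero-initializes them, which is an assumption. The name strided_load_ref is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model of strided_load with a mask.
// Active lane i reads addr[i * stride]; masked-off lanes are not read and
// are left value-initialized (zero for arithmetic types), an assumption.
template <typename T, std::size_t N>
std::array<T, N> strided_load_ref(const T* addr, std::ptrdiff_t stride,
                                  const std::array<bool, N>& mask) {
    assert(addr != nullptr);
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            result[i] = addr[static_cast<std::ptrdiff_t>(i) * stride];
        }
    }
    return result;
}
```

A typical use is reading a column of a row-major matrix: for a 3x3 matrix, with addr pointing at element (0, 1) and stride 3, the lanes receive elements (0, 1), (1, 1) and (2, 1).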

strided_store

strided_store[size: Int, type: DType](value: SIMD[type, size], addr: DTypePointer[type], stride: Int, mask: SIMD[bool, size])

Store values to addr according to a specific stride.

Parameters:

  • size (Int): Size of value, the value to store.
  • type (DType): DType of value, the value to store.

Args:

  • value (SIMD[type, size]): The values to store.
  • addr (DTypePointer[type]): The location to store values at.
  • stride (Int): The distance, in elements, between the memory locations written by consecutive lanes.
  • mask (SIMD[bool, size]): A binary vector which prevents memory access to certain lanes of value.

strided_store[size: Int, type: DType](value: SIMD[type, size], addr: DTypePointer[type], stride: Int)

Store values to addr according to a specific stride.

Parameters:

  • size (Int): Size of value, the value to store.
  • type (DType): DType of value, the value to store.

Args:

  • value (SIMD[type, size]): The values to store.
  • addr (DTypePointer[type]): The location to store values at.
  • stride (Int): The distance, in elements, between the memory locations written by consecutive lanes.
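The store counterpart, again as a scalar C++ sketch assuming lane i is written to addr[i * stride] with stride counted in elements, and using the hypothetical name strided_store_ref.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model of strided_store with a mask.
// Active lane i writes value[i] to addr[i * stride]; all other memory,
// including the slots of masked-off lanes, is left untouched.
template <typename T, std::size_t N>
void strided_store_ref(const std::array<T, N>& value, T* addr,
                       std::ptrdiff_t stride, const std::array<bool, N>& mask) {
    assert(addr != nullptr);
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            addr[static_cast<std::ptrdiff_t>(i) * stride] = value[i];
        }
    }
}
```

For example, writing a 3-element vector into column 2 of a zeroed 3x3 row-major matrix with stride 3 updates elements 2, 5 and 8 and leaves the rest unchanged.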