Module

Implements higher-order functions.

You can import these APIs from the `algorithm` package. For example:

``from algorithm import map``

Aliases:

• `Static1DTileUnitFunc = fn[Int](Int) capturing -> None`: Signature of a 1d tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size: Int> (offset: Int)
• `Dynamic1DTileUnitFunc = fn(Int, Int) capturing -> None`: Signature of a 1d tiled function that performs some work with a dynamic tile size and an offset. i.e. func(offset: Int, tile_size: Int)
• `BinaryTile1DTileUnitFunc = fn[Int](Int, Int) capturing -> None`: Signature of a tiled function that performs some work with a dynamic tile size and a secondary static tile size.
• `Static2DTileUnitFunc = fn[Int, Int](Int, Int) capturing -> None`: Signature of a 2d tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size_x: Int, tile_size_y: Int> (offset_x: Int, offset_y: Int)
• `SwitchedFunction = fn[Bool]() capturing -> None`
• `SwitchedFunction2 = fn[Bool, Bool]() capturing -> None`
• `Static1DTileUnswitchUnitFunc = fn[Int, Bool](Int, Int) capturing -> None`: Signature of a tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size: Int> (offset: Int)
• `Dynamic1DTileUnswitchUnitFunc = fn[Bool](Int, Int, Int) capturing -> None`

## `map`

`map[func: fn(Int) capturing -> None](size: Int)`

Maps a function over a range from 0 to size.

Parameters:

• func (`fn(Int) capturing -> None`): Function to map.

Args:

• size (`Int`): The number of elements.

## `unroll`

`unroll[count: Int, func: fn[Int]() capturing -> None]()`

Repeatedly evaluates a function `count` times.

Parameters:

• count (`Int`): A number of repetitions.
• func (`fn[Int]() capturing -> None`): The function to evaluate. The function should take a single Int argument.

`unroll[dim0: Int, dim1: Int, func: fn[Int, Int]() capturing -> None]()`

Repeatedly evaluates a 2D nested loop.

Parameters:

• dim0 (`Int`): The first dimension size.
• dim1 (`Int`): The second dimension size.
• func (`fn[Int, Int]() capturing -> None`): The function to evaluate. The function should take two Int arguments.

`unroll[dim0: Int, dim1: Int, dim2: Int, func: fn[Int, Int, Int]() capturing -> None]()`

Repeatedly evaluates a 3D nested loop.

Parameters:

• dim0 (`Int`): The first dimension size.
• dim1 (`Int`): The second dimension size.
• dim2 (`Int`): The second dimension size.
• func (`fn[Int, Int, Int]() capturing -> None`): The function to evaluate. The function should take three Int arguments.

## `vectorize`

`vectorize[simd_width: Int, func: fn[Int](Int) capturing -> None](size: Int)`

Maps a function which is parametrized over a simd_width over a range from 0 to size in simd fashion.

Parameters:

• simd_width (`Int`): The SIMD vector width.
• func (`fn[Int](Int) capturing -> None`): The function for the loop body.

Args:

• size (`Int`): The total loop count.

## `vectorize_unroll`

`vectorize_unroll[simd_width: Int, unroll_factor: Int, func: fn[Int](Int) capturing -> None](size: Int)`

Maps a function which is parametrized over a simd_width over a range from 0 to size in simd fashion and unroll the loop by unroll_factor.

Parameters:

• simd_width (`Int`): The SIMD vector width.
• unroll_factor (`Int`): The unroll factor for the main loop.
• func (`fn[Int](Int) capturing -> None`): The function for the loop body.

Args:

• size (`Int`): The total loop count.

## `async_parallelize`

`async_parallelize[func: fn(Int) capturing -> None](out_chain: OutputChainPtr, num_work_items: Int)`

Executes func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel and returns immediately. The out_chain will be marked as ready only when all sub-tasks have completed.

Execute func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel and mark out_chain as ready when all functions have returned. This function will return when the sub-tasks have been scheduled but not necessarily completed. The runtime may execute the sub-tasks in any order and with any degree of concurrency.

All free variables in func must be â€śasync safeâ€ť. Currently this means: - The variable must be bound by a by-val function argument (ie no &), or let binding. - The variableâ€™s type must be â€śasync safeâ€ť, ie is marked as @register_passable and any internal pointers are to memory with lifetime at least until out_chain is ready. In practice, this means only pointers to buffers held alive by the runtime. Consider using sync_parallelize if this requirement is too onerous.

If num_work_items is 0 then the out_chain is marked as ready before async_parallelize returns. If num_work_items is 1 then func(0) may still be executed as a sub-task.

Parameters:

• func (`fn(Int) capturing -> None`): The function to invoke.

Args:

• out_chain (`OutputChainPtr`): Out chain onto which to signal completion.
• num_work_items (`Int`): Number of parallel tasks.

## `sync_parallelize`

`sync_parallelize[func: fn(Int) capturing -> None](out_chain: OutputChainPtr, num_work_items: Int)`

Executes func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel. Marks out_chain as ready and returns only when all sub-tasks have completed.

Execute func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel, and return only when they have all functions have returned. The runtime may execute the sub-tasks in any order and with any degree of concurrency. The out_chain will be marked as ready before returning.

Parameters:

• func (`fn(Int) capturing -> None`): The function to invoke.

Args:

• out_chain (`OutputChainPtr`): Out chain onto which to signal completion.
• num_work_items (`Int`): Number of parallel tasks.

## `parallelize`

`parallelize[func: fn(Int) capturing -> None]()`

Executes func(0) â€¦ func(N-1) as sub-tasks in parallel and returns when all are complete. N is chosen to be the number of processors on the system.

Execute func(0) â€¦ func(N-1) as sub-tasks in parallel. This function will return only after all the sub-tasks have completed.

CAUTION: Creates and destroys a local runtime! Do not use from kernels!

Parameters:

• func (`fn(Int) capturing -> None`): The function to invoke.

`parallelize[func: fn(Int) capturing -> None](num_work_items: Int)`

Executes func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel and returns when all are complete.

Execute func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel. This function will return only after all the sub-tasks have completed.

CAUTION: Creates and destroys a local runtime! Do not use from kernels!

Parameters:

• func (`fn(Int) capturing -> None`): The function to invoke.

Args:

• num_work_items (`Int`): Number of parallel tasks.

`parallelize[func: fn(Int) capturing -> None](rt: Runtime, num_work_items: Int)`

Executes func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel and returns when all are complete.

Execute func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel. This function will return only after all the sub-tasks have completed.

Parameters:

• func (`fn(Int) capturing -> None`): The function to invoke.

Args:

• rt (`Runtime`): The runtime.
• num_work_items (`Int`): Number of parallel tasks.

`parallelize[func: fn(Int) capturing -> None](rt: Runtime, num_work_items: Int, num_workers: Int)`

Executes func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel and returns when all are complete.

Execute func(0) â€¦ func(num_work_items-1) as sub-tasks in parallel. This function will return only after all the sub-tasks have completed.

Parameters:

• func (`fn(Int) capturing -> None`): The function to invoke.

Args:

• rt (`Runtime`): The runtime.
• num_work_items (`Int`): Number of parallel tasks.
• num_workers (`Int`): The number of works to use for execution.

## `tile`

`tile[workgroup_function: fn[Int](Int) capturing -> None, tile_size_list: VariadicList[Int]](offset: Int, upperbound: Int)`

A generator that launches work groups in specified list of tile sizes.

A workgroup function is a function that can process a configurable consecutive â€śtileâ€ť of workload. E.g. `work_on[3](5)` should launch computation on item 5,6,7, and should be semantically equivalent to `work_on[1](5)`, `work_on[1](6)`, `work_on[1](7)`.

This generator will try to proceed with the given list of tile sizes on the listed order. E.g. `tile[func, (3,2,1)](offset, upperbound)` will try to call `func[3]` starting from offset until remaining work is less than 3 from upperbound and then try `func[2]`, and then `func[1]`, etc.

Parameters:

• workgroup_function (`fn[Int](Int) capturing -> None`): Workgroup function that processes one tile of workload.
• tile_size_list (`VariadicList[Int]`): List of tile sizes to launch work.

Args:

• offset (`Int`): The initial index to start the work from.
• upperbound (`Int`): The runtime upperbound that the work function should not exceed.

`tile[workgroup_function: fn(Int, Int) capturing -> None](offset: Int, upperbound: Int, tile_size_list: VariadicList[Int])`

A generator that launches work groups in specified list of tile sizes.

This is the version of tile generator for the case where work_group function can take the tile size as a runtime value.

Parameters:

• workgroup_function (`fn(Int, Int) capturing -> None`): Workgroup function that processes one tile of workload.

Args:

• offset (`Int`): The initial index to start the work from.
• upperbound (`Int`): The runtime upperbound that the work function should not exceed.
• tile_size_list (`VariadicList[Int]`): List of tile sizes to launch work.

`tile[secondary_tile_size_list: VariadicList[Int], secondary_cleanup_tile: Int, workgroup_function: fn[Int](Int, Int) capturing -> None](offset: Int, upperbound: Int, primary_tile_size_list: VariadicList[Int], primary_cleanup_tile: Int)`

A generator that launches work groups in specified list of tile sizes until the sum of primary_tile_sizes has exceeded the upperbound.

Parameters:

• secondary_tile_size_list (`VariadicList[Int]`): List of static tile sizes to launch work.
• secondary_cleanup_tile (`Int`): Last static tile to use when primary tile sizes donâ€™t fit exactly within the upperbound.
• workgroup_function (`fn[Int](Int, Int) capturing -> None`): Workgroup function that processes one tile of workload.

Args:

• offset (`Int`): The initial index to start the work from.
• upperbound (`Int`): The runtime upperbound that the work function should not exceed.
• primary_tile_size_list (`VariadicList[Int]`): List of dynamic tile sizes to launch work.
• primary_cleanup_tile (`Int`): Last dynamic tile to use when primary tile sizes donâ€™t fit exactly within the upperbound.

`tile[workgroup_function: fn[Int, Int](Int, Int) capturing -> None, tile_sizes_x: VariadicList[Int], tile_sizes_y: VariadicList[Int]](offset_x: Int, offset_y: Int, upperbound_x: Int, upperbound_y: Int)`

Launches workgroup_function using the largest tile sizes possible in each dimension, starting from the x and y offset, until the x and y upperbounds are reached.

Parameters:

• workgroup_function (`fn[Int, Int](Int, Int) capturing -> None`): Function that is invoked for each tile and offset.
• tile_sizes_x (`VariadicList[Int]`): List of tile sizes to use for the first parameter of workgroup_function.
• tile_sizes_y (`VariadicList[Int]`): List of tile sizes to use for the second parameter of workgroup_function.

Args:

• offset_x (`Int`): Initial x offset passed to workgroup_function.
• offset_y (`Int`): Initial y offset passed to workgroup_function.
• upperbound_x (`Int`): Max offset in x dimension passed to workgroup function.
• upperbound_y (`Int`): Max offset in y dimension passed to workgroup function.

## `unswitch`

`unswitch[switched_func: fn[Bool]() capturing -> None](dynamic_switch: Bool)`

Performs a functional unswitch transformation.

Unswitch is a simple pattern that is similar idea to loop unswitching pass but extended to functional patterns. The pattern facilitates the following code transformation that reduces the number of branches in the generated code

Before:

``````for i in range(...)
if i < xxx:
...``````

After:

``````if i < ...
for i in range(...)
...
else
for i in range(...)
if i < xxx:
...``````

This unswitch function generalizes that pattern with the help of meta parameters and can be used to perform both loop unswitching and other tile predicate lifting like in simd and amx.

TODO: Generalize to support multiple predicates. TODO: Once nested lambdas compose well should make unswitch compose with tile in an easy way.

Parameters:

• switched_func (`fn[Bool]() capturing -> None`): The function containing the inner loop logic that can be unswitched.

Args:

• dynamic_switch (`Bool`): The dynamic condition that enables the unswitched code path.

`unswitch[switched_func: fn[Bool, Bool]() capturing -> None](dynamic_switch_a: Bool, dynamic_switch_b: Bool)`

Performs a functional 2-predicates unswitch transformation.

Parameters:

• switched_func (`fn[Bool, Bool]() capturing -> None`): The function containing the inner loop logic that has 2 predicates which can be unswitched.

Args:

• dynamic_switch_a (`Bool`): The first dynamic condition that enables the outer unswitched code path.
• dynamic_switch_b (`Bool`): The second dynamic condition that enables the inner unswitched code path.

## `tile_and_unswitch`

`tile_and_unswitch[workgroup_function: fn[Int, Bool](Int, Int) capturing -> None, tile_size_list: VariadicList[Int]](offset: Int, upperbound: Int)`

Performs time and unswitch functional transformation.

A variant of static tile given a workgroup function that can be unswitched. This generator is a fused version of tile and unswitch, where the static unswitch is true throughout the â€śinnerâ€ť portion of the workload and is false only on the residue tile.

Parameters:

• workgroup_function (`fn[Int, Bool](Int, Int) capturing -> None`): Workgroup function that processes one tile of workload.
• tile_size_list (`VariadicList[Int]`): List of tile sizes to launch work.

Args:

• offset (`Int`): The initial index to start the work from.
• upperbound (`Int`): The runtime upperbound that the work function should not exceed.

`tile_and_unswitch[workgroup_function: fn[Bool](Int, Int, Int) capturing -> None](offset: Int, upperbound: Int, tile_size_list: VariadicList[Int])`

Performs time and unswitch functional transformation.

A variant of dynamic tile given a workgroup function that can be unswitched. This generator is a fused version of tile and unswitch, where the static unswitch is true throughout the â€śinnerâ€ť portion of the workload and is false only on the residue tile.

Parameters:

• workgroup_function (`fn[Bool](Int, Int, Int) capturing -> None`): Workgroup function that processes one tile of workload.

Args:

• offset (`Int`): The initial index to start the work from.
• upperbound (`Int`): The runtime upperbound that the work function should not exceed.
• tile_size_list (`VariadicList[Int]`): List of tile sizes to launch work.

## `elementwise`

`elementwise[rank: Int, simd_width: Int, func: fn[Int, Int](StaticIntTuple[*(0,1)]) capturing -> None](shape: StaticIntTuple[rank], out_chain: OutputChainPtr)`

Executes `func[width, rank](indices)` as sub-tasks for a suitable combination of width and indices so as to cover shape.

Parameters:

• rank (`Int`): The rank of the buffer.
• simd_width (`Int`): The SIMD vector width to use.
• func (`fn[Int, Int](StaticIntTuple[*(0,1)]) capturing -> None`): The body function.

Args:

• shape (`StaticIntTuple[rank]`): The shape of the buffer.
• out_chain (`OutputChainPtr`): The our chain to attach results to.

## `parallelize_over_rows`

`parallelize_over_rows[rank: Int, func: fn(Int, Int) capturing -> None](shape: StaticIntTuple[rank], axis: Int, out_chain: OutputChainPtr, grain_size: Int)`

Parallelize func over non-axis dims of shape.

Parameters:

• rank (`Int`): Rank of shape.
• func (`fn(Int, Int) capturing -> None`): Function to call on range of rows.

Args:

• shape (`StaticIntTuple[rank]`): Shape to parallelize over.
• axis (`Int`): Rows are slices along the axis dimension of shape.
• out_chain (`OutputChainPtr`): The out chain to attach results to.
• grain_size (`Int`): The minimum number of elements to warrant using an additional thread.