Functional
Module
Implements higher-order functions.
Aliases:
BinaryTile1DTileUnitFunc = fn[Int](Int, Int) capturing -> None
Signature of a tiled function that performs some work with a dynamic tile size and a secondary static tile size.
Dynamic1DTileUnitFunc = fn(Int, Int) capturing -> None
Signature of a 1d tiled function that performs some work with a dynamic tile size and an offset. i.e. func(offset: Int, tile_size: Int)
Dynamic1DTileUnswitchUnitFunc = fn[Bool](Int, Int, Int) capturing -> None
Static1DTileUnitFunc = fn[Int](Int) capturing -> None
Signature of a 1d tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size: Int> (offset: Int)
Static1DTileUnswitchUnitFunc = fn[Int, Bool](Int, Int) capturing -> None
Signature of a tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size: Int> (offset: Int)
Static2DTileUnitFunc = fn[Int, Int](Int, Int) capturing -> None
Signature of a 2d tiled function that performs some work with a static tile size and an offset. i.e. func<tile_size_x: Int, tile_size_y: Int> (offset_x: Int, offset_y: Int)
SwitchedFunction = fn[Bool]() capturing -> None
SwitchedFunction2 = fn[Bool, Bool]() capturing -> None
async_parallelize
async_parallelize[func: fn(Int) capturing -> None](out_chain: OutputChainPtr, num_work_items: Int)
Execute func(0) … func(num_work_items-1) as sub-tasks in parallel and return imediatly.
Execute func(0) … func(num_work_items-1) as sub-tasks in parallel and mark out_chain as ready when all functions have returned. This function will return when the sub-tasks have been scheduled but not necessarily completed. The runtime may execute the sub-tasks in any order and with any degree of concurrency.
All free variables in func must be “async safe”. Currently this means: - The variable must be bound by a by-val function argument (ie no &), or let binding. - The variable’s type must be “async safe”, ie is marked as @register_passable and any internal pointers are to memory with lifetime at least until out_chain is ready. In practice, this means only pointers to buffers held alive by the runtime.
If num_work_items is 0 then the out_chain is marked as ready before async_parallelize returns. If num_work_items is 1 then func(0) is still executed as a sub-task.
Parameters:
- func (
fn(Int) capturing -> None
): The function to invoke.
Args:
- out_chain (
OutputChainPtr
): Out chain to attach results too. - num_work_items (
Int
): Number of parallel tasks.
elementwise
elementwise[rank: Int, simd_width: Int, unroll_factor: Int, func: fn[Int, Int](StaticIntTuple[*(0,1)]) capturing -> None](shape: StaticIntTuple[rank], out_chain: OutputChainPtr)
Execute funcwidth, rank as sub-tasks for a suitable combination of width and indices so as to cover shape.
Parameters:
- rank (
Int
): The rank of the buffer. - simd_width (
Int
): The SIMD vector width to use. - unroll_factor (
Int
): The unroll factor to use. - func (
fn[Int, Int](StaticIntTuple[*(0,1)]) capturing -> None
): The body function.
Args:
- shape (
StaticIntTuple[rank]
): The shape of the buffer. - out_chain (
OutputChainPtr
): The our chain to attach results to.
elementwise[rank: Int, simd_width: Int, unroll_factor: Int, func: fn[Int, Int](StaticIntTuple[*(0,1)]) capturing -> None](shape: StaticIntTuple[rank], out_chain: OutputChainPtr)
Execute funcwidth, rank as sub-tasks for a suitable combination of width and indices so as to cover shape.
All free vars in func must be “async safe”, see async_parallelize.
Parameters:
- rank (
Int
): The rank of the buffer. - simd_width (
Int
): The SIMD vector width to use. - unroll_factor (
Int
): The unroll factor to use. - func (
fn[Int, Int](StaticIntTuple[*(0,1)]) capturing -> None
): The body function.
Args:
- shape (
StaticIntTuple[rank]
): The shape of the buffer. - out_chain (
OutputChainPtr
): The our chain to attach results to.
get_num_workers
get_num_workers(problem_size: Int, runtime: Runtime) -> Int
Return a number of workers to run in parallel.
Args:
- problem_size (
Int
): The number of parallel tasks. - runtime (
Runtime
): Runtime object.
Returns:
The number of workers to run in parallel.
map
map[func: fn(Int) capturing -> None](size: Int)
Map a function over a range from 0 to size.
Parameters:
- func (
fn(Int) capturing -> None
): Function to map.
Args:
- size (
Int
): The number of elements.
parallelize
parallelize[func: fn(Int) capturing -> None]()
Execute func(0) … func(N-1) as sub-tasks in parallel and block until completion. N is chosen to be the number of physical processors on the system.
Execute func(0) … func(N-1) as sub-tasks in parallel. This function will return after all the sub-tasks have completely been executed (and unlike async_parallelize this is a blocking operation).
Parameters:
- func (
fn(Int) capturing -> None
): The function to invoke.
parallelize[func: fn(Int) capturing -> None](num_work_items: Int)
Execute func(0) … func(num_work_items-1) as sub-tasks in parallel and block until completion.
Execute func(0) … func(num_work_items-1) as sub-tasks in parallel. This function will return after all the sub-tasks have completely been executed (and unlike async_parallelize this is a blocking operation).
Parameters:
- func (
fn(Int) capturing -> None
): The function to invoke.
Args:
- num_work_items (
Int
): Number of parallel tasks.
tile
tile[workgroup_function: fn[Int](Int) capturing -> None, tile_size_list: VariadicList[Int]](offset: Int, upperbound: Int)
A generator that launches work groups in specified list of tile sizes.
A workgroup function is a function that can process a configurable consecutive “tile” of workload. E.g. work_on3 should launch computation on item 5,6,7, and should be semantically equivalent to work_on1, work_on1, work_on1.
This generator will try to proceed with the given list of tile sizes on the listed order. E.g. tile func, (3,2,1) will try to call func[3] starting from offset until remaining work is less than 3 from upperbound and then try func[2], and then func[1] etc.
Parameters:
- workgroup_function (
fn[Int](Int) capturing -> None
): Workgroup function that processes one tile of workload. - tile_size_list (
VariadicList[Int]
): List of tile sizes to launch work.
Args:
- offset (
Int
): The initial index to start the work from. - upperbound (
Int
): The runtime upperbound that the work function should not exceed.
tile[workgroup_function: fn(Int, Int) capturing -> None](offset: Int, upperbound: Int, tile_size_list: VariadicList[Int])
A generator that launches work groups in specified list of tile sizes.
This is the version of tile generator for the case where work_group function can take the tile size as a runtime value.
Parameters:
- workgroup_function (
fn(Int, Int) capturing -> None
): Workgroup function that processes one tile of workload.
Args:
- offset (
Int
): The initial index to start the work from. - upperbound (
Int
): The runtime upperbound that the work function should not exceed. - tile_size_list (
VariadicList[Int]
): List of tile sizes to launch work.
tile[secondary_tile_size_list: VariadicList[Int], secondary_cleanup_tile: Int, workgroup_function: fn[Int](Int, Int) capturing -> None](offset: Int, upperbound: Int, primary_tile_size_list: VariadicList[Int], primary_cleanup_tile: Int)
A generator that launches work groups in specified list of tile sizes until the sum of primary_tile_sizes has exceeded the upperbound.
Parameters:
- secondary_tile_size_list (
VariadicList[Int]
): List of static tile sizes to launch work. - secondary_cleanup_tile (
Int
): Last static tile to use when primary tile sizes don’t fit exactly within the upperbound. - workgroup_function (
fn[Int](Int, Int) capturing -> None
): Workgroup function that processes one tile of workload.
Args:
- offset (
Int
): The initial index to start the work from. - upperbound (
Int
): The runtime upperbound that the work function should not exceed. - primary_tile_size_list (
VariadicList[Int]
): List of dynamic tile sizes to launch work. - primary_cleanup_tile (
Int
): Last dynamic tile to use when primary tile sizes don’t fit exactly within the upperbound.
tile[workgroup_function: fn[Int, Int](Int, Int) capturing -> None, tile_sizes_x: VariadicList[Int], tile_sizes_y: VariadicList[Int]](offset_x: Int, offset_y: Int, upperbound_x: Int, upperbound_y: Int)
Launches workgroup_function using the largest tile sizes possible in each dimension, starting from the x and y offset, until the x and y upperbounds are reached.
Parameters:
- workgroup_function (
fn[Int, Int](Int, Int) capturing -> None
): Funtion that is invoked for each tile and offset. - tile_sizes_x (
VariadicList[Int]
): List of tile sizes to use for the first parameter of workgroup_function. - tile_sizes_y (
VariadicList[Int]
): List of tile sizes to use for the second parameter of workgroup_function.
Args:
- offset_x (
Int
): Initial x offset passed to workgroup_function. - offset_y (
Int
): Initial y offset passed to workgroup_function. - upperbound_x (
Int
): Max offset in x dimension passed to workgroup function. - upperbound_y (
Int
): Max offset in y dimension passed to workgroup function.
tile_and_unswitch
tile_and_unswitch[workgroup_function: fn[Int, Bool](Int, Int) capturing -> None, tile_size_list: VariadicList[Int]](offset: Int, upperbound: Int)
Perform time and unswitch functional transformation.
A variant of static tile given a workgroup function that can be unswitched. This generator is a fused version of tile and unswitch, where the static unswitch is true throughout the “inner” portion of the workload and is false only on the residue tile.
Parameters:
- workgroup_function (
fn[Int, Bool](Int, Int) capturing -> None
): Workgroup function that processes one tile of workload. - tile_size_list (
VariadicList[Int]
): List of tile sizes to launch work.
Args:
- offset (
Int
): The initial index to start the work from. - upperbound (
Int
): The runtime upperbound that the work function should not exceed.
tile_and_unswitch[workgroup_function: fn[Bool](Int, Int, Int) capturing -> None](offset: Int, upperbound: Int, tile_size_list: VariadicList[Int])
Perform time and unswitch functional transformation.
A variant of dynamic tile given a workgroup function that can be unswitched. This generator is a fused version of tile and unswitch, where the static unswitch is true throughout the “inner” portion of the workload and is false only on the residue tile.
Parameters:
- workgroup_function (
fn[Bool](Int, Int, Int) capturing -> None
): Workgroup function that processes one tile of workload.
Args:
- offset (
Int
): The initial index to start the work from. - upperbound (
Int
): The runtime upperbound that the work function should not exceed. - tile_size_list (
VariadicList[Int]
): List of tile sizes to launch work.
unroll
unroll[count: Int, func: fn[Int]() capturing -> None]()
Repeatedly evaluate a function count
times.
Parameters:
- count (
Int
): A number of repetitions. - func (
fn[Int]() capturing -> None
): The function to evaluate. The function should take a single Int argument.
unroll2
unroll2[dim0: Int, dim1: Int, func: fn[Int, Int]() capturing -> None]()
Repeateadly evaluate a 2D nested loop.
Parameters:
- dim0 (
Int
): The first dimension size. - dim1 (
Int
): The second dimension size. - func (
fn[Int, Int]() capturing -> None
): The function to evaluate. The function should take two Int arguments.
unroll3
unroll3[dim0: Int, dim1: Int, dim2: Int, func: fn[Int, Int, Int]() capturing -> None]()
Repeateadly evaluate a 3D nested loop.
Parameters:
- dim0 (
Int
): The first dimension size. - dim1 (
Int
): The second dimension size. - dim2 (
Int
): The second dimension size. - func (
fn[Int, Int, Int]() capturing -> None
): The function to evaluate. The function should take three Int arguments.
unswitch
unswitch[switched_func: fn[Bool]() capturing -> None](dynamic_switch: Bool)
Perform a functional unswitch transformation.
Unswitch is a simple pattern that is similar idea to loop unswitching pass but extended to functional patterns. The pattern facilitates the following code transformation that reduces the number of branches in the generated code
Before:
```
for i in range(...)
if i < xxx:
...
```
After:
```
if i < ...
for i in range(...)
...
else
for i in range(...)
if i < xxx:
...
```
This unswitch function genralizes that pattern with the help of meta parame- ters and can be used to perform both loop unswitching and other tile predic- ate lifting like in simd and amx.
TODO: Generalize to support multiple predicates. TODO: Once nested lambdas compose well should make unswitch compose with tile in an easy way.
Parameters:
- switched_func (
fn[Bool]() capturing -> None
): The function containing the inner loop logic that can be unswitched.
Args:
- dynamic_switch (
Bool
): The dynamic condition that enables the unswitched code path.
unswitch[switched_func: fn[Bool, Bool]() capturing -> None](dynamic_switch_a: Bool, dynamic_switch_b: Bool)
Perform a functional 2-predicates unswitch transformation.
Parameters:
- switched_func (
fn[Bool, Bool]() capturing -> None
): The function containing the inner loop logic that has 2 predicates which can be unswitched.
Args:
- dynamic_switch_a (
Bool
): The first dynamic condition that enables the outer unswitched code path. - dynamic_switch_b (
Bool
): The second dynamic condition that enables the inner unswitched code path.
vectorize
vectorize[simd_width: Int, func: fn[Int](Int) capturing -> None](size: Int)
Map a function which is parametrized over a simd_width over a range from 0 to size in simd fashion.
Parameters:
- simd_width (
Int
): The SIMD vector width. - func (
fn[Int](Int) capturing -> None
): The function for the loop body.
Args:
- size (
Int
): The total loop count.
vectorize_unroll
vectorize_unroll[simd_width: Int, unroll_factor: Int, func: fn[Int](Int) capturing -> None](size: Int)
Map a function which is parametrized over a simd_width over a range from 0 to size in simd fashion and unroll the loop by unroll_factor.
Parameters:
- simd_width (
Int
): The SIMD vector width. - unroll_factor (
Int
): The unroll factor for the main loop. - func (
fn[Int](Int) capturing -> None
): The function for the loop body.
Args:
- size (
Int
): The total loop count.