Mojo function

tiled_matmul_run

tiled_matmul_run[config: KernelConfig, transpose_b: Bool, b_packed: Bool, simd_size: Int, elementwise_epilogue_enabled: Bool, kernel_id: InnerKernelID, algorithm: InnerMatmulKernel](alg: algorithm, c: NDBuffer[dtype, 2, origin, shape], a: NDBuffer[dtype, 2, origin, shape], b: NDBuffer[dtype, 2, origin, shape], elementwise_epilogue_fn: fn(GemmShape, GemmShape) escaping -> None, global_tile_shape: GemmShape, global_tile_offset: GemmShape)

Interface function to run tiled matmul on a given sub-tile.

Args:

alg (algorithm): The InnerMatmulKernel algorithm used for the microkernel.
c (NDBuffer[dtype, 2, origin, shape]): Pre-allocated buffer space for result.
a (NDBuffer[dtype, 2, origin, shape]): Operand A of the matmul.
b (NDBuffer[dtype, 2, origin, shape]): Operand B of the matmul.
elementwise_epilogue_fn (fn(GemmShape, GemmShape) escaping -> None): The elementwise epilogue function.
global_tile_shape (GemmShape): Tile shape this call will process.
global_tile_offset (GemmShape): Tile offset on the original buffer.
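
Example:

The sketch below is a conceptual model in plain Python, not the Mojo implementation. It illustrates how a tile shape and a tile offset can together select the region of the original buffers that a single call processes. The `GemmShape` dataclass, `run_sub_tile`, and `tiled_matmul_reference` are hypothetical stand-ins introduced for illustration. The sketch also assumes that each sub-tile accumulates into `c` and that offsets are expressed in elements along M, N, and K; the source does not confirm either detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmShape:
    """Hypothetical stand-in: extents (or offsets) along the M, N and K axes."""
    M: int
    N: int
    K: int


def run_sub_tile(c, a, b, tile_shape, tile_offset):
    """Accumulate one sub-tile's contribution into c, in place.

    c, a and b are lists of lists holding the full (original) matrices;
    tile_offset locates the sub-tile and tile_shape gives its extent.
    Accumulation into c is an assumption of this sketch.
    """
    for i in range(tile_offset.M, tile_offset.M + tile_shape.M):
        for j in range(tile_offset.N, tile_offset.N + tile_shape.N):
            acc = c[i][j]
            for k in range(tile_offset.K, tile_offset.K + tile_shape.K):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def tiled_matmul_reference(a, b, tile):
    """Cover the whole problem with sub-tiles, clamping tiles at the edges."""
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0] * n for _ in range(m)]
    for m0 in range(0, m, tile.M):
        for n0 in range(0, n, tile.N):
            for k0 in range(0, k, tile.K):
                shape = GemmShape(
                    min(tile.M, m - m0),
                    min(tile.N, n - n0),
                    min(tile.K, k - k0),
                )
                run_sub_tile(c, a, b, shape, GemmShape(m0, n0, k0))
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = tiled_matmul_reference(a, b, GemmShape(1, 2, 2))
```

In this model every call touches only the rows, columns, and reduction range that its offset and shape select, so the sub-tiles can be visited in any order that preserves the accumulation along K.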