Skip to main content

Mojo function

batched_matmul

batched_matmul[*, transpose_a: Bool = False, transpose_b: Bool = False, elementwise_epilogue_fn: Optional[def[c_type: DType, width: Int, rank: Int, *, alignment: Int = 1](IndexList[rank], SIMD[c_type, width]) capturing -> None] = None, saturated_vnni: Bool = False, target: StringSlice[StaticConstantOrigin] = StringSlice("cpu")](c_buf: TileTensor[linear_idx_type=c_buf.linear_idx_type, element_size=c_buf.element_size], a_buf: TileTensor[linear_idx_type=a_buf.linear_idx_type, element_size=a_buf.element_size], b_buf: TileTensor[linear_idx_type=b_buf.linear_idx_type, element_size=b_buf.element_size], *, context: DeviceContextPtr = DeviceContextPtr())

Primary TileTensor-based implementation of `batched_matmul`.