Mojo function

dispatch_sm100_batched_matmul

dispatch_sm100_batched_matmul[
    c_type: DType,
    a_type: DType,
    b_type: DType,
    transpose_b: Bool,
    pdl_level: PDLLevel = PDLLevel()
](
    c: TileTensor[c_type, c.LayoutType, c.origin, address_space=c.address_space, linear_idx_type=c.linear_idx_type, element_size=c.element_size],
    a: TileTensor[a_type, a.LayoutType, a.origin, address_space=a.address_space, linear_idx_type=a.linear_idx_type, element_size=a.element_size],
    b: TileTensor[b_type, b.LayoutType, b.origin, address_space=b.address_space, linear_idx_type=b.linear_idx_type, element_size=b.element_size],
    ctx: DeviceContext
)

Dispatches a batched matmul to an SM100 kernel.

The dispatcher first looks up a batched matmul config in the tuning table. If no entry matches, it tries to find an optimized config for the given shape. If that also fails, it falls back to a default config.
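The three-tier fallback can be illustrated with a minimal standalone Mojo sketch. Everything in it is hypothetical. The helpers `in_tuning_table`, `has_optimized_config`, and `select_config`, the single tuned shape, and the 128-alignment rule are stand-ins chosen for illustration. The real dispatcher selects actual SM100 kernel configurations rather than returning strings. The sketch only shows the order in which the tiers are tried.

```mojo
# Hypothetical sketch of the tuning-table -> optimized -> default fallback.
# None of these helpers exist in the real SM100 dispatch code.

fn in_tuning_table(batch: Int, m: Int, n: Int, k: Int) -> Bool:
    # Assumption: pretend the tuning table holds exactly one (batch, M, N, K) entry.
    return batch == 8 and m == 4096 and n == 4096 and k == 4096

fn has_optimized_config(m: Int, n: Int, k: Int) -> Bool:
    # Assumption: shapes whose M, N, and K are multiples of 128 get a specialized config.
    return m % 128 == 0 and n % 128 == 0 and k % 128 == 0

fn select_config(batch: Int, m: Int, n: Int, k: Int) -> String:
    # Tier 1: exact match in the tuning table.
    if in_tuning_table(batch, m, n, k):
        return "tuning_table"
    # Tier 2: a shape-based optimized config.
    if has_optimized_config(m, n, k):
        return "optimized"
    # Tier 3: fall back to the default config.
    return "default"

fn main():
    print(select_config(8, 4096, 4096, 4096))  # tuning_table
    print(select_config(4, 1024, 2048, 512))   # optimized
    print(select_config(2, 1000, 1000, 1000))  # default
```

Ordering the checks from most specific to most general means a tuned entry always wins over a heuristic match, and the default path guarantees that every shape gets some config.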