Mojo function

batched_matmul_dispatch_sm100_bf16

batched_matmul_dispatch_sm100_bf16[
    c_type: DType,
    a_type: DType,
    b_type: DType,
    transpose_b: Bool
](
    c: TileTensor[c_type, c.LayoutType, c.origin, address_space=c.address_space, linear_idx_type=c.linear_idx_type, element_size=c.element_size],
    a: TileTensor[a_type, a.LayoutType, a.origin, address_space=a.address_space, linear_idx_type=a.linear_idx_type, element_size=a.element_size],
    b: TileTensor[b_type, b.LayoutType, b.origin, address_space=b.address_space, linear_idx_type=b.linear_idx_type, element_size=b.element_size],
    ctx: DeviceContext
)

Dispatch a batched BF16 matmul to the SM100 kernel with a default config.

Uses a default config (256x256x16 MMA shape, 2x1x1 cluster shape, cta_group=2) that works well across a variety of batched matmul shapes.
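
As a rough illustration of the result such a dispatch is expected to produce, the following pure-Python sketch computes a batched matmul on nested lists. It is a reference for the math only, not the SM100 kernel or its API. The rank-3 `[batch][row][col]` layout, the reading of `transpose_b` as "`b` is stored as `[batch][N][K]`", and the helper name `batched_matmul_reference` are assumptions made for illustration; the sketch also ignores BF16 rounding and accumulation precision.

```python
def batched_matmul_reference(a, b, transpose_b=False):
    """Hypothetical reference: returns c with c[i] = a[i] @ B[i].

    Assumed layouts (not confirmed by the source documentation):
      a: [batch][M][K]
      b: [batch][K][N], or [batch][N][K] when transpose_b is True
    """
    out = []
    for a_mat, b_mat in zip(a, b):
        if transpose_b:
            # Each stored row of b_mat is already a column of the logical B.
            cols = b_mat
        else:
            # Gather the columns of b_mat so each can be dotted with a row of a_mat.
            cols = [list(col) for col in zip(*b_mat)]
        out.append(
            [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a_mat]
        )
    return out


a = [[[1, 2], [3, 4]]]  # batch of one 2x2 matrix
b = [[[5, 6], [7, 8]]]

c_plain = batched_matmul_reference(a, b)
c_transposed = batched_matmul_reference(a, b, transpose_b=True)
```

A reference like this can serve as a small-shape correctness check against the device output, with a tolerance chosen to account for BF16 precision.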