Mojo function

pack_b

pack_b[transpose_b: Bool, simd_size: Int, inner_size: Int, a_type: DType, b_type: DType, c_type: DType](dst: TileTensor[b_type, dst.LayoutType, dst.origin, linear_idx_type=dst.linear_idx_type, element_size=dst.element_size], src: TileTensor[b_type, src.LayoutType, src.origin, linear_idx_type=src.linear_idx_type, element_size=src.element_size], tile_n: Int, tile_k: Int)

Utility function to pack the entire B matrix, such that each [tile_n // inner_size, tile_k, inner_size] tile of src is contiguous in dst.

Tiles (not tile contents) are stored in row-major order, so tile[i, j] is tile_n * tile_k bytes away from tile[i, j+1].
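The layout described above can be modeled with a small pure-Python sketch. This is not the Mojo implementation and does not use the `TileTensor` API. The function name `pack_b_reference` and its flat-list arguments are hypothetical, introduced only for illustration. It assumes a non-transposed, row-major K x N source, that the tile index [i, j] means [k tile, n tile], and that K and N divide evenly by `tile_k` and `tile_n` (no padding). Offsets here are counted in elements, which coincide with bytes only for one-byte element types.

```python
def pack_b_reference(src, k, n, tile_n, tile_k, inner_size):
    """Illustrative model of the packed B layout (hypothetical helper).

    src is a flat, row-major K x N list. The result is a flat list in which
    each [tile_n // inner_size, tile_k, inner_size] tile is contiguous and
    tiles follow one another in row-major order.
    """
    # Assumptions of this sketch: even division, no padding.
    assert tile_n % inner_size == 0
    assert k % tile_k == 0 and n % tile_n == 0

    dst = [None] * (k * n)
    tile_elems = tile_n * tile_k      # distance between adjacent tiles
    tiles_per_row = n // tile_n       # number of tiles along N

    for tk in range(k // tile_k):             # tile row index (assumed: K)
        for tn in range(tiles_per_row):       # tile column index (assumed: N)
            base = (tk * tiles_per_row + tn) * tile_elems
            for n_outer in range(tile_n // inner_size):
                for kk in range(tile_k):
                    for n_inner in range(inner_size):
                        row = tk * tile_k + kk
                        col = tn * tile_n + n_outer * inner_size + n_inner
                        # Offset inside the [n_outer, kk, n_inner] tile.
                        off = (n_outer * tile_k + kk) * inner_size + n_inner
                        dst[base + off] = src[row * n + col]
    return dst


# Usage: a 4 x 8 matrix holding 0..31, packed with 2 x 4 tiles, inner_size 2.
packed = pack_b_reference(list(range(32)), k=4, n=8,
                          tile_n=4, tile_k=2, inner_size=2)
print(packed[:8])    # first tile
print(packed[8:16])  # next tile along N, tile_n * tile_k elements later
```

In this model the innermost `inner_size` elements of a row stay adjacent, then the K dimension of the tile, then the remaining N blocks, so a kernel reading the packed buffer walks each tile with unit stride.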