Mojo struct

MatmulKernels

@register_passable(trivial) struct MatmulKernels[a_type: DType, b_type: DType, c_type: DType, transpose_b: Bool = False]

Supported matmul kernels.

The configurations are named `<arch>_<BNxBM>_<stages>`. BK, MMA shape, and warp tile shape are decided internally.
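Judging from the members listed on this page, each configuration name appears to encode the target architecture, the block tile written as BN x BM, and the number of pipeline stages. For example, `ampere_256x64_4` uses a block tile of `Index(64, 256, ...)` and a stage count of 4. The Python sketch below only illustrates that reading of the names. It is not part of the Mojo API, and `KernelName` and `parse_kernel_name` are hypothetical helpers introduced for this example.

```python
import re
from typing import NamedTuple


class KernelName(NamedTuple):
    """Fields that a configuration name appears to encode (assumed layout)."""

    arch: str        # e.g. "ampere" or "hopper"
    bn: int          # block tile size along N (first number in the name)
    bm: int          # block tile size along M (second number in the name)
    num_stages: int  # pipeline stage count (trailing number)


# Assumed pattern: <arch>_<BN>x<BM>_<stages>
_NAME_RE = re.compile(
    r"^(?P<arch>[a-z0-9]+)_(?P<bn>\d+)x(?P<bm>\d+)_(?P<stages>\d+)$"
)


def parse_kernel_name(name: str) -> KernelName:
    """Split a name such as 'ampere_256x128_3' into its components."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"name does not follow <arch>_<BNxBM>_<stages>: {name!r}")
    return KernelName(m["arch"], int(m["bn"]), int(m["bm"]), int(m["stages"]))


if __name__ == "__main__":
    for n in ("ampere_128x128_4", "ampere_256x128_3", "ampere_256x64_4"):
        print(n, "->", parse_kernel_name(n))
```

Names that do not follow the pattern, such as `tuning_config`, are rejected with a `ValueError` rather than guessed at.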

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable

`comptime` members

`__copyinit__is_trivial`

comptime __copyinit__is_trivial = True

`__del__is_trivial`

comptime __del__is_trivial = True

`__moveinit__is_trivial`

comptime __moveinit__is_trivial = True

`ampere_128x128_4`

comptime ampere_128x128_4 = MatmulConfig[a_type, b_type, c_type, transpose_b](Index(128, 128, _bk_base[a_type]()), Index(64, 64, _bk_base[a_type]()), get_mma_shape[a_type, MatmulConfig[a_type, b_type, c_type, transpose_b].accum_type](), Index(1, 1, 1), 4, 1, 1, 1, 1, False, PDLLevel())

`ampere_256x128_3`

comptime ampere_256x128_3 = MatmulConfig[a_type, b_type, c_type, transpose_b](Index(128, 256, (2 * _bk_base[a_type]())), Index(64, 64, (2 * _bk_base[a_type]())), get_mma_shape[a_type, MatmulConfig[a_type, b_type, c_type, transpose_b].accum_type](), Index(1, 1, 1), 3, 1, 1, 1, 1, False, PDLLevel())

`ampere_256x64_4`

comptime ampere_256x64_4 = MatmulConfig[a_type, b_type, c_type, transpose_b](Index(64, 256, _bk_base[a_type]()), Index(64, 64, _bk_base[a_type]()), get_mma_shape[a_type, MatmulConfig[a_type, b_type, c_type, transpose_b].accum_type](), Index(1, 1, 1), 4, 1, 1, 1, 1, False, PDLLevel())

`hopper_128x128_4`

comptime hopper_128x128_4 = MatmulConfig[a_type, b_type, c_type, transpose_b](Index(128, 128, _bk_base[a_type]()), Index(64, 64, _bk_base[a_type]()), get_mma_shape[a_type, MatmulConfig[a_type, b_type, c_type, transpose_b].accum_type](), Index(1, 1, 1), 4, 1, 1, 1, 1, False, PDLLevel())

`tuning_config`

comptime tuning_config = MatmulConfig[a_type, b_type, c_type, transpose_b](Index(env_get_int["TUNE_BM", 128](), env_get_int["TUNE_BN", 128](), env_get_int["TUNE_BK", 32]()), Index(env_get_int["TUNE_WM", 64](), env_get_int["TUNE_WN", 64](), env_get_int["TUNE_BK", 32]()), get_mma_shape[a_type, MatmulConfig[a_type, b_type, c_type, transpose_b].accum_type](), Index(1, 1, 1), UInt(env_get_int["TUNE_NUM_STAGES", 4]()), UInt(env_get_int["TUNE_NUM_K_PARTITIONS", 1]()), 1, UInt(env_get_int["TUNE_NUM_WARP_K_PARTITIONS", 1]()), 1, False, PDLLevel())
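In `tuning_config`, each tile parameter comes from an `env_get_int` lookup with a fallback default, and `TUNE_BK` is used for the K dimension of both the block tile and the warp tile. The Python sketch below models how those defaults would resolve when some values are overridden. It is an illustration only: `resolve_tuning_config` and the result keys are hypothetical names chosen for readability, and the `overrides` dictionary stands in for whatever values are supplied when the Mojo code is compiled.

```python
from typing import Dict, Optional

# Default values copied from the tuning_config definition above.
TUNING_DEFAULTS: Dict[str, int] = {
    "TUNE_BM": 128,
    "TUNE_BN": 128,
    "TUNE_BK": 32,
    "TUNE_WM": 64,
    "TUNE_WN": 64,
    "TUNE_NUM_STAGES": 4,
    "TUNE_NUM_K_PARTITIONS": 1,
    "TUNE_NUM_WARP_K_PARTITIONS": 1,
}


def resolve_tuning_config(overrides: Optional[Dict[str, int]] = None) -> dict:
    """Merge overrides onto the defaults and group them as the definition does.

    The result keys are descriptive labels for this sketch, not confirmed
    field names of MatmulConfig.
    """
    v = {**TUNING_DEFAULTS, **(overrides or {})}
    return {
        # Index(TUNE_BM, TUNE_BN, TUNE_BK)
        "block_tile": (v["TUNE_BM"], v["TUNE_BN"], v["TUNE_BK"]),
        # Index(TUNE_WM, TUNE_WN, TUNE_BK): BK is shared with the block tile
        "warp_tile": (v["TUNE_WM"], v["TUNE_WN"], v["TUNE_BK"]),
        "num_stages": v["TUNE_NUM_STAGES"],
        "num_k_partitions": v["TUNE_NUM_K_PARTITIONS"],
        "num_warp_k_partitions": v["TUNE_NUM_WARP_K_PARTITIONS"],
    }


if __name__ == "__main__":
    print(resolve_tuning_config())
    print(resolve_tuning_config({"TUNE_BM": 256, "TUNE_BK": 64}))
```

With no overrides, the block tile and stage count match the values that `ampere_128x128_4` lists explicitly, except that BK is fixed at 32 here instead of being derived from `_bk_base[a_type]()`.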
