Mojo struct

MatmulConfig

struct MatmulConfig[a_type: DType, b_type: DType, c_type: DType, transpose_b: Bool = False]

Static configuration of GPU matmul.

Fields

- block_tile_shape (IndexList[Int(3)])
- warp_tile_shape (IndexList[Int(3)])
- mma_shape (IndexList[Int(3)])
- num_pipeline_stages (Int)
- num_k_partitions (Int)
- k_group_size (Int)
- num_warp_k_partitions (Int)
- cluster_shape (IndexList[Int(3)])
- num_consumer (Int)
- partitioned_multicast (Bool)

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TrivialRegisterPassable, Writable

`comptime` members

`ACCUM_PRECISION`

comptime ACCUM_PRECISION = 1

`accum_type`

comptime accum_type = get_accum_type[a_type]()

`OUTPUT_PRECISION`

comptime OUTPUT_PRECISION = 2

`split_k_reduction_scheme`

comptime split_k_reduction_scheme = get_defined_int[StringSlice("SPLITK_REDUCTION_SCHEME"), Int(2)]()

`split_k_reduction_type`

comptime split_k_reduction_type = c_type if (Int(2) == get_defined_int[StringSlice("SPLITK_REDUCTION_SCHEME"), Int(2)]()) else MatmulConfig[a_type, b_type, c_type, transpose_b].accum_type
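The `comptime` members above are resolved at compile time from the struct's parameters. The sketch below reads them from a concrete parameterization. It is illustrative only: the import path `linalg.utils_gpu` is an assumption (this page does not state the module), and the choice of `bfloat16` for all three types is arbitrary.

```mojo
# Assumption: MatmulConfig lives in `linalg.utils_gpu`; the page does not
# state the module path, so adjust the import to match your installation.
from linalg.utils_gpu import MatmulConfig


def main():
    # Bind the struct parameters once; transpose_b keeps its default (False).
    comptime Config = MatmulConfig[
        DType.bfloat16, DType.bfloat16, DType.bfloat16
    ]

    # Falls back to 2 when SPLITK_REDUCTION_SCHEME is not defined at build time.
    print(Config.split_k_reduction_scheme)

    # Per the definition above: c_type when the scheme equals 2,
    # otherwise accum_type.
    print(Config.split_k_reduction_type)
    print(Config.accum_type)
```

Because `split_k_reduction_scheme` is read with `get_defined_int`, changing it presumably means supplying `SPLITK_REDUCTION_SCHEME` as a compile-time define when building, rather than setting anything at run time.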

Methods

`__init__`

def __init__(*, block_tile_shape: IndexList[Int(3)] = Index[Int, Int, Int](Int(128), Int(128), Int(32)), warp_tile_shape: IndexList[Int(3)] = Index[Int, Int, Int](Int(64), Int(64), Int(32)), mma_shape: IndexList[Int(3)] = get_mma_shape[a_type, MatmulConfig[a_type, b_type, c_type, transpose_b].accum_type](), cluster_shape: IndexList[Int(3)] = Index[Int, Int, Int](Int(1), Int(1), Int(1)), num_pipeline_stages: Int = Int(4), num_k_partitions: Int = Int(1), k_group_size: Int = Int(1), num_warp_k_partitions: Int = Int(1), num_consumer: Int = Int(1), partitioned_multicast: Bool = False, pdl_level: PDLLevel = PDLLevel()) -> Self
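As a minimal sketch of how this constructor might be called: all arguments are keyword-only (the signature starts with `*`), and any argument left out keeps the default shown in the signature. The import path `linalg.utils_gpu` and the specific dtypes are assumptions made for illustration; the tile shapes simply repeat the documented defaults.

```mojo
# Assumption: MatmulConfig lives in `linalg.utils_gpu`; the page does not
# state the module path, so adjust the import to match your installation.
from linalg.utils_gpu import MatmulConfig
from utils.index import Index


def main():
    # Struct parameters go in square brackets, constructor arguments in
    # parentheses. Only a few arguments are overridden here; the rest
    # (mma_shape, cluster_shape, num_k_partitions, ...) keep their defaults.
    var config = MatmulConfig[
        DType.bfloat16, DType.bfloat16, DType.bfloat16, transpose_b=True
    ](
        block_tile_shape=Index(128, 128, 32),
        warp_tile_shape=Index(64, 64, 32),
        num_pipeline_stages=4,
    )

    # MatmulConfig implements Writable, so it can be passed to print().
    print(config)
```

Passing the shapes explicitly is redundant with the defaults; it is done here only to show the `Index(...)` form that the signature uses for `IndexList[Int(3)]` values.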

`__eq__`

def __eq__(self, rhs: MatmulConfig) -> Bool

Returns:

Bool

`__hash__`

Args:

hasher (H): The hasher instance.

Methods defined on this struct, in page order: `__init__`, `__eq__`, `copy_field`, `swapAB`, `num_warps_m`, `num_warps_n`, `num_threads`, `shared_mem_usage`, `grid_dim`, `block_dim`, `work_space_size`, `pdl_level`, `write_to`, `write_repr_to`, `__hash__`.