
Mojo struct

MmaOpSM100_SS

struct MmaOpSM100_SS[c_type: DType, a_type: DType, b_type: DType, block_tile_shape: IndexList[Int(3)], mma_shape: IndexList[Int(3)], /, *, accum_type: DType = DType.float32, cta_group: Int = Int(1), cluster_shape: IndexList[Int(3)] = Index[Int, Int, Int](Int(1), Int(1), Int(1)), a_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, b_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_b: Bool = False]

Fields

idesc (UMMAInsDescriptor[MmaOpSM100_SS._get_umma_kind[a_type]()])
mask (UInt16)

Implemented traits

AnyType, Copyable, Defaultable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TrivialRegisterPassable

Methods

`__init__`

def __init__() -> Self

`mma`

def mma(self, a: TileTensor[Storage=a.Storage, address_space=AddressSpace.SHARED, linear_idx_type=a.linear_idx_type, element_size=a.element_size], b: TileTensor[Storage=b.Storage, address_space=AddressSpace.SHARED, linear_idx_type=b.linear_idx_type, element_size=b.element_size], c_tmem: UInt32, init_c: Bool)

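Issues matrix multiply-accumulate (MMA) operations across the K tiles of the A and B operands, reading both operands from shared memory and writing the accumulator to TMEM.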

Args:

a (TileTensor[Storage=a.Storage, address_space=AddressSpace.SHARED, linear_idx_type=a.linear_idx_type, element_size=a.element_size]): A operand tile in shared memory.
b (TileTensor[Storage=b.Storage, address_space=AddressSpace.SHARED, linear_idx_type=b.linear_idx_type, element_size=b.element_size]): B operand tile in shared memory.
c_tmem (UInt32): TMEM address for the accumulator.
init_c (Bool): When True, zero-initialize the accumulator on the first K slice instead of accumulating.
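
The effect of init_c can be pictured with a small CPU-side model. The sketch below is plain Python, not the Mojo API: mma_over_k, a_tiles, b_tiles, and acc are hypothetical names, the nested list acc merely stands in for the TMEM accumulator, and memory layout, swizzling, and transpose_b are ignored. It only illustrates the documented rule that the first K slice overwrites the accumulator when init_c is True and adds to it otherwise.

```python
def mma_over_k(a_tiles, b_tiles, acc, init_c):
    """Conceptual model of accumulating over K slices.

    a_tiles: list of M x Kslice matrices (one per K slice)
    b_tiles: list of Kslice x N matrices (one per K slice)
    acc:     M x N accumulator, a stand-in for TMEM (modified in place)
    init_c:  when True, the first K slice overwrites acc instead of adding
    """
    for k, (a, b) in enumerate(zip(a_tiles, b_tiles)):
        overwrite = init_c and k == 0  # only the first slice can overwrite
        for i in range(len(acc)):
            for j in range(len(acc[0])):
                dot = sum(a[i][p] * b[p][j] for p in range(len(b)))
                acc[i][j] = dot if overwrite else acc[i][j] + dot
    return acc


# Two K slices with M = 1, N = 1, and two elements per slice.
a_tiles = [[[1, 2]], [[3, 4]]]
b_tiles = [[[1], [1]], [[1], [1]]]

fresh = mma_over_k(a_tiles, b_tiles, [[100]], init_c=True)      # stale 100 is discarded
carried = mma_over_k(a_tiles, b_tiles, [[100]], init_c=False)   # stale 100 is kept
print(fresh, carried)
```

In this model, passing init_c=True is the natural choice at the start of a new output tile, where the accumulator may still hold stale values, while init_c=False continues a running sum.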

`commit`

def commit(self, ptr_mbar: UnsafePointer[address_space=AddressSpace.SHARED])

`wait`

def wait(self)
