Mojo struct
MmaOpSM100_SS
struct MmaOpSM100_SS[c_type: DType, a_type: DType, b_type: DType, block_tile_shape: IndexList[3], mma_shape: IndexList[3], /, *, accum_type: DType = DType.float32, cta_group: Int = 1, cluster_shape: IndexList[3] = Index[Int, Int, Int](1, 1, 1), a_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, b_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_b: Bool = False]
Fields
- idesc (UMMAInsDescriptor[MmaOpSM100_SS._get_umma_kind[a_type]()])
- mask (UInt16)
Implemented traits
AnyType,
Copyable,
Defaultable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
Methods
__init__
__init__() -> Self
mma
mma(self, a: TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=a.linear_idx_type, element_size=a.element_size], b: TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=b.linear_idx_type, element_size=b.element_size], c_tmem: UInt32, init_c: Bool)
Issue MMA operations over K tiles from shared memory to TMEM.
Args:
- a (TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=a.linear_idx_type, element_size=a.element_size]): A operand tile in shared memory.
- b (TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=b.linear_idx_type, element_size=b.element_size]): B operand tile in shared memory.
- c_tmem (UInt32): TMEM address for the accumulator.
- init_c (Bool): When True, zero-initialize the accumulator on the first K slice instead of accumulating.
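The effect of init_c on a loop over K slices can be illustrated with a small host-side model. This is only a sketch in plain Python, not the Mojo API: the function name mma_over_k_tiles, the list-of-lists matrices, and the K x N layout of each B slice are all assumptions made for illustration, and nothing here touches shared memory or TMEM.

```python
def mma_over_k_tiles(a_tiles, b_tiles, c, init_c):
    """Hypothetical model of accumulating C over K slices.

    a_tiles: list of M x k matrices (one per K slice)
    b_tiles: list of k x N matrices (one per K slice; layout is an assumption)
    c:       M x N accumulator, updated in place
    init_c:  when True, the first K slice overwrites c instead of adding to it
    """
    for k_idx, (a, b) in enumerate(zip(a_tiles, b_tiles)):
        # Only the first slice may discard the previous accumulator contents.
        overwrite = init_c and k_idx == 0
        for i in range(len(c)):
            for j in range(len(c[0])):
                partial = sum(a[i][k] * b[k][j] for k in range(len(b)))
                c[i][j] = partial if overwrite else c[i][j] + partial
    return c


# Two K slices of a 1 x 4 by 4 x 1 product, split as k = 2 + 2.
a_tiles = [[[1, 2]], [[3, 4]]]
b_tiles = [[[1], [1]], [[1], [1]]]

fresh = mma_over_k_tiles(a_tiles, b_tiles, [[100]], init_c=True)
carried = mma_over_k_tiles(a_tiles, b_tiles, [[100]], init_c=False)
```

In this model, init_c=True discards the stale value 100 on the first slice, while init_c=False keeps it and adds both partial products to it.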
commit
commit(self, ptr_mbar: UnsafePointer[address_space=AddressSpace.SHARED])
wait
wait(self)