Mojo struct

MmaOpSM100_SS

@register_passable(trivial) struct MmaOpSM100_SS[c_type: DType, a_type: DType, b_type: DType, block_tile_shape: IndexList[3], mma_shape: IndexList[3], /, *, accum_type: DType = DType.float32, cta_group: Int = 1, cluster_shape: IndexList[3] = Index(1, 1, 1), a_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, b_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_b: Bool = False]

Fields

  • idesc (UMMAInsDescriptor[MmaOpSM100_SS._get_umma_kind[a_type]()])
  • mask (UInt16)

Implemented traits

AnyType, Copyable, Defaultable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

__copy_ctor_is_trivial

comptime __copy_ctor_is_trivial = True

__del__is_trivial

comptime __del__is_trivial = True

__move_ctor_is_trivial

comptime __move_ctor_is_trivial = True

Methods

__init__

__init__() -> Self

mma

mma(self, a: LayoutTensor[a.dtype, a.layout, a.origin, address_space=AddressSpace.SHARED, element_layout=a.element_layout, layout_int_type=a.layout_int_type, linear_idx_type=a.linear_idx_type, masked=a.masked, alignment=a.alignment], b: LayoutTensor[b.dtype, b.layout, b.origin, address_space=AddressSpace.SHARED, element_layout=b.element_layout, layout_int_type=b.layout_int_type, linear_idx_type=b.linear_idx_type, masked=b.masked, alignment=b.alignment], c_tmem: UInt32, init_c: Bool)

Performs a matrix multiply-accumulate (MMA) operation on the given input tiles.

The layout assumes that coalesce(A) has shape (bm, sw_k, num_sw_k); we currently assume bm = mma_m. In the future, we can tile it to (mma_m, sw_k, num_sw_k, num_mma_m). The same logic applies to matrix B.

mma(self, a: TileTensor[a.dtype, a.LayoutType, a.origin, address_space=AddressSpace.SHARED, linear_idx_type=a.linear_idx_type, element_shape_types=a.element_shape_types], b: TileTensor[b.dtype, b.LayoutType, b.origin, address_space=AddressSpace.SHARED, linear_idx_type=b.linear_idx_type, element_shape_types=b.element_shape_types], c_tmem: UInt32, init_c: Bool)

TileTensor overload for MMA input tiles.

This overload accepts TileTensor directly. The layout is extracted from TileTensor's compile-time type parameters (shape_types, stride_types).
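As an illustrative sketch (not from the source: the parameter values, tile names, loop bounds, and surrounding kernel scaffolding are all assumptions), a typical call passes shared-memory A and B tiles plus a tensor-memory accumulator address, using `init_c` to zero-initialize the accumulator on the first K iteration:

```mojo
# Hypothetical usage sketch; names and shapes are illustrative only.
alias BM = 128
alias BN = 128
alias BK = 64

var mma_op = MmaOpSM100_SS[
    DType.float32,        # c_type
    DType.bfloat16,       # a_type
    DType.bfloat16,       # b_type
    Index(BM, BN, BK),    # block_tile_shape
    Index(128, 128, 16),  # mma_shape
]()

# a_smem and b_smem are shared-memory tiles loaded earlier in the kernel;
# c_tmem is the tensor-memory address of the accumulator.
for k in range(num_k_iters):
    # init_c=True on the first iteration zero-initializes the accumulator;
    # subsequent iterations accumulate into it.
    mma_op.mma(a_smem, b_smem, c_tmem, init_c=(k == 0))
```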

commit

commit(self, ptr_mbar: LegacyUnsafePointer[ptr_mbar.type, address_space=AddressSpace.SHARED, _mlir_origin=ptr_mbar.origin._mlir_origin, origin=ptr_mbar.origin])

Commits all previously issued MMA operations to the shared-memory mbarrier at ptr_mbar, so their completion can be tracked.

wait

wait(self)

Waits until the committed MMA operations have completed.
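The issue/commit/wait lifecycle can be sketched as follows (an illustrative sequence, not from the source; the mbarrier setup and tile names are assumptions):

```mojo
# Hypothetical completion-tracking sketch; mbar_ptr is a shared-memory
# mbarrier pointer initialized elsewhere in the kernel.
mma_op.mma(a_smem, b_smem, c_tmem, init_c=True)

# Commit the issued MMA to the barrier so its completion can be
# observed, then block until it finishes.
mma_op.commit(mbar_ptr)
mma_op.wait()

# The accumulator in tensor memory is now safe to read back.
```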