Mojo struct

MmaOpSM100_BlockScaled_SS

@register_passable(trivial) struct MmaOpSM100_BlockScaled_SS[c_type: DType, a_type: DType, b_type: DType, sfa_dtype: DType, sfb_dtype: DType, scaling_kind: UMMAKind, block_tile_shape: IndexList[3], mma_shape: IndexList[3], /, *, accum_type: DType = DType.float32, cta_group: Int = 1, cluster_shape: IndexList[3] = Index(1, 1, 1), a_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, b_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_b: Bool = False]

Fields

idesc (UMMAInsDescriptor[scaling_kind]): the UMMA instruction descriptor for this scaling kind.
mask (UInt16)

Implemented traits

AnyType, Copyable, Defaultable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

`comptime` members

`__copyinit__is_trivial`

comptime __copyinit__is_trivial = True

`__del__is_trivial`

comptime __del__is_trivial = True

`__moveinit__is_trivial`

comptime __moveinit__is_trivial = True

Methods

`__init__`

__init__() -> Self

`mma`

mma(self, a: LayoutTensor[dtype, layout, origin, address_space=AddressSpace.SHARED, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], b: LayoutTensor[dtype, layout, origin, address_space=AddressSpace.SHARED, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], sfa_smem: LayoutTensor[dtype, layout, origin, address_space=AddressSpace.SHARED, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], sfb_smem: LayoutTensor[dtype, layout, origin, address_space=AddressSpace.SHARED, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], c_tmem: UInt32, sfa_tmem: UInt32, sfb_tmem: UInt32, init_c: Bool)

Performs a block-scaled MMA on the input tiles (LayoutTensor overload).

The layout assumes that coalesce(A) has shape (bm, sw_k, num_sw_k); we currently assume bm = mma_m. In the future, we can tile it to (mma_m, sw_k, num_sw_k, num_mma_m). The same logic applies to matrix B.
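As a worked illustration of the (bm, sw_k, num_sw_k) shape, the sketch below computes `sw_k` and `num_sw_k` from the swizzle width and the operand element size. It assumes that one swizzle atom row spans the swizzle width in bytes (128 bytes for `SWIZZLE_128B`) and that `num_sw_k` is the block tile's K extent divided by `sw_k`. The helpers `sw_k_elems` and `num_sw_k_tiles` are hypothetical and not part of this API.

```mojo
# Hypothetical helpers (not library functions) that reproduce the
# (bm, sw_k, num_sw_k) decomposition under the stated assumptions.

fn sw_k_elems(swizzle_bytes: Int, dtype_bits: Int) -> Int:
    # Assumption: one swizzle atom row covers `swizzle_bytes` bytes along K,
    # so it holds (swizzle_bytes * 8) / dtype_bits elements.
    return (swizzle_bytes * 8) // dtype_bits


fn num_sw_k_tiles(bk: Int, swizzle_bytes: Int, dtype_bits: Int) -> Int:
    # Number of swizzle atoms needed to cover the block tile's K extent `bk`.
    return bk // sw_k_elems(swizzle_bytes, dtype_bits)
```

For example, with 8-bit operands and a 128-byte swizzle, `sw_k_elems(128, 8)` is 128, so a K extent of 256 gives `num_sw_k_tiles(256, 128, 8) == 2`; packed 4-bit operands fit 256 elements per atom row, so the same K extent needs a single atom.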

mma(self, a: TileTensor[dtype, LayoutType, origin, address_space=AddressSpace.SHARED, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], b: TileTensor[dtype, LayoutType, origin, address_space=AddressSpace.SHARED, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], sfa_smem: TileTensor[dtype, LayoutType, origin, address_space=AddressSpace.SHARED, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], sfb_smem: TileTensor[dtype, LayoutType, origin, address_space=AddressSpace.SHARED, linear_idx_type=linear_idx_type, element_shape_types=element_shape_types], c_tmem: UInt32, sfa_tmem: UInt32, sfb_tmem: UInt32, init_c: Bool)

TileTensor overload for block-scaled MMA input tiles.

This overload accepts TileTensor directly for A, B, and scale factor tiles. The layout is extracted from TileTensor's compile-time type parameters (shape_types, stride_types) using direct VariadicType extraction for fast compile times.
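Both overloads take an `init_c` flag. The sketch below shows one common way such a flag is used, stated here as an assumption rather than a description of this implementation: the tile's K extent is split into `bk // mma_k` MMA instructions, and when `init_c` is `True` only the first instruction overwrites the accumulator at `c_tmem` while every later one accumulates into it. `num_k_mmas` and `accumulate_flag` are hypothetical helpers.

```mojo
# Hypothetical helpers illustrating an accumulate-or-overwrite K loop.

fn num_k_mmas(bk: Int, mma_k: Int) -> Int:
    # Each MMA instruction consumes `mma_k` elements of K.
    return bk // mma_k


fn accumulate_flag(init_c: Bool, k: Int) -> Bool:
    # Assumption: with init_c=True, step k == 0 overwrites the accumulator;
    # all other steps (and every step when init_c=False) accumulate.
    return not (init_c and k == 0)
```

With an example K extent of 128 and 32 K elements per instruction, `num_k_mmas(128, 32)` is 4, and only step 0 of a call made with `init_c=True` skips accumulation.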

`commit`

commit(self, ptr_mbar: LegacyUnsafePointer[type, address_space=AddressSpace.SHARED, origin=origin])

`wait`

wait(self)

`copy_sf_to_tmem`

copy_sf_to_tmem[sf_dtype: DType, sf_smem_layout: Layout, TILE_MN: Int, tile_k_idx: Int](self, sf_smem: LayoutTensor[dtype, layout, origin, address_space=AddressSpace.SHARED, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], sf_tmem: UInt32)
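Judging from its signature, `copy_sf_to_tmem` copies one tile of scale factors from shared memory (`sf_smem`) to the tensor-memory address `sf_tmem`. To size such a tile, the sketch below assumes one scale factor per `sf_vec_size` consecutive K elements in each of the `TILE_MN` rows. The OCP MX formats use blocks of 32 elements and NVFP4 uses blocks of 16, but which block size a given `scaling_kind` uses here is an assumption. `sf_per_row` and `sf_per_tile` are hypothetical helpers.

```mojo
# Hypothetical helpers for counting block-scale factors in one operand tile.

fn sf_per_row(bk: Int, sf_vec_size: Int) -> Int:
    # One scale factor covers `sf_vec_size` consecutive K elements.
    return bk // sf_vec_size


fn sf_per_tile(tile_mn: Int, bk: Int, sf_vec_size: Int) -> Int:
    # Scale factors for a tile_mn x bk operand tile.
    return tile_mn * sf_per_row(bk, sf_vec_size)
```

For a 128-row tile with a K extent of 256, this gives 1024 scale factors at a block size of 32 and 2048 at a block size of 16.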
