Mojo struct
MmaOpSM100_BlockScaled_SS
struct MmaOpSM100_BlockScaled_SS[c_type: DType, a_type: DType, b_type: DType, sfa_dtype: DType, sfb_dtype: DType, scaling_kind: UMMAKind, block_tile_shape: IndexList[3], mma_shape: IndexList[3], /, *, accum_type: DType = DType.float32, cta_group: Int = 1, cluster_shape: IndexList[3] = Index(1, 1, 1), a_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, b_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_b: Bool = False]
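For orientation, a hypothetical instantiation might look like the following. All concrete values here are illustrative assumptions (the dtypes, the UMMAKind member name, and the tile shapes are not taken from this page); consult the kernel you are targeting for real parameters.

```mojo
from utils.index import Index

# Hypothetical parameterization: an FP8 block-scaled MMA accumulating
# in float32 (the default accum_type). The scaling_kind member name
# and the shapes below are assumptions for illustration only.
alias BlockScaledMma = MmaOpSM100_BlockScaled_SS[
    DType.bfloat16,          # c_type
    DType.float8_e4m3fn,     # a_type
    DType.float8_e4m3fn,     # b_type
    DType.float8_e8m0fnu,    # sfa_dtype
    DType.float8_e8m0fnu,    # sfb_dtype
    UMMAKind.KIND_MXF8F6F4,  # scaling_kind (assumed member name)
    Index(128, 128, 128),    # block_tile_shape
    Index(128, 128, 32),     # mma_shape
    transpose_b=True,
]

# The struct is Defaultable, so it can be constructed with no arguments.
var mma_op = BlockScaledMma()
```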
Fields
- idesc (UMMAInsDescriptor[scaling_kind])
- mask (UInt16)
Implemented traits
AnyType,
Copyable,
Defaultable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
Methods
__init__
__init__() -> Self
mma
mma(self, a: LayoutTensor[a.dtype, a.layout, a.origin, address_space=AddressSpace.SHARED, element_layout=a.element_layout, layout_int_type=a.layout_int_type, linear_idx_type=a.linear_idx_type, masked=a.masked, alignment=a.alignment], b: LayoutTensor[b.dtype, b.layout, b.origin, address_space=AddressSpace.SHARED, element_layout=b.element_layout, layout_int_type=b.layout_int_type, linear_idx_type=b.linear_idx_type, masked=b.masked, alignment=b.alignment], sfa_smem: LayoutTensor[sfa_smem.dtype, sfa_smem.layout, sfa_smem.origin, address_space=AddressSpace.SHARED, element_layout=sfa_smem.element_layout, layout_int_type=sfa_smem.layout_int_type, linear_idx_type=sfa_smem.linear_idx_type, masked=sfa_smem.masked, alignment=sfa_smem.alignment], sfb_smem: LayoutTensor[sfb_smem.dtype, sfb_smem.layout, sfb_smem.origin, address_space=AddressSpace.SHARED, element_layout=sfb_smem.element_layout, layout_int_type=sfb_smem.layout_int_type, linear_idx_type=sfb_smem.linear_idx_type, masked=sfb_smem.masked, alignment=sfb_smem.alignment], c_tmem: UInt32, sfa_tmem: UInt32, sfb_tmem: UInt32, init_c: Bool, work_tile_coord: Tuple[UInt, UInt])
MMA input tiles.
The layout assumes that coalesce(A) has shape (bm, sw_k, num_sw_k); we currently assume bm = mma_m. In the future, this could be tiled to (mma_m, sw_k, num_sw_k, num_mma_m). The same logic applies to matrix B.
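A hedged sketch of how this call might be sequenced with commit and wait inside a kernel's K-loop. The tensor construction, barrier allocation, tensor-memory addresses, and loop variables (k_iter, m_idx, n_idx, mbar_ptr) are all assumed here, not shown on this page:

```mojo
# Assumes a, b, sfa_smem, sfb_smem are LayoutTensors already staged in
# shared memory (address_space=AddressSpace.SHARED), and that c_tmem,
# sfa_tmem, sfb_tmem are tensor-memory addresses obtained earlier.
mma_op.mma(
    a, b, sfa_smem, sfb_smem,
    c_tmem, sfa_tmem, sfb_tmem,
    init_c=(k_iter == 0),           # zero-initialize C on the first K tile
    work_tile_coord=(m_idx, n_idx),
)
mma_op.commit(mbar_ptr)  # arrive on the shared-memory barrier
mma_op.wait()            # block until the MMA results are complete
```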
mma(self, a: TileTensor[a.dtype, a.LayoutType, a.origin, address_space=AddressSpace.SHARED, linear_idx_type=a.linear_idx_type, element_size=a.element_size], b: TileTensor[b.dtype, b.LayoutType, b.origin, address_space=AddressSpace.SHARED, linear_idx_type=b.linear_idx_type, element_size=b.element_size], sfa_smem: TileTensor[sfa_smem.dtype, sfa_smem.LayoutType, sfa_smem.origin, address_space=AddressSpace.SHARED, linear_idx_type=sfa_smem.linear_idx_type, element_size=sfa_smem.element_size], sfb_smem: TileTensor[sfb_smem.dtype, sfb_smem.LayoutType, sfb_smem.origin, address_space=AddressSpace.SHARED, linear_idx_type=sfb_smem.linear_idx_type, element_size=sfb_smem.element_size], c_tmem: UInt32, sfa_tmem: UInt32, sfb_tmem: UInt32, init_c: Bool, sfb_tmem_adj: UInt32 = 0)
TileTensor overload for block-scaled MMA input tiles.
This overload accepts TileTensor directly for A, B, and scale factor tiles. The layout is extracted from TileTensor's compile-time type parameters (shape_types, stride_types) using direct VariadicType extraction for fast compile times.
commit
commit(self, ptr_mbar: UnsafePointer[ptr_mbar.type, ptr_mbar.origin, address_space=AddressSpace.SHARED])
wait
wait(self)
copy_sf_to_tmem
copy_sf_to_tmem[sf_dtype: DType, sf_smem_layout: Layout, TILE_MN: Int, tile_k_idx: Int](self, sf_smem: LayoutTensor[sf_smem.dtype, sf_smem.layout, sf_smem.origin, address_space=AddressSpace.SHARED, element_layout=sf_smem.element_layout, layout_int_type=sf_smem.layout_int_type, linear_idx_type=sf_smem.linear_idx_type, masked=sf_smem.masked, alignment=sf_smem.alignment], sf_tmem: UInt32)