Mojo struct

TMEMToSMemWriter

@register_passable(trivial) struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_layout: Layout, BM: Int, BN: Int, MMA_M: Int, MMA_N: Int, stageN: Int, cta_group: Int, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_c: Bool = False]

Write TMEM accumulators to SMEM via st.matrix (SM100-specific).

Fields

warp_id (UInt32):
lane_id (UInt32):

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable

`comptime` members

`__copyinit__is_trivial`

comptime __copyinit__is_trivial = True

`__del__is_trivial`

comptime __del__is_trivial = True

`__moveinit__is_trivial`

comptime __moveinit__is_trivial = True

`Config`

comptime Config = EpilogueConfig[MMA_M, MMA_N, stageN, cta_group, transpose_c]

`data_paths`

comptime data_paths = 16

`stage_contiguous_size`

comptime stage_contiguous_size = c_smem_layout.shape[1].value()

`swizzle`

comptime swizzle = make_swizzle[c_type, c_swizzle]()

`swizzle_width`

comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
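As a rough illustration of how `swizzle_width` follows from the element type, the sketch below repeats the same integer division outside the struct. The helper name `swizzle_width_elems` is hypothetical, and the value 128 assumes that `TensorMapSwizzle.SWIZZLE_128B.bytes()` reports 128 bytes; neither is stated on this page.

```mojo
from sys import size_of


# Hypothetical helper, not part of TMEMToSMemWriter.
# Mirrors the documented expression: c_swizzle.bytes() // size_of[c_type]()
fn swizzle_width_elems[dtype: DType](swizzle_bytes: Int) -> Int:
    return swizzle_bytes // size_of[dtype]()


def main():
    # Assumption: SWIZZLE_128B corresponds to a 128-byte swizzle span.
    print(swizzle_width_elems[DType.bfloat16](128))  # 2-byte elements
    print(swizzle_width_elems[DType.float32](128))  # 4-byte elements
```

Under that assumption, a 2-byte `c_type` such as `bfloat16` yields a swizzle width of 64 elements, while a 4-byte `c_type` such as `float32` yields 32.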

Methods

`__init__`

__init__(warp_id: UInt32, lane_id: UInt32) -> Self

`write_fragments`

write_fragments[repeat: Int](self, upper_frag: SIMD[c_type, (4 * repeat)], lower_frag: SIMD[c_type, (4 * repeat)], c_smem_tile: LayoutTensor[c_type, c_smem_layout, MutAnyOrigin, address_space=AddressSpace.SHARED, alignment=128])

Write pre-loaded fragments to SMEM (use after register-based epilogue).
