
Mojo struct

TMEMToSMemWriter

@register_passable(trivial) struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_layout: Layout, BM: Int, BN: Int, MMA_M: Int, MMA_N: Int, stageN: Int, cta_group: Int, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_c: Bool = False]

Writes accumulators from tensor memory (TMEM) to shared memory (SMEM) via st.matrix. SM100-specific.

Fields

  • warp_id (UInt32):
  • lane_id (UInt32):

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable

comptime members

__copyinit__is_trivial

comptime __copyinit__is_trivial = True

__del__is_trivial

comptime __del__is_trivial = True

__moveinit__is_trivial

comptime __moveinit__is_trivial = True

Config

comptime Config = EpilogueConfig[MMA_M, MMA_N, stageN, cta_group, transpose_c]

data_paths

comptime data_paths = 16

stage_contiguous_size

comptime stage_contiguous_size = c_smem_layout.shape[1].value()

swizzle

comptime swizzle = make_swizzle[c_type, c_swizzle]()

swizzle_width

comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
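As a rough illustration of the swizzle_width formula above, the sketch below evaluates the same integer division for two element types. It assumes that TensorMapSwizzle.SWIZZLE_128B.bytes() returns 128, as the name suggests; the constant swizzle_bytes and the two result names are hypothetical stand-ins and not part of this struct.

```mojo
from sys import size_of


def main():
    # Assumption: SWIZZLE_128B covers 128 bytes, standing in for c_swizzle.bytes().
    comptime swizzle_bytes = 128

    # Same formula as swizzle_width: swizzle bytes // bytes per element.
    comptime width_bf16 = swizzle_bytes // size_of[DType.bfloat16]()
    comptime width_f32 = swizzle_bytes // size_of[DType.float32]()

    print(width_bf16)  # 128 // 2 = 64 elements per swizzle row
    print(width_f32)  # 128 // 4 = 32 elements per swizzle row
```

Under that assumption, a narrower c_type packs more elements into each swizzled row, which is why swizzle_width depends on both c_swizzle and c_type.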

Methods

__init__

__init__(warp_id: UInt32, lane_id: UInt32) -> Self

write_fragments

write_fragments[repeat: Int](self, upper_frag: SIMD[c_type, (4 * repeat)], lower_frag: SIMD[c_type, (4 * repeat)], c_smem_tile: LayoutTensor[c_type, c_smem_layout, MutAnyOrigin, address_space=AddressSpace.SHARED, alignment=128])

Writes pre-loaded fragments to SMEM. Use this after a register-based epilogue.
