Mojo struct
TMEMToSMemWriter
@register_passable("trivial")
struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_layout: Layout, BM: Int, BN: Int, MMA_M: Int, MMA_N: Int, stageN: Int, cta_group: Int, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, transpose_c: Bool = False]
Write TMEM accumulators to SMEM via st.matrix (SM100-specific).
Fields
- warp_id (UInt32)
- lane_id (UInt32)
Implemented traits
AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable
comptime members
__copyinit__is_trivial
comptime __copyinit__is_trivial = True
__del__is_trivial
comptime __del__is_trivial = True
__moveinit__is_trivial
comptime __moveinit__is_trivial = True
Config
comptime Config = EpilogueConfig[MMA_M, MMA_N, stageN, cta_group, transpose_c]
data_paths
comptime data_paths = 16
stage_contiguous_size
comptime stage_contiguous_size = c_smem_layout.shape[1].value()
swizzle
comptime swizzle = make_swizzle[c_type, c_swizzle]()
swizzle_width
comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
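As a worked illustration of the swizzle_width formula, the sketch below repeats the same division for two element types. It assumes that `TensorMapSwizzle.SWIZZLE_128B.bytes()` evaluates to 128, as the name suggests. The local names `swizzle_bytes`, `width_bf16`, and `width_f32` are hypothetical and not part of this struct.

```mojo
from sys import size_of


fn main():
    # Assumption: TensorMapSwizzle.SWIZZLE_128B.bytes() evaluates to 128.
    comptime swizzle_bytes = 128

    # Same arithmetic as swizzle_width: bytes in one swizzle span divided
    # by the size in bytes of one element of c_type.
    comptime width_bf16 = swizzle_bytes // size_of[DType.bfloat16]()
    comptime width_f32 = swizzle_bytes // size_of[DType.float32]()

    print(width_bf16)  # 128 // 2 = 64 elements
    print(width_f32)  # 128 // 4 = 32 elements
```

Under that assumption, a 2-byte output type such as bfloat16 gives a swizzle width of 64 elements, while a 4-byte type gives 32.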
Methods
__init__
__init__(warp_id: UInt32, lane_id: UInt32) -> Self
write_fragments
write_fragments[repeat: Int](self, upper_frag: SIMD[c_type, (4 * repeat)], lower_frag: SIMD[c_type, (4 * repeat)], c_smem_tile: LayoutTensor[c_type, c_smem_layout, MutAnyOrigin, address_space=AddressSpace.SHARED, alignment=128])
Write pre-loaded fragments to SMEM (use after register-based epilogue).