
Mojo struct

TMEMToSMemWriter

struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_dim0: Int, c_smem_dim1: Int, epc: EpilogueConfig, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]

Writes accumulators from tensor memory (TMEM) to shared memory (SMEM) via st.matrix (SM100-specific).

Fields

  • warp_id (UInt32)
  • lane_id (UInt32)

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

BM

comptime BM = epc.BM

c_smem_layout

comptime c_smem_layout = Layout.row_major(c_smem_dim0, c_smem_dim1)
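
As a rough illustration of what a row-major layout implies for addressing, the sketch below computes the linear element offset of a (row, col) coordinate in a `c_smem_dim0` x `c_smem_dim1` tile. It is plain Python rather than Mojo, the helper name `row_major_offset` and the 128 x 64 tile shape are hypothetical, and it ignores the swizzle that the writer applies on top of this layout.

```python
def row_major_offset(row: int, col: int, dim0: int, dim1: int) -> int:
    """Linear element offset of (row, col) in a dim0 x dim1 row-major tile.

    Row-major means the last dimension is contiguous: strides are (dim1, 1).
    """
    if not (0 <= row < dim0 and 0 <= col < dim1):
        raise IndexError(f"({row}, {col}) is outside a {dim0} x {dim1} tile")
    return row * dim1 + col


# Hypothetical tile shape, chosen only for illustration.
C_SMEM_DIM0, C_SMEM_DIM1 = 128, 64

print(row_major_offset(0, 0, C_SMEM_DIM0, C_SMEM_DIM1))  # 0
print(row_major_offset(1, 0, C_SMEM_DIM0, C_SMEM_DIM1))  # 64
print(row_major_offset(2, 3, C_SMEM_DIM0, C_SMEM_DIM1))  # 131
```

Moving one row down advances the offset by `c_smem_dim1` elements, which matches `stage_contiguous_size` being defined as `c_smem_dim1`.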

Config

comptime Config = epc

cta_group

comptime cta_group = epc.cta_group

data_paths

comptime data_paths = 16

stage_contiguous_size

comptime stage_contiguous_size = c_smem_dim1

stageN

comptime stageN = epc.stageN

swizzle

comptime swizzle = make_swizzle[c_type, c_swizzle]()

swizzle_width

comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
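
To make the formula concrete, here is a small worked sketch of the same integer division in plain Python (not Mojo). The lookup tables and the `swizzle_width` helper are hypothetical stand-ins for `c_swizzle.bytes()` and `size_of[c_type]()`; the byte counts assume that each swizzle mode's name states its span in bytes and that element types have their usual sizes.

```python
# Assumed byte span of each swizzle mode (taken from the mode names).
SWIZZLE_BYTES = {"SWIZZLE_32B": 32, "SWIZZLE_64B": 64, "SWIZZLE_128B": 128}

# Usual element sizes in bytes for a few common output types.
ELEMENT_BYTES = {"bfloat16": 2, "float16": 2, "float32": 4}


def swizzle_width(swizzle_mode: str, dtype: str) -> int:
    """Number of elements of `dtype` that fit in one swizzle span."""
    return SWIZZLE_BYTES[swizzle_mode] // ELEMENT_BYTES[dtype]


print(swizzle_width("SWIZZLE_128B", "bfloat16"))  # 64
print(swizzle_width("SWIZZLE_128B", "float32"))   # 32
print(swizzle_width("SWIZZLE_64B", "bfloat16"))   # 32
```

With the default `c_swizzle = TensorMapSwizzle.SWIZZLE_128B` and a 2-byte `c_type`, the width works out to 64 elements.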

transpose_c

comptime transpose_c = epc.transpose_c

Methods

__init__

__init__(warp_id: UInt32, lane_id: UInt32) -> Self

write_fragments

write_fragments[repeat: Int](self, upper_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], lower_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], c_smem_tile: TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=c_smem_tile.linear_idx_type, element_size=c_smem_tile.element_size])

Write pre-loaded fragments to SMEM.