
Mojo struct

TMEMToSMemWriter

struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_dim0: Int, c_smem_dim1: Int, epc: EpilogueConfig, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]

Writes accumulators from TMEM (tensor memory) to SMEM (shared memory) via st.matrix. SM100-specific.

Fields

  • warp_id (UInt32)
  • lane_id (UInt32)

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

comptime members

BM

comptime BM = epc.BM

c_smem_layout

comptime c_smem_layout = Layout.row_major(c_smem_dim0, c_smem_dim1)
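
As a rough illustration of what a row-major layout of shape (c_smem_dim0, c_smem_dim1) implies, the sketch below computes the linear element offset of a coordinate. It is a plain Python model rather than the Mojo Layout API: the function name `row_major_offset` and the example dimensions are hypothetical, and any swizzling applied on top of the layout is ignored here.

```python
def row_major_offset(i: int, j: int, dim0: int, dim1: int) -> int:
    """Linear element offset of (i, j) in a row-major (dim0, dim1) layout.

    Hypothetical helper for illustration only; it is not part of the
    Mojo API documented on this page.
    """
    if not (0 <= i < dim0 and 0 <= j < dim1):
        raise IndexError(f"({i}, {j}) is outside a ({dim0}, {dim1}) tile")
    # Rows are contiguous: stepping one row skips dim1 elements.
    return i * dim1 + j


# Example dimensions are assumptions, not values taken from this page.
offset = row_major_offset(2, 3, dim0=64, dim1=32)
```

In this model the contiguous dimension is the second one, which is consistent with stage_contiguous_size being defined as c_smem_dim1.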

Config

comptime Config = epc

cta_group

comptime cta_group = epc.cta_group

data_paths

comptime data_paths = 16

stage_contiguous_size

comptime stage_contiguous_size = c_smem_dim1

stageN

comptime stageN = epc.stageN

swizzle

comptime swizzle = make_swizzle[c_type, c_swizzle]()
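
For intuition, the commonly used 128-byte swizzle pattern can be modeled as an XOR on byte offsets: the 16-byte chunk index within a 128-byte row is XORed with the row index modulo 8. The Python sketch below shows that pattern only. Whether make_swizzle[c_type, c_swizzle]() produces exactly this mapping, and in which units (bytes or elements), is an assumption not confirmed by this page; `swizzle_128b` is a hypothetical name.

```python
def swizzle_128b(byte_offset: int) -> int:
    """Model of a 128-byte XOR swizzle on a byte offset (illustrative only).

    Bits [7, 10) select the 128-byte row (mod 8); bits [4, 7) select the
    16-byte chunk inside that row. The chunk index is XORed with the row
    index, so consecutive rows place the same logical chunk in a
    different physical position.
    """
    row_mod_8 = (byte_offset >> 7) & 0b111
    return byte_offset ^ (row_mod_8 << 4)


# Row 0 is left unchanged; row 1 swaps neighbouring 16-byte chunks.
row0 = [swizzle_128b(c * 16) for c in range(8)]
row1 = [swizzle_128b(128 + c * 16) - 128 for c in range(8)]
```

Because the row bits themselves are never modified, applying the function twice returns the original offset, so the same mapping serves for both writing and reading.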

swizzle_width

comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
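
The definition above is a bytes-to-elements conversion. The Python sketch below works through the arithmetic for a few element sizes. It assumes that c_swizzle.bytes() returns 128 for SWIZZLE_128B, as the name suggests; the helper `swizzle_width` and the dtype table are illustrative and not part of the Mojo API.

```python
# Element sizes in bytes for a few dtypes (illustrative table).
ELEMENT_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2}


def swizzle_width(swizzle_bytes: int, element_bytes: int) -> int:
    """Number of elements that fit in one swizzle span."""
    return swizzle_bytes // element_bytes


# Assumes SWIZZLE_128B corresponds to a 128-byte span.
widths = {name: swizzle_width(128, size) for name, size in ELEMENT_BYTES.items()}
```

Under these assumptions a bfloat16 output tile would have a swizzle width of 64 elements, and a float32 tile a width of 32.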

transpose_c

comptime transpose_c = epc.transpose_c

Methods

__init__

__init__(warp_id: UInt32, lane_id: UInt32) -> Self

write_fragments

write_fragments[repeat: Int](self, upper_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], lower_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], c_smem_tile: TileTensor[c_smem_tile.dtype, c_smem_tile.LayoutType, c_smem_tile.origin, address_space=AddressSpace.SHARED, linear_idx_type=c_smem_tile.linear_idx_type, element_size=c_smem_tile.element_size])

Write pre-loaded fragments to SMEM.
