Mojo struct

TMEMToSMemWriter

struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_dim0: Int, c_smem_dim1: Int, epc: EpilogueConfig, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]

Writes tensor memory (TMEM) accumulators to shared memory (SMEM) using the `st.matrix` (PTX `stmatrix`) instruction. SM100-specific.
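
For readers unfamiliar with `st.matrix`, here is how the PTX `stmatrix.sync.aligned.m8n8.x1.b16` form distributes an 8x8 tile of 16-bit elements across a warp. Each of the 32 lanes contributes one 32-bit register that holds two adjacent elements, and lanes 0 to 7 supply the shared-memory address of each row. The Python sketch below simulates that fragment-to-row mapping. It illustrates the PTX layout only and is not this struct's implementation. `stmatrix_x1` is a hypothetical helper name.

```python
# Simulation of the PTX `stmatrix.sync.aligned.m8n8.x1.b16` fragment layout
# for one 8x8 tile of 16-bit elements. Illustrative sketch only.

def stmatrix_x1(fragments, row_addresses, smem):
    """Scatter one warp's register fragments into a flat `smem` list.

    fragments[lane]  -> (e0, e1): the two 16-bit values held by that lane.
    row_addresses[r] -> element offset of row r in `smem`; in hardware,
                        lanes 0-7 supply these addresses.
    """
    assert len(fragments) == 32 and len(row_addresses) == 8
    for lane, (e0, e1) in enumerate(fragments):
        row = lane // 4        # four lanes cover one 8-element row
        col = (lane % 4) * 2   # each lane owns two adjacent columns
        base = row_addresses[row]
        smem[base + col] = e0
        smem[base + col + 1] = e1
    return smem


# Lane i holds (2*i, 2*i + 1); with dense row addresses the tile reads 0..63.
frags = [(2 * lane, 2 * lane + 1) for lane in range(32)]
rows = [r * 8 for r in range(8)]
tile = stmatrix_x1(frags, rows, [None] * 64)
print(tile[:8])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because each row has its own address, rows do not need to be contiguous. This is what allows a writer to target a padded or swizzled SMEM layout.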

Fields

warp_id (UInt32): Index of the warp performing the write.
lane_id (UInt32): Index of the calling thread within its warp (0 to 31).

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable

`comptime` members

`BM`

comptime BM = epc.BM

`c_smem_layout`

comptime c_smem_layout = Layout.row_major(c_smem_dim0, c_smem_dim1)
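
`Layout.row_major(c_smem_dim0, c_smem_dim1)` gives the staging tile strides of `(c_smem_dim1, 1)`. Before any swizzle is applied, element `(i, j)` therefore sits at linear offset `i * c_smem_dim1 + j`. The Python sketch below shows that mapping. The function name and the 128 x 64 example dimensions are illustrative only.

```python
def row_major_offset(i, j, dim0, dim1):
    """Linear offset of element (i, j) in a dim0 x dim1 row-major tile."""
    if not (0 <= i < dim0 and 0 <= j < dim1):
        raise IndexError("coordinate outside the tile")
    return i * dim1 + j  # strides are (dim1, 1)


# Example: a hypothetical 128 x 64 staging tile.
print(row_major_offset(3, 5, 128, 64))  # 197
```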

`Config`

comptime Config = epc

`cta_group`

comptime cta_group = epc.cta_group

`data_paths`

comptime data_paths = 16

`stage_contiguous_size`

comptime stage_contiguous_size = c_smem_dim1

`stageN`

comptime stageN = epc.stageN

`swizzle`

comptime swizzle = make_swizzle[c_type, c_swizzle]()
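
`make_swizzle[c_type, c_swizzle]()` builds the address permutation that matches the chosen TMA swizzle mode. Under the CUDA tensor-map `SWIZZLE_128B` convention, address bits 4 to 6 select a 16-byte chunk inside a 128-byte segment. Those bits are XOR-ed with bits 7 to 9, so the pattern repeats every 1024 bytes. The Python sketch below applies that byte-address rule and converts it to element units. It assumes the CUDA 128B pattern and does not reproduce the internals of `make_swizzle`. Both function names are hypothetical.

```python
def swizzle_128b(byte_offset):
    """Apply the 128-byte swizzle rule to a byte offset (sketch).

    Bits 7-9 (the 128-byte segment index modulo 8) are XOR-ed into
    bits 4-6 (the 16-byte chunk index within the segment).
    """
    return byte_offset ^ (((byte_offset >> 7) & 0x7) << 4)


def swizzle_128b_elements(elem_offset, elem_bytes):
    """Same rule in element units; elem_bytes must divide 16."""
    assert 16 % elem_bytes == 0
    return swizzle_128b(elem_offset * elem_bytes) // elem_bytes


# bfloat16 (2 bytes): element 0 of row 1 (byte 128) moves to element 72.
print(swizzle_128b_elements(64, 2))  # 72
```

Row 0 is left in place, and each later row rotates its 16-byte chunks differently. This spreads column accesses across shared-memory banks.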

`swizzle_width`

comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
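
`swizzle_width` converts the swizzle span from bytes to elements of `c_type`. Assuming `TensorMapSwizzle.bytes()` returns the span in bytes (128 for `SWIZZLE_128B`), the width is 64 elements for a 2-byte type such as `bfloat16` and 32 for a 4-byte `float32`. The Python sketch below does the same arithmetic. The byte sizes are written out by hand as assumptions, not read from Mojo.

```python
# Swizzle spans in bytes, assumed to match TensorMapSwizzle.bytes().
SWIZZLE_BYTES = {"SWIZZLE_32B": 32, "SWIZZLE_64B": 64, "SWIZZLE_128B": 128}
# Element sizes in bytes (what size_of[c_type]() would return).
ELEM_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2, "float8_e4m3fn": 1}


def swizzle_width(swizzle_mode, dtype):
    """Number of c_type elements spanned by one swizzle row."""
    return SWIZZLE_BYTES[swizzle_mode] // ELEM_BYTES[dtype]


print(swizzle_width("SWIZZLE_128B", "bfloat16"))  # 64
print(swizzle_width("SWIZZLE_128B", "float32"))   # 32
```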

`transpose_c`

comptime transpose_c = epc.transpose_c

Methods

`__init__`

__init__(warp_id: UInt32, lane_id: UInt32) -> Self

`write_fragments`

write_fragments[repeat: Int](
    self,
    upper_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)],
    lower_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)],
    c_smem_tile: TileTensor[c_smem_tile.dtype, c_smem_tile.LayoutType, c_smem_tile.origin, address_space=AddressSpace.SHARED, linear_idx_type=c_smem_tile.linear_idx_type, element_size=c_smem_tile.element_size],
)

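Writes the pre-loaded fragments `upper_frag` and `lower_frag`, whose elements are already of type `c_type`, to the shared-memory tile `c_smem_tile`.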
