Mojo struct
TMEMToSMemWriter
struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_dim0: Int, c_smem_dim1: Int, epc: EpilogueConfig, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]
Write TMEM accumulators to SMEM via st.matrix (SM100-specific).
Fields
- warp_id (UInt32)
- lane_id (UInt32)
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
comptime members
BM
comptime BM = epc.BM
c_smem_layout
comptime c_smem_layout = Layout.row_major(c_smem_dim0, c_smem_dim1)
Config
comptime Config = epc
cta_group
comptime cta_group = epc.cta_group
data_paths
comptime data_paths = 16
stage_contiguous_size
comptime stage_contiguous_size = c_smem_dim1
stageN
comptime stageN = epc.stageN
swizzle
comptime swizzle = make_swizzle[c_type, c_swizzle]()
swizzle_width
comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
transpose_c
comptime transpose_c = epc.transpose_c
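The two layout-related members above reduce to simple arithmetic. The worked example below assumes, for illustration only, c_type = DType.bfloat16 (2 bytes per element) and the default c_swizzle = TensorMapSwizzle.SWIZZLE_128B, taken here to have a 128-byte swizzle span. swizzle_width is then the number of c_type elements that fit in one swizzle span. c_smem_layout is a plain row-major layout over c_smem_dim0 x c_smem_dim1, before any swizzle is applied.

```latex
% swizzle_width = c_swizzle.bytes() // size_of[c_type]()
\mathrm{swizzle\_width} = \left\lfloor \frac{128\ \text{bytes}}{2\ \text{bytes/element}} \right\rfloor = 64\ \text{elements (bfloat16)},
\qquad
\left\lfloor \frac{128}{4} \right\rfloor = 32\ \text{elements (float32)}

% c_smem_layout = Layout.row_major(c_smem_dim0, c_smem_dim1):
% element (i, j) maps to a linear offset with strides (c_smem_dim1, 1)
\mathrm{offset}(i, j) = i \cdot \mathrm{c\_smem\_dim1} + j,
\qquad 0 \le i < \mathrm{c\_smem\_dim0},\quad 0 \le j < \mathrm{c\_smem\_dim1}
```

Because stage_contiguous_size equals c_smem_dim1, each row of the staging tile is one contiguous run of c_smem_dim1 elements in this unswizzled view.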
Methods
__init__
__init__(warp_id: UInt32, lane_id: UInt32) -> Self
write_fragments
write_fragments[repeat: Int](self, upper_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], lower_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], c_smem_tile: TileTensor[c_smem_tile.dtype, c_smem_tile.LayoutType, c_smem_tile.origin, address_space=AddressSpace.SHARED, linear_idx_type=c_smem_tile.linear_idx_type, element_size=c_smem_tile.element_size])
Write pre-loaded fragments to SMEM.