Mojo struct
TMEMToSMemWriter
struct TMEMToSMemWriter[c_type: DType, accum_type: DType, c_smem_dim0: Int, c_smem_dim1: Int, epc: EpilogueConfig, num_output_warps: Int, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B]
Write TMEM accumulators to SMEM via st.matrix (SM100-specific).
Fields
- warp_id (UInt32)
- lane_id (UInt32)
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Movable,
RegisterPassable,
TrivialRegisterPassable
comptime members
BM
comptime BM = epc.BM
c_smem_layout
comptime c_smem_layout = Layout.row_major(c_smem_dim0, c_smem_dim1)
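To make the row-major ordering concrete, here is a minimal sketch of how a (row, col) coordinate maps to a linear offset in such a tile. The helper `row_major_offset` and the 64 x 32 tile shape are hypothetical, chosen only for illustration. The sketch also ignores the swizzle, which further permutes addresses in shared memory.

```mojo
# Hypothetical helper (not part of TMEMToSMemWriter): linear offset of
# element (row, col) in a row-major tile that has `dim1` columns.
def row_major_offset(row: Int, col: Int, dim1: Int) -> Int:
    return row * dim1 + col


def main():
    # Assumed example shape: c_smem_dim0 = 64, c_smem_dim1 = 32.
    print(row_major_offset(0, 31, 32))  # 31: last element of row 0
    print(row_major_offset(1, 0, 32))   # 32: first element of row 1
```

Consecutive elements of a row are adjacent in memory, so the contiguous run has length `c_smem_dim1`.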
Config
comptime Config = epc
cta_group
comptime cta_group = epc.cta_group
data_paths
comptime data_paths = 16
stage_contiguous_size
comptime stage_contiguous_size = c_smem_dim1
stageN
comptime stageN = epc.stageN
swizzle
comptime swizzle = make_swizzle[c_type, c_swizzle]()
swizzle_width
comptime swizzle_width = (c_swizzle.bytes() // size_of[c_type]())
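As a rough illustration of this arithmetic, the sketch below recomputes the width in elements from a swizzle span in bytes and an element size in bytes. The helper `swizzle_width_elems` is hypothetical, and the 128-byte span for `SWIZZLE_128B` is assumed from the mode's name rather than read from `c_swizzle.bytes()`.

```mojo
# Hypothetical helper (not part of TMEMToSMemWriter): mirrors
# `c_swizzle.bytes() // size_of[c_type]()` using plain integers.
def swizzle_width_elems(swizzle_bytes: Int, elem_bytes: Int) -> Int:
    return swizzle_bytes // elem_bytes


def main():
    # Assumption: SWIZZLE_128B spans 128 bytes; bfloat16 occupies 2 bytes.
    print(swizzle_width_elems(128, 2))  # 64 elements
    # float32 occupies 4 bytes.
    print(swizzle_width_elems(128, 4))  # 32 elements
```

Under these assumptions, a wider `c_type` yields proportionally fewer elements per swizzle span.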
transpose_c
comptime transpose_c = epc.transpose_c
Methods
__init__
__init__(warp_id: UInt32, lane_id: UInt32) -> Self
write_fragments
write_fragments[repeat: Int](self, upper_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], lower_frag: InlineArray[Scalar[c_type], (TMEMToSMemWriter[c_type, accum_type, c_smem_dim0, c_smem_dim1, epc, num_output_warps, c_swizzle].Config * repeat)], c_smem_tile: TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=c_smem_tile.linear_idx_type, element_size=c_smem_tile.element_size])
Write pre-loaded fragments to SMEM.