Mojo function
copy_accum_to_gmem
copy_accum_to_gmem[c_type: DType, c_layout: Layout, c_smem_layout: Layout, c_desc_layout: Layout, num_accum_pipeline_stages: Int, num_output_stages: Int, /, *, repeat: Int, accum_type: DType, cta_group: Int, epilogue_dtype: DType, block_tile_shape: IndexList[3], mma_shape: IndexList[3], num_output_warps: UInt, c_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_128B, elementwise_compute_lambda_fn: OptionalReg[fn[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], SIMD[dtype, width]) capturing -> SIMD[dtype, width]] = None, register_based_epilogue: Bool = True, transpose_c: Bool = False](c_tiles: SMemTileArrayType[c_type, c_smem_layout, num_output_stages, 128], c_tma_op: TMATensorTile[c_type, c_layout, c_desc_layout], output_stage: OutputStage[num_accum_pipeline_stages], c_coord: Tuple[UInt32, UInt32], c_shape: Tuple[UInt32, UInt32])
Epilogue pipeline: TMEM → Registers → SMEM → GMEM (via TMA).
Args:
- c_tiles (SMemTileArrayType): Shared memory tiles for output staging.
- c_tma_op (TMATensorTile): TMA descriptor for C matrix.
- output_stage (OutputStage): Self-contained stage with pipeline, stage index, and TMEM offset.
- c_coord (Tuple): (M, N) tile coordinates.
- c_shape (Tuple): (M, N) matrix dimensions.