Mojo function
shared_memory_epilogue_transpose
shared_memory_epilogue_transpose[stage: UInt, stageN: UInt, c_type: DType, c_smem_layout: Layout, swizzle: Swizzle, compute_lambda_fn: elementwise_compute_lambda_type, num_output_warps: UInt, warp_dim: UInt, MMA_M: Int, BN: Int, cta_group: Int](M: UInt32, N: UInt32, c_col: UInt, c_row: UInt, c_smem: LayoutTensor[c_type, c_smem_layout, MutAnyOrigin, address_space=AddressSpace.SHARED, alignment=128], warp_i: UInt, warp_j: UInt)
Apply element-wise epilogue to transposed shared memory tile.
Handles the transpose_c case of the SMEM-based epilogue. Supports two warp configurations, selected by the warp_dim parameter.
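To make the idea concrete, the following is a minimal conceptual sketch in plain Python, not the Mojo kernel itself. It assumes that the tile is stored transposed (entry `[j][i]` of the tile holds output element `(c_row + i, c_col + j)`), that M and N bound the output matrix, and that the compute function receives a global `(row, col)` coordinate plus the current value. None of these details are confirmed by this page. Swizzling, stages, and the partitioning of work across warps are left out entirely, and the name `epilogue_transposed` is hypothetical.

```python
def epilogue_transposed(tile_t, c_row, c_col, M, N, compute):
    """Apply `compute` in place to every in-bounds element of a transposed tile.

    Assumption: tile_t[j][i] holds the output element at global
    coordinates (c_row + i, c_col + j).
    """
    for j, column in enumerate(tile_t):
        for i, value in enumerate(column):
            row, col = c_row + i, c_col + j
            # Skip elements that fall outside the M x N output matrix.
            if row < M and col < N:
                column[i] = compute((row, col), value)
    return tile_t


# Three stored columns of two rows each; the output matrix is 2 x 2,
# so the third stored column lies out of bounds and is left untouched.
tile_t = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = epilogue_transposed(
    tile_t, c_row=0, c_col=0, M=2, N=2,
    compute=lambda coord, v: v + 10 * coord[0] + coord[1],
)
print(result)
```

The point of the sketch is only the index mapping: because the tile is transposed, the coordinate handed to the compute function has to be rebuilt from the swapped tile indices before the function is applied.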
Template Parameters:
stage: Current output stage index.
stageN: Stage width in elements.
c_type: Output data type.
c_smem_layout: Shared memory tile layout.
swizzle: Swizzle pattern for SMEM access.
compute_lambda_fn: Element-wise compute function.
num_output_warps: Number of warps participating.
warp_dim: Warp dimension configuration (1 or 2).
MMA_M: MMA M dimension.
BN: Block N dimension.
cta_group: Number of CTAs cooperating.
Args: