Mojo function

shared_memory_epilogue_transpose

shared_memory_epilogue_transpose[stage: Int, stageN: Int, c_type: DType, c_smem_layout: Layout, swizzle: Swizzle, compute_lambda_fn: def[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], SIMD[dtype, width]) capturing -> SIMD[dtype, width], num_output_warps: Int, warp_dim: Int, MMA_M: Int, BN: Int, cta_group: Int](M: UInt32, N: UInt32, c_col: Int, c_row: Int, c_smem: TileTensor[c_type, c_smem.LayoutType, c_smem.origin, address_space=AddressSpace.SHARED, linear_idx_type=c_smem.linear_idx_type, element_size=c_smem.element_size], warp_i: Int, warp_j: Int)

Applies an element-wise epilogue to a transposed shared-memory (SMEM) tile.

Supports warp_dim=1 (stageN, warp_i, U) or warp_dim=2 (warp_j, stageN, warp_i, UL).
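
As a rough illustration of the `compute_lambda_fn` parameter, the sketch below defines a closure with the same parameter and argument shape as the signature above: it takes an `IndexList[2]` and a `SIMD[dtype, width]` value and returns a `SIMD[dtype, width]`. The name `scale_by_two` and the doubling operation are hypothetical and chosen only for illustration. The `fn` and `@parameter` closure syntax is an assumption about the toolchain and may need adjusting for your Mojo release. The call to `shared_memory_epilogue_transpose` itself is not shown, because it needs a GPU kernel context and a shared-memory tile.

```mojo
from utils.index import IndexList


def main():
    # Hypothetical epilogue: doubles every element and ignores the index.
    # Only the parameter list and argument types are taken from the
    # compute_lambda_fn declaration; the body is an illustrative assumption.
    @parameter
    @always_inline
    fn scale_by_two[
        dtype: DType, width: Int, *, alignment: Int = 1
    ](coords: IndexList[2], val: SIMD[dtype, width]) capturing -> SIMD[
        dtype, width
    ]:
        return val * 2

    # Exercise the closure on the host with a 4-wide float32 vector.
    var result = scale_by_two[DType.float32, 4](
        IndexList[2](0, 0), SIMD[DType.float32, 4](1.5)
    )
    print(result)
```

A closure of this shape would then be passed as the `compute_lambda_fn` compile-time parameter, alongside the other parameters listed in the signature.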