
Mojo function

shared_memory_epilogue_transpose

shared_memory_epilogue_transpose[stage: UInt, stageN: UInt, c_type: DType, c_smem_layout: Layout, swizzle: Swizzle, compute_lambda_fn: elementwise_compute_lambda_type, num_output_warps: UInt, warp_dim: UInt, MMA_M: Int, BN: Int, cta_group: Int](M: UInt32, N: UInt32, c_col: UInt, c_row: UInt, c_smem: LayoutTensor[c_type, c_smem_layout, MutAnyOrigin, address_space=AddressSpace.SHARED, alignment=128], warp_i: UInt, warp_j: UInt)

Apply an element-wise epilogue to a transposed shared memory tile.

Handles the transpose_c case of the SMEM-based epilogue. Supports two warp configurations, selected by the warp_dim parameter (1 or 2).

Parameters:

  • stage (UInt): Current output stage index.
  • stageN (UInt): Stage width in elements.
  • c_type (DType): Output data type.
  • c_smem_layout (Layout): Shared memory tile layout.
  • swizzle (Swizzle): Swizzle pattern for SMEM access.
  • compute_lambda_fn (elementwise_compute_lambda_type): Element-wise compute function.
  • num_output_warps (UInt): Number of warps participating.
  • warp_dim (UInt): Warp dimension configuration (1 or 2).
  • MMA_M (Int): MMA M dimension.
  • BN (Int): Block N dimension.
  • cta_group (Int): Number of CTAs cooperating.

Args:

  • M (UInt32): Output M dimension.
  • N (UInt32): Output N dimension.
  • c_col (UInt): Base column coordinate.
  • c_row (UInt): Base row coordinate.
  • c_smem (LayoutTensor): Shared memory tile.
  • warp_i (UInt): Warp index i.
  • warp_j (UInt): Warp index j.
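
The idea of an element-wise epilogue over a transposed tile can be illustrated with a small plain-Python model. This is only a conceptual sketch, not the Mojo kernel: the function name `epilogue_transposed_tile`, the nested-list tile, the `(row, col, value)` callback shape, and the use of M and N as a bounds check are all assumptions made for illustration, and swizzling and warp partitioning are left out entirely.

```python
from typing import Callable, List


def epilogue_transposed_tile(
    tile_t: List[List[float]],
    c_row: int,
    c_col: int,
    M: int,
    N: int,
    compute: Callable[[int, int, float], float],
) -> None:
    """Apply `compute` in place to a tile stored transposed (hypothetical model).

    tile_t[n][m] holds the output element at global row c_row + m and
    global column c_col + n. Elements outside the M x N output are skipped
    (an assumption about how M and N are used).
    """
    for n_local, column in enumerate(tile_t):
        for m_local in range(len(column)):
            row = c_row + m_local  # global row of this element
            col = c_col + n_local  # global column of this element
            if row < M and col < N:
                column[m_local] = compute(row, col, column[m_local])
```

In this model the callback sees output-space coordinates even though the storage is indexed as `[n][m]`, which is the point of handling the transposed case separately from the non-transposed one.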
