Mojo function
apply_epilogue_to_output_tile
apply_epilogue_to_output_tile[c_type: DType, c_tile_layout: Layout, //, elementwise_lambda_fn: fn[dtype: DType, width: Int, *, alignment: Int = 1](IndexList[2], mut SIMD[dtype, width]) capturing -> None, N: Int, WG_BN: Int, num_consumer_threads: Int, simd_size: Int](c_tile: LayoutTensor[c_type, c_tile_layout, MutableAnyOrigin, address_space=AddressSpace(3), alignment=128], c_gmem_wg_tile: LayoutTensor[c_type, layout, MutableAnyOrigin, address_space=address_space, element_layout=element_layout, layout_int_type=layout_int_type, linear_idx_type=linear_idx_type, masked=masked, alignment=alignment], c_gmem_wg_coord_m: Int, c_gmem_wg_coord_n: Int, local_thread_idx: UInt, M_bound: UInt32, N_bound: UInt32)
Apply the epilogue lambda function to the output data.
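This function reads the output tile `c_tile` from shared memory, applies the user-provided epilogue function `elementwise_lambda_fn` (e.g., bias addition, activation), and handles bounds checking for edge tiles, where part of the tile falls outside the output matrix.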
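The behavior described above can be illustrated with a small CPU-side model. This is a minimal sketch in plain Python, not the Mojo implementation: the helper `apply_epilogue_model`, the destination matrix `out`, and the `bias_relu` lambda are hypothetical names introduced for illustration, and writing the result into a destination matrix is an assumption of the sketch. It only shows the general pattern of walking a tile in `simd_size`-wide chunks, passing each chunk and its global `(m, n)` coordinate to the epilogue lambda, and skipping elements beyond `M_bound` and `N_bound`.

```python
# Hypothetical CPU model of an epilogue pass over one output tile.
# Not the Mojo kernel: no threads, no shared memory, plain Python lists.

def apply_epilogue_model(c_tile, out, coord_m, coord_n,
                         m_bound, n_bound, epilogue_fn, simd_size=2):
    """Apply epilogue_fn to each in-bounds chunk of c_tile.

    c_tile      : tile data as a list of rows (stands in for shared memory)
    out         : full output matrix (assumed destination for the results)
    coord_m/n   : global row/column of the tile's top-left element
    m_bound/n_bound : number of valid rows/columns in the output matrix
    epilogue_fn : callable((m, n), vec) that mutates vec in place
    """
    for i, row in enumerate(c_tile):
        gm = coord_m + i
        if gm >= m_bound:
            continue  # whole row lies outside the matrix
        for j0 in range(0, len(row), simd_size):
            gn = coord_n + j0
            if gn >= n_bound:
                break  # remaining chunks in this row are out of bounds
            # Clamp the chunk width so it never crosses the column bound.
            width = min(simd_size, len(row) - j0, n_bound - gn)
            vec = row[j0:j0 + width]
            epilogue_fn((gm, gn), vec)  # in-place update, like a mut argument
            out[gm][gn:gn + width] = vec


bias = [0.5] * 8

def bias_relu(coords, vec):
    """Example epilogue: add a per-column bias, then apply ReLU."""
    _, n = coords
    for k in range(len(vec)):
        vec[k] = max(0.0, vec[k] + bias[n + k])


# A 2x4 tile placed at (2, 4) in a matrix with 3 valid rows and 6 valid columns.
c_tile = [[-1.0, 2.0, 3.0, 4.0],
          [5.0, 6.0, 7.0, 8.0]]
out = [[0.0] * 8 for _ in range(4)]
apply_epilogue_model(c_tile, out, 2, 4, 3, 6, bias_relu)
```

In this example only the first tile row and its first two columns are in bounds, so only `out[2][4]` and `out[2][5]` are written; the rest of the tile is skipped as an edge region.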