Mojo function
copy_sram_to_local
copy_sram_to_local[src_warp_layout: Layout, axis: Optional[Int] = None](dst: LayoutTensor[dst.dtype, dst.layout, dst.origin, address_space=dst.address_space, element_layout=dst.element_layout, layout_int_type=dst.layout_int_type, linear_idx_type=dst.linear_idx_type, masked=dst.masked, alignment=dst.alignment], src: LayoutTensor[src.dtype, src.layout, src.origin, address_space=src.address_space, element_layout=src.element_layout, layout_int_type=src.layout_int_type, linear_idx_type=src.linear_idx_type, masked=src.masked, alignment=src.alignment])
Synchronously copy data from SRAM (shared memory) to local memory.
This function performs a synchronous memory transfer from SRAM (shared memory) to local memory (registers) using the specified thread layout for workload distribution.
Performance:
- Distributes the copy workload across multiple threads for parallel execution.
- Optimized for transferring data from shared memory to registers.
- Supports optional axis-specific distribution for specialized access patterns.
Constraints:
- The source tensor must be in SHARED address space (SRAM).
- The destination tensor must be in LOCAL address space (registers).
- Both tensors must have the same data type.
Parameters:
- src_warp_layout (Layout): Layout defining how threads are organized for the source tensor. This determines how the workload is distributed among threads.
- axis (Optional[Int]): Optional axis to distribute along. When provided, distribution happens along the specified axis. When None (the default), distribution follows the standard layout pattern.
Args:
- dst (LayoutTensor): The destination tensor, which must be in local memory (registers).
- src (LayoutTensor): The source tensor, which must be in shared memory (SRAM).
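A minimal usage sketch is shown below. The tile shape, warp layout, fragment shape, and import paths are illustrative assumptions, not prescribed by this page; the key points are that the source tensor lives in the SHARED address space, the destination lives in the LOCAL address space, and the src_warp_layout parameter controls how the copy is split across threads.

```mojo
from layout import Layout, LayoutTensor
from layout.layout_tensor import copy_sram_to_local  # assumed import path
from gpu.memory import AddressSpace  # assumed import path

fn kernel_body():
    # Illustrative shapes: a 16x16 tile staged in shared memory,
    # distributed over an 8x4 arrangement of threads.
    alias tile_layout = Layout.row_major(16, 16)
    alias warp_layout = Layout.row_major(8, 4)

    # Source: tile in SRAM (shared memory).
    var smem_tile = LayoutTensor[
        DType.float32,
        tile_layout,
        MutableAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    # Destination: per-thread fragment in registers (local memory).
    # With a 16x16 tile over an 8x4 thread layout, each thread
    # holds a 2x4 fragment.
    var frag = LayoutTensor[
        DType.float32,
        Layout.row_major(2, 4),
        MutableAnyOrigin,
        address_space = AddressSpace.LOCAL,
    ].stack_allocation()

    # Synchronously copy this thread's portion of the shared-memory
    # tile into its register fragment.
    copy_sram_to_local[src_warp_layout=warp_layout](frag, smem_tile)
```

Both tensors must share the same data type; passing the optional axis parameter instead restricts the distribution to a single axis of the source tensor.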