Mojo function
copy_sram_to_local
copy_sram_to_local[src_warp_layout: Layout, axis: Optional[Int] = None](dst: LayoutTensor[dst.dtype, dst.layout, dst.origin, address_space=dst.address_space, element_layout=dst.element_layout, layout_int_type=dst.layout_int_type, linear_idx_type=dst.linear_idx_type, masked=dst.masked, alignment=dst.alignment], src: LayoutTensor[src.dtype, src.layout, src.origin, address_space=src.address_space, element_layout=src.element_layout, layout_int_type=src.layout_int_type, linear_idx_type=src.linear_idx_type, masked=src.masked, alignment=src.alignment])
Synchronously copy data from SRAM (shared memory) to local memory.
This function performs a synchronous memory transfer from SRAM (shared memory) to local memory (registers) using the specified thread layout for workload distribution.
Performance:
- Distributes the copy workload across multiple threads for parallel execution.
- Optimized for transferring data from shared memory to registers.
- Supports optional axis-specific distribution for specialized access patterns.
Constraints:
- The source tensor must be in SHARED address space (SRAM).
- The destination tensor must be in LOCAL address space (registers).
- Both tensors must have the same data type.
Parameters:
- src_warp_layout (Layout): Layout defining how threads are organized for the source tensor. This determines how the workload is distributed among threads.
- axis (Optional[Int]): Optional axis to distribute along. When provided, distribution happens along the specified axis. When None (the default), distribution follows the standard layout pattern.
Args:
- dst (LayoutTensor): The destination tensor, which must be in local memory (registers).
- src (LayoutTensor): The source tensor, which must be in shared memory (SRAM).
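A minimal usage sketch is shown below. The tile shape, warp layout, fragment shape, and import paths are illustrative assumptions, not prescribed by this page; the key points are that the source tensor lives in the SHARED address space, the destination lives in the LOCAL address space, and the src_warp_layout parameter controls how the copy is split across threads.

```mojo
from layout import Layout, LayoutTensor
from layout.layout_tensor import copy_sram_to_local  # assumed import path
from gpu.memory import AddressSpace  # assumed import path

fn kernel_body():
    # Illustrative shapes: a 16x16 tile staged in shared memory,
    # distributed over an 8x4 arrangement of threads.
    alias tile_layout = Layout.row_major(16, 16)
    alias warp_layout = Layout.row_major(8, 4)

    # Source: tile in SRAM (shared memory).
    var smem_tile = LayoutTensor[
        DType.float32,
        tile_layout,
        MutableAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    # Destination: per-thread fragment in registers (local memory).
    # With a 16x16 tile over an 8x4 thread layout, each thread
    # holds a 2x4 fragment.
    var frag = LayoutTensor[
        DType.float32,
        Layout.row_major(2, 4),
        MutableAnyOrigin,
        address_space = AddressSpace.LOCAL,
    ].stack_allocation()

    # Synchronously copy this thread's portion of the shared-memory
    # tile into its register fragment.
    copy_sram_to_local[src_warp_layout=warp_layout](frag, smem_tile)
```

Both tensors must share the same data type; passing the optional axis parameter instead restricts the distribution to a single axis of the source tensor.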