Mojo function
async_copy_with_bound_check
def async_copy_with_bound_check[dtype: DType, src_layout: TensorLayout, dst_layout: TensorLayout, //, thread_layout: Layout[thread_layout.shape_types, thread_layout.stride_types], swizzle_mode: TensorMapSwizzle](src: TileTensor[dtype, src_layout, ImmutAnyOrigin, Storage=src.Storage, linear_idx_type=src.linear_idx_type, element_size=src.element_size], dst: TileTensor[dtype, dst_layout, MutAnyOrigin, Storage=dst.Storage, address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=dst.element_size])
Helper function for cp.async with boundary checking.
This function performs element-wise asynchronous copies with per-element boundary checking. Out-of-bounds accesses are automatically zero-filled, so tiles that extend past a matrix edge can be copied safely.
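The zero-fill behavior described above can be modeled on the CPU. The following is a minimal sketch in plain Python, not the Mojo API: the helper name `copy_with_bound_check`, its arguments, and the row-major list representation are all assumptions made for illustration. It only mirrors the per-element rule "copy if in bounds, otherwise write zero".

```python
def copy_with_bound_check(src, src_rows, src_cols, row0, col0, tile_rows, tile_cols):
    """Hypothetical reference model: copy a tile out of a row-major matrix.

    Elements whose global coordinates fall outside the matrix are
    written as 0 instead of being read from `src`.
    """
    tile = []
    for r in range(tile_rows):
        out_row = []
        for c in range(tile_cols):
            gr, gc = row0 + r, col0 + c  # global coordinates of this element
            in_bounds = 0 <= gr < src_rows and 0 <= gc < src_cols
            out_row.append(src[gr * src_cols + gc] if in_bounds else 0)
        tile.append(out_row)
    return tile


# 3x3 row-major matrix; a 2x2 tile anchored at (2, 2) overhangs the edge.
matrix = [1, 2, 3,
          4, 5, 6,
          7, 8, 9]
interior_tile = copy_with_bound_check(matrix, 3, 3, 0, 0, 2, 2)
edge_tile = copy_with_bound_check(matrix, 3, 3, 2, 2, 2, 2)
print(interior_tile)  # [[1, 2], [4, 5]]
print(edge_tile)      # [[9, 0], [0, 0]]
```

Only the element at (2, 2) exists in the source, so the other three slots of the edge tile are zero. The real function issues these copies asynchronously and distributes them across threads, which this model does not attempt to show.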
The function also applies shared memory swizzling to avoid bank conflicts and maximize memory bandwidth utilization.
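To see why swizzling helps, consider a simplified bank model. The sketch below is plain Python and rests on several assumptions: 32 banks of one element each, a 32-element-wide tile, and a generic XOR swizzle (`col ^ row`). The XOR pattern is a common illustration only; it is not claimed to match any specific `TensorMapSwizzle` mode, and the `bank` helper is hypothetical.

```python
NUM_BANKS = 32   # assumed bank count
ROW_WIDTH = 32   # assumed tile width in elements


def bank(row, col, swizzle=False):
    """Return the bank hit by element (row, col) in a row-major tile.

    With swizzle=True, the column is XORed with the row index before
    the linear offset is computed (illustrative pattern only).
    """
    phys_col = (col ^ row) if swizzle else col
    return (row * ROW_WIDTH + phys_col) % NUM_BANKS


# 32 threads each read column 0 of a different row.
plain_banks = {bank(r, 0) for r in range(32)}
swizzled_banks = {bank(r, 0, swizzle=True) for r in range(32)}
print(len(plain_banks))     # 1: every access lands in the same bank
print(len(swizzled_banks))  # 32: accesses are spread over all banks
```

Without swizzling, a column access maps all 32 rows to one bank and the accesses serialize. With the XOR remapping, the same logical column is spread across all banks.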
Template Parameters:
- dtype: Data type of the elements.
- src_layout: Layout of the source tile.
- dst_layout: Layout of the destination tile.
- thread_layout: Thread arrangement for distributed copying.
- swizzle_mode: Swizzling pattern for bank conflict avoidance.
Args:
- src (TileTensor[dtype, src_layout, ImmutAnyOrigin, Storage=src.Storage, linear_idx_type=src.linear_idx_type, element_size=src.element_size]): Source tensor fragment in global memory.
- dst (TileTensor[dtype, dst_layout, MutAnyOrigin, Storage=dst.Storage, address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=dst.element_size]): Destination tensor fragment in shared memory.