
Mojo struct

SharedToGenericTileCopier

struct SharedToGenericTileCopier[thread_layout: Layout[thread_layout.shape_types, thread_layout.stride_types], *, swizzle: Optional[Swizzle] = None, num_threads: Int = thread_layout.size()]

A TileCopier that moves a tile from shared memory into generic memory.

The swizzle parameter is a property of the shared-memory tile being read and must match the swizzle used when that tile was written. Passing a different swizzle, or leaving it as None when the tile was written with one, produces incorrect data.
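To illustrate why the swizzle must match, here is a minimal Python sketch of an XOR-style swizzle over a small row-major tile. The `swizzle` function, the tile dimensions, and the helper names are hypothetical stand-ins chosen for illustration; they are not Mojo's `Swizzle` type or its actual bit layout. The sketch only shows that reading with a different index mapping than the one used for writing returns elements in the wrong positions.

```python
# Hypothetical XOR swizzle over a row-major tile.
# This is an illustration only, NOT the Mojo Swizzle implementation.
ROWS, COLS = 4, 8  # COLS is a power of two so the XOR stays within a row


def swizzle(row, col):
    """Map a logical (row, col) to a swizzled physical offset."""
    return row * COLS + (col ^ (row % COLS))


def identity(row, col):
    """Map a logical (row, col) to a plain row-major offset (no swizzle)."""
    return row * COLS + col


def write_tile(values, index_fn):
    """Store a logical tile into a flat buffer through index_fn."""
    shared = [None] * (ROWS * COLS)
    for r in range(ROWS):
        for c in range(COLS):
            shared[index_fn(r, c)] = values[r][c]
    return shared


def read_tile(shared, index_fn):
    """Load a logical tile from a flat buffer through index_fn."""
    return [[shared[index_fn(r, c)] for c in range(COLS)] for r in range(ROWS)]


logical = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
shared = write_tile(logical, swizzle)      # tile written with a swizzle

matched = read_tile(shared, swizzle)       # same swizzle: data round-trips
mismatched = read_tile(shared, identity)   # no swizzle: elements are permuted
```

In this model, `matched` equals the original tile, while `mismatched` has the elements of every row except row 0 shuffled within the row.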

Parameters

  • thread_layout (Layout): Layout describing how threads are organized over the copy.
  • swizzle (Optional): Swizzle that was applied when the shared-memory tile was populated. Defaults to None.
  • num_threads (Int): Total number of threads in the thread block. Defaults to thread_layout.size(). Threads beyond thread_layout.size() do not participate in the copy.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, TileCopier

comptime members

dst_address_space

comptime dst_address_space = AddressSpace.GENERIC

Destination AddressSpace this copier writes to.

src_address_space

comptime src_address_space = AddressSpace.SHARED

Source AddressSpace this copier reads from.

Methods

copy

copy[element_size: Int](self, dst: TileTensor[dst.dtype, dst.LayoutType, dst.origin, linear_idx_type=dst.linear_idx_type, element_size=element_size], src: TileTensor[src.dtype, src.LayoutType, src.origin, address_space=SharedToGenericTileCopier[thread_layout, swizzle=swizzle, num_threads=num_threads].src_address_space, linear_idx_type=src.linear_idx_type, element_size=element_size])

Copies src in shared memory into dst in generic memory.

The non-swizzled path uses TileTensor.copy, which widens to SIMD stores when the layouts permit. The swizzled path walks per-thread elements explicitly and applies the swizzle to the source fragment offsets.
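The swizzled path can be pictured with a small Python model. Everything here is an assumption made for illustration: the XOR `swizzle`, the strided assignment of elements to threads, and the constants standing in for `thread_layout.size()` and `num_threads`. It is a sketch of the idea (each participating thread walks its own elements and applies the swizzle to the source offset only), not the Mojo implementation.

```python
# Hypothetical model of the swizzled copy path. The names and the
# element-to-thread assignment are assumptions, not the Mojo implementation.
ROWS, COLS = 4, 8
THREAD_LAYOUT_SIZE = 8   # stands in for thread_layout.size()
NUM_THREADS = 12         # stands in for num_threads (threads in the block)


def swizzle(offset):
    """Map a logical linear offset to a swizzled shared-memory offset."""
    row, col = divmod(offset, COLS)
    return row * COLS + (col ^ row)


def copy_swizzled(shared, dst, thread_id):
    """Copy this thread's elements; return how many elements it copied."""
    # Threads beyond the thread layout do not participate.
    if thread_id >= THREAD_LAYOUT_SIZE:
        return 0
    copied = 0
    # Walk this thread's elements one by one (strided assignment assumed).
    for logical in range(thread_id, ROWS * COLS, THREAD_LAYOUT_SIZE):
        # The swizzle is applied to the source offset; dst stays unswizzled.
        dst[logical] = shared[swizzle(logical)]
        copied += 1
    return copied


# Populate "shared memory" through the same swizzle.
shared = [None] * (ROWS * COLS)
for logical in range(ROWS * COLS):
    shared[swizzle(logical)] = logical

dst = [None] * (ROWS * COLS)
per_thread = [copy_swizzled(shared, dst, t) for t in range(NUM_THREADS)]
```

In this model the first `THREAD_LAYOUT_SIZE` threads each copy an equal share of the tile and the remaining threads copy nothing, while `dst` ends up holding the tile in plain logical order.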

Masked bounds checking, fp32 -> half precision downcast, and binary_op fusion are not supported.

Parameters:

  • element_size (Int): Number of scalar elements per logical element.

Args:

  • dst (TileTensor): Destination tile in generic memory.
  • src (TileTensor): Source tile in shared memory.
