Mojo struct
RegTileWriterLDS
struct RegTileWriterLDS[thread_layout: Layout[thread_layout.shape_types, thread_layout.stride_types], swizzle: Optional[Swizzle] = None, num_threads: Int = thread_layout.size()]
Stateless register→LDS copy expert (LDS, the local data share, is AMD's name for GPU shared memory).
It is a sibling of RegTileLoader / RegTileWriter (DRAM↔reg) and
TileLoaderLDS / SubTileLoaderLDS (DRAM→LDS), and writes register
tiles to shared memory via thread-distributed element stores.
Static methods:
- copy: Standard plain-SMEM write (rank-2 or rank-3 distributed layouts); reads src in row-major order to match RegTileLoader's storage.
- copy_blocked: blocked_product SMEM write with its own block_cols param. Used when thread_layout and the SMEM layout have mismatched blocked structure that distribute_with_offset can't resolve.
Parameters
- thread_layout (Layout[thread_layout.shape_types, thread_layout.stride_types]): Thread distribution layout across the tile.
- swizzle (Optional[Swizzle]): Optional SMEM swizzle for bank-conflict avoidance.
- num_threads (Int): Number of threads to participate (threads past thread_layout.size() exit early).
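To show what the swizzle parameter buys, here is a pure-Python model of a CuTe-style XOR swizzle with (bits, base, shift) parameters. It is a sketch under stated assumptions: that Mojo's Swizzle uses the same formula, and that shared memory has 32 banks holding one 4-byte element each. make_swizzle and column_banks are hypothetical helpers, not part of the Mojo API.

```python
# Pure-Python model of a CuTe-style XOR swizzle. Whether Mojo's Swizzle
# uses exactly this formula is an assumption made for illustration.
def make_swizzle(bits: int, base: int, shift: int):
    """Return a function mapping a linear offset to its swizzled offset."""
    # Select `bits` bits starting at bit position base + shift.
    yyy_mask = ((1 << bits) - 1) << (base + shift)

    def swizzle(offset: int) -> int:
        # Shift the selected high bits down by `shift` and XOR them into
        # the low bits; the high bits themselves are left unchanged.
        return offset ^ ((offset & yyy_mask) >> shift)

    return swizzle


NUM_BANKS = 32  # assumption: 32 banks, one 4-byte element per bank slot


def identity(offset: int) -> int:
    """No swizzle: offsets are used as-is."""
    return offset


def column_banks(swizzle, rows: int = 32, cols: int = 32, col: int = 0):
    """Banks hit when each of `rows` threads reads one element of column
    `col` from a row-major rows x cols tile of 4-byte elements."""
    return [swizzle(r * cols + col) % NUM_BANKS for r in range(rows)]


xor_swizzle = make_swizzle(bits=5, base=0, shift=5)

# Unswizzled: every row of column 0 maps to bank 0 (a 32-way conflict).
print(len(set(column_banks(identity))))     # 1
# Swizzled: row r of column 0 lands in bank r, so all 32 banks differ.
print(len(set(column_banks(xor_swizzle))))  # 32
```

Because the XOR only rewrites the low bits using high bits that it leaves intact, the mapping is a permutation of offsets. Data is therefore moved around within the tile, never lost or duplicated.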
Implemented traits
AnyType, ImplicitlyDestructible
Methods
copy
static copy(dst: TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=dst.element_size], src: TileTensor[address_space=AddressSpace.LOCAL, linear_idx_type=src.linear_idx_type, element_size=src.element_size])
Copy register data to SMEM, distributed across threads.
Reads src registers in row-major element order to match the
storage convention of RegTileLoader. Supports both flat
(rank 2) and hierarchical (rank 3) distributed layouts.
Args:
- dst (TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=dst.element_size]): Destination TileTensor in shared memory.
- src (TileTensor[address_space=AddressSpace.LOCAL, linear_idx_type=src.linear_idx_type, element_size=src.element_size]): Source TileTensor in local (register) memory.
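The rank-2 case of copy can be modeled in plain Python. This is a sketch under explicit assumptions rather than the Mojo implementation: thread_layout is row-major with shape (TR, TC); distribution is strided, so thread (tr, tc) owns elements (tr + i*TR, tc + j*TC); and SMEM is an unswizzled row-major tile. load_fragment and copy_model are hypothetical names.

```python
# Pure-Python model of the rank-2 register -> SMEM write. Assumptions: a
# row-major thread_layout of shape (TR, TC), a strided distribution where
# thread (tr, tc) owns elements (tr + i*TR, tc + j*TC), and a plain
# (unswizzled) row-major SMEM tile. Function names are hypothetical.

def load_fragment(tile, tid, thread_shape):
    """Values thread `tid` holds in registers, stored in row-major order."""
    TR, TC = thread_shape
    rows, cols = len(tile), len(tile[0])
    tr, tc = divmod(tid, TC)  # row-major thread id -> thread coordinate
    return [tile[tr + i * TR][tc + j * TC]
            for i in range(rows // TR) for j in range(cols // TC)]


def copy_model(smem, regs, tid, thread_shape, tile_shape):
    """Write one thread's register fragment into the flat SMEM list."""
    TR, TC = thread_shape
    rows, cols = tile_shape
    if tid >= TR * TC:
        return  # threads past thread_layout.size() exit early
    tr, tc = divmod(tid, TC)
    frag_cols = cols // TC
    for i in range(rows // TR):
        for j in range(frag_cols):
            # Registers are consumed in row-major element order.
            value = regs[i * frag_cols + j]
            smem[(tr + i * TR) * cols + (tc + j * TC)] = value


ROWS, COLS = 8, 16
THREAD_SHAPE = (4, 8)  # 32 participating threads
tile = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
smem = [None] * (ROWS * COLS)

# Launch 64 threads; the extra 32 fall outside the thread layout.
for tid in range(64):
    regs = load_fragment(tile, tid, THREAD_SHAPE) if tid < 32 else []
    copy_model(smem, regs, tid, THREAD_SHAPE, (ROWS, COLS))

print(smem == [v for row in tile for v in row])  # True
```

The round trip only works because the loader and the writer agree on row-major register order. If they used different orders, each thread would scatter its values to the wrong SMEM slots.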
copy_blocked
static copy_blocked[block_cols: Int](dst: TileTensor[address_space=AddressSpace.SHARED], src: TileTensor[dst.dtype, address_space=AddressSpace.LOCAL])
Copy register tile to blocked_product SMEM layout.
Handles structural mismatches between thread_layout and SMEM
layout by computing per-element SMEM offsets using the
blocked_product formula. Reads registers sequentially as
simd_width-wide vectors; this read order is the same whether the
per-thread registers are flattened column-major or row-major, as long
as each per-thread row is exactly one SIMD vector.
The SMEM layout is a blocked_product with blocks of shape
dst.shape[0] x block_cols. thread_layout distributes a 2D
grid of (data_rows, data_cols/simd_width) vector positions
across threads.
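The per-element offset can be sketched for the simplest blocked layout. The assumptions are ours: each block is stored row-major as block_rows x block_cols, and blocks follow one another along the column dimension. The actual formula depends on the layouts passed to blocked_product, and blocked_offset is a hypothetical helper.

```python
# Sketch of a blocked SMEM offset, assuming each block is row-major
# (block_rows x block_cols) and blocks follow one another along the
# column dimension. The helper name is hypothetical.
def blocked_offset(r: int, c: int, block_rows: int, block_cols: int) -> int:
    block_idx, col_in_block = divmod(c, block_cols)
    block_size = block_rows * block_cols
    return block_idx * block_size + r * block_cols + col_in_block


BLOCK_ROWS, DATA_COLS, BLOCK_COLS = 4, 16, 8

# Row 0, columns 0..15: the first block occupies offsets 0..7, then the
# second block starts after a full 4 x 8 block (offset 32).
print([blocked_offset(0, c, BLOCK_ROWS, BLOCK_COLS) for c in range(DATA_COLS)])
# [0, 1, 2, 3, 4, 5, 6, 7, 32, 33, 34, 35, 36, 37, 38, 39]
```

In a plain row-major [4, 16] tile, element (1, 0) would sit at offset 16. Here it sits at offset 8, which is the kind of structural mismatch the text describes. If simd_width divides block_cols, an aligned vector never straddles two blocks, so its elements stay contiguous in SMEM.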
Parameters:
- block_cols (Int): Cols per SMEM block in the blocked_product layout.
Args:
- dst (TileTensor[address_space=AddressSpace.SHARED]): Destination [block_rows, data_cols] in SHARED.
- src (TileTensor[dst.dtype, address_space=AddressSpace.LOCAL]): Source register tile in LOCAL (row-major elements).
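A pure-Python model of copy_blocked combines these pieces. The assumptions are ours: thread_layout is row-major with shape (TR, TC) over the (data_rows, data_cols // simd_width) vector grid, and thread (tr, tc) owns vector positions (tr + i*TR, tc + j*TC). Blocks are row-major and placed along columns, and every name below is hypothetical.

```python
# Pure-Python model of copy_blocked. Assumptions: a row-major thread_layout
# of shape (TR, TC) over the (data_rows, data_cols // simd_width) vector
# grid, a strided distribution in which thread (tr, tc) owns vector
# positions (tr + i*TR, tc + j*TC), and row-major blocks placed along
# columns. All names are hypothetical.

def blocked_offset(r, c, block_rows, block_cols):
    block_idx, col_in_block = divmod(c, block_cols)
    return block_idx * block_rows * block_cols + r * block_cols + col_in_block


def vector_positions(tid, thread_shape, data_shape, simd_width):
    """(row, first column) of every vector owned by thread `tid`."""
    TR, TC = thread_shape
    data_rows, data_cols = data_shape
    tr, tc = divmod(tid, TC)
    vec_cols = data_cols // simd_width
    return [(tr + i * TR, (tc + j * TC) * simd_width)
            for i in range(data_rows // TR) for j in range(vec_cols // TC)]


def copy_blocked_model(smem, regs, tid, thread_shape, data_shape,
                       simd_width, block_cols):
    TR, TC = thread_shape
    if tid >= TR * TC:
        return  # threads past thread_layout.size() exit early
    block_rows = data_shape[0]  # blocks are dst.shape[0] x block_cols
    positions = vector_positions(tid, thread_shape, data_shape, simd_width)
    for k, (r, c0) in enumerate(positions):
        # Registers are read sequentially, one simd_width vector at a time.
        vec = regs[k * simd_width:(k + 1) * simd_width]
        for lane, value in enumerate(vec):
            smem[blocked_offset(r, c0 + lane, block_rows, block_cols)] = value


DATA_ROWS, DATA_COLS, SIMD_WIDTH, BLOCK_COLS = 8, 32, 4, 16
DATA_SHAPE = (DATA_ROWS, DATA_COLS)
THREAD_SHAPE = (4, 8)  # TC == DATA_COLS // SIMD_WIDTH: one vector per row
tile = [[r * DATA_COLS + c for c in range(DATA_COLS)]
        for r in range(DATA_ROWS)]
smem = [None] * (DATA_ROWS * DATA_COLS)

for tid in range(32):
    positions = vector_positions(tid, THREAD_SHAPE, DATA_SHAPE, SIMD_WIDTH)
    regs = [tile[r][c0 + lane]
            for r, c0 in positions for lane in range(SIMD_WIDTH)]
    copy_blocked_model(smem, regs, tid, THREAD_SHAPE, DATA_SHAPE,
                       SIMD_WIDTH, BLOCK_COLS)
```

With THREAD_SHAPE[1] equal to the number of vector columns, each thread owns a (2, 1) grid of vectors. Listing those vectors row by row or column by column gives the same register order. This is our reading of the row-major/column-major invariance described above.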