Mojo function

load_lds_fragment

def load_lds_fragment[smem_layout: TensorLayout, reg_layout: TensorLayout, //, MMA_K: Int, swizzle: Optional[Swizzle] = Optional()](smem_tile: TileTensor[smem_layout, address_space=AddressSpace.SHARED], reg_tile: TileTensor[smem_tile.dtype, reg_layout, address_space=AddressSpace.LOCAL])

Load MMA fragments from shared memory (SMEM) into registers using the hardware access pattern.

Dimensions are derived from the tile layouts:

- num_mmas = reg rows, MMA_M = smem rows / num_mmas
- lds_frag_width = MMA_M * MMA_K / WARP_SIZE
- lds_row_stride: MMA_K (BF16 dense), smem stride (FP8 or strided)
- num_iterations = reg flat elements / lds_frag_width
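The dimension formulas above can be checked with a small worked example. The following is a minimal sketch in plain Python, not Mojo and not part of this API: the helper `derive_fragment_dims`, the `FragmentDims` container, and the input values (a 64-row SMEM tile, 4 register rows, 64 flat register elements, MMA_K = 32, WARP_SIZE = 64) are all hypothetical and chosen only to make the arithmetic easy to follow. The divisibility checks are an assumption of this sketch. `lds_row_stride` is left out because it depends on the dtype and layout rather than on a single formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentDims:
    """Hypothetical container for the derived quantities."""
    num_mmas: int
    mma_m: int
    lds_frag_width: int
    num_iterations: int


def derive_fragment_dims(smem_rows, reg_rows, reg_flat_elements, mma_k, warp_size):
    """Apply the documented formulas to plain integers.

    Illustrative only: the real function derives these values from the
    tile layouts at compile time.
    """
    # num_mmas = reg rows
    num_mmas = reg_rows

    # MMA_M = smem rows / num_mmas (assumed here to divide evenly)
    if smem_rows % num_mmas != 0:
        raise ValueError("smem_rows must be a multiple of num_mmas")
    mma_m = smem_rows // num_mmas

    # lds_frag_width = MMA_M * MMA_K / WARP_SIZE
    if (mma_m * mma_k) % warp_size != 0:
        raise ValueError("MMA_M * MMA_K must be a multiple of warp_size")
    lds_frag_width = (mma_m * mma_k) // warp_size

    # num_iterations = reg flat elements / lds_frag_width
    if reg_flat_elements % lds_frag_width != 0:
        raise ValueError("reg_flat_elements must be a multiple of lds_frag_width")
    num_iterations = reg_flat_elements // lds_frag_width

    return FragmentDims(num_mmas, mma_m, lds_frag_width, num_iterations)


# Hypothetical inputs, not taken from any specific hardware configuration.
dims = derive_fragment_dims(
    smem_rows=64, reg_rows=4, reg_flat_elements=64, mma_k=32, warp_size=64
)
print(dims)
# FragmentDims(num_mmas=4, mma_m=16, lds_frag_width=8, num_iterations=8)
```

With these inputs, each of the 4 MMAs covers 16 SMEM rows, each lane holds 8 elements per fragment load, and filling the 64-element register tile takes 8 loads.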

Parameters:

smem_layout (TensorLayout): Inferred layout of the SMEM source tile.
reg_layout (TensorLayout): Inferred layout of the register destination tile.
MMA_K (Int): MMA K dimension (hardware instruction width).
swizzle (Optional[Swizzle]): Optional element-space swizzle.

Args:

smem_tile (TileTensor[smem_layout, address_space=AddressSpace.SHARED]): Source tile of shape [num_mmas * MMA_M, K] in the SHARED address space.
reg_tile (TileTensor[smem_tile.dtype, reg_layout, address_space=AddressSpace.LOCAL]): Destination tile of shape [num_mmas, K_frags * frag_width] in the LOCAL address space.