Mojo function
load_lds_fragment
load_lds_fragment[smem_layout: TensorLayout, reg_layout: TensorLayout, //, MMA_K: Int, swizzle: Optional[Swizzle] = Optional()](smem_tile: TileTensor[smem_layout, address_space=AddressSpace.SHARED], reg_tile: TileTensor[smem_tile.dtype, reg_layout, address_space=AddressSpace.LOCAL])
Load MMA fragments from shared memory (SMEM) into registers using the hardware access pattern.
Dimensions are derived from the tile layouts:

- num_mmas = reg rows, MMA_M = smem rows / num_mmas
- lds_frag_width = MMA_M * MMA_K / WARP_SIZE
- lds_row_stride: MMA_K (BF16 dense), smem stride (FP8 or strided)
- num_iterations = reg flat elements / lds_frag_width
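The dimension rules above can be checked by hand with a small sketch. The following is plain Python arithmetic, not Mojo and not part of the library: the helper name `derive_lds_fragment_dims`, the `LdsFragmentDims` record, the divisibility checks, and the example tile sizes are all hypothetical and chosen only for illustration. The warp size is passed in explicitly because the source does not state its value, and `lds_row_stride` is not modelled because it depends on the data type and memory layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LdsFragmentDims:
    """Hypothetical record of the derived quantities (illustration only)."""
    num_mmas: int
    mma_m: int
    lds_frag_width: int
    num_iterations: int


def derive_lds_fragment_dims(smem_rows, reg_rows, reg_cols, mma_k, warp_size):
    """Apply the documented formulas to plain integers.

    Assumption: every division is expected to be exact, so a remainder
    is reported as an error rather than silently truncated.
    """
    if min(smem_rows, reg_rows, reg_cols, mma_k, warp_size) <= 0:
        raise ValueError("all sizes must be positive")

    # num_mmas = reg rows
    num_mmas = reg_rows

    # MMA_M = smem rows / num_mmas
    if smem_rows % num_mmas != 0:
        raise ValueError("smem rows must be a multiple of num_mmas")
    mma_m = smem_rows // num_mmas

    # lds_frag_width = MMA_M * MMA_K / WARP_SIZE
    if (mma_m * mma_k) % warp_size != 0:
        raise ValueError("MMA_M * MMA_K must be a multiple of the warp size")
    lds_frag_width = mma_m * mma_k // warp_size

    # num_iterations = reg flat elements / lds_frag_width
    reg_flat_elements = reg_rows * reg_cols
    if reg_flat_elements % lds_frag_width != 0:
        raise ValueError("reg elements must be a multiple of lds_frag_width")
    num_iterations = reg_flat_elements // lds_frag_width

    return LdsFragmentDims(num_mmas, mma_m, lds_frag_width, num_iterations)


# Illustrative numbers only: a 32-row SMEM tile, a 2 x 8 register tile,
# MMA_K = 16, and an assumed warp size of 64.
dims = derive_lds_fragment_dims(
    smem_rows=32, reg_rows=2, reg_cols=8, mma_k=16, warp_size=64
)
print(dims)
```

With these illustrative inputs, each of the 2 MMAs covers 16 SMEM rows, each lane holds 16 * 16 / 64 = 4 elements per load, and the 16 register elements are filled in 4 iterations.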
Parameters:
- smem_layout (TensorLayout): Inferred layout of the SMEM source tile.
- reg_layout (TensorLayout): Inferred layout of the register destination tile.
- MMA_K (Int): MMA K dimension (hardware instruction width).
- swizzle (Optional[Swizzle]): Optional element-space swizzle.
Args:
- smem_tile (TileTensor[smem_layout, address_space=AddressSpace.SHARED]): Source [num_mmas * MMA_M, K] in SHARED.
- reg_tile (TileTensor[smem_tile.dtype, reg_layout, address_space=AddressSpace.LOCAL]): Destination [num_mmas, K_frags * frag_width] in LOCAL.