Mojo function

load_AB_partial

def load_AB_partial[
    a_type: DType,
    b_type: DType,
    a_scales_type: DType,
    b_tile_rank: Int,
    b_tile_shape: IndexList[b_tile_rank],
    b_desc_shape: IndexList[b_tile_rank],
    num_pipeline_stages: Int,
    expert_ids_layout: Layout,
    a_gmem_layout: Layout,
    a_scales_gmem_layout: Layout,
    /,
    *,
    a_smem_layout: Layout,
    b_smem_layout: Layout,
    a_scales_smem_layout: Layout,
    block_tile_shape: IndexList[3],
    cta_group: Int = 1,
    a_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_NONE
](
    a_gmem: LayoutTensor[a_type, a_gmem_layout, ImmutAnyOrigin],
    a_scales_gmem: LayoutTensor[a_scales_type, a_scales_gmem_layout, ImmutAnyOrigin],
    b_tma_op: TMATensorTile[b_type, b_tile_rank, b_tile_shape, b_desc_shape],
    a_smem_base: UnsafePointer[Scalar[a_type], address_space=AddressSpace.SHARED],
    b_smem_base: UnsafePointer[Scalar[b_type], address_space=AddressSpace.SHARED],
    a_scales_smem_base: UnsafePointer[Scalar[a_scales_type], address_space=AddressSpace.SHARED],
    load_mma_pipeline: ProducerConsumerPipeline[num_pipeline_stages],
    peer_cta_coord: Tuple[Int, Int, Int],
    work_tile_coord: Tuple[Int, Int],
    b_multicast_mask: UInt16,
    iter_idx: Int,
    elect_one_cta: Bool,
    scheduler: TileScheduler[static_MN=scheduler.static_MN, tile_shape=scheduler.tile_shape, cluster=scheduler.cluster, cta_group=scheduler.cta_group, swizzle=scheduler.swizzle, swapAB=scheduler.swapAB],
    expert_ids: LayoutTensor[DType.int32, expert_ids_layout, ImmutAnyOrigin],
    expert_end_row: Int,
    m_tile_global_start: Int
)

Sibling to load_AB for tiles that the full-TMA path cannot handle. It fills the A and a_scales shared-memory (SMEM) buffers with a cooperative warp copy from global memory (gmem) and issues a TMA load only for B.
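
The cooperative-copy idea can be illustrated with a small CPU-side model in Python. This is only a sketch under stated assumptions, not the Mojo implementation. The helper name `cooperative_copy_partial`, the flat row-major buffers, the 32-lane strided element assignment, and the zero-fill of rows at or beyond `expert_end_row` are all hypothetical; the source does not specify how out-of-range rows are treated.

```python
WARP_SIZE = 32  # assumed number of lanes cooperating on the copy


def cooperative_copy_partial(gmem, num_cols, m_tile_global_start,
                             expert_end_row, tile_rows, tile_cols, k_start=0):
    """Model a warp-cooperative copy of one A tile.

    gmem is a flat row-major list with num_cols columns. The result is a
    flat row-major tile of tile_rows x tile_cols elements standing in for
    the shared-memory buffer.
    """
    total = tile_rows * tile_cols
    smem = [0] * total  # assumption: out-of-range elements stay zero
    for lane in range(WARP_SIZE):
        # Each lane handles elements lane, lane + 32, lane + 64, ...
        for idx in range(lane, total, WARP_SIZE):
            r, c = divmod(idx, tile_cols)
            global_row = m_tile_global_start + r
            global_col = k_start + c
            # Copy only rows that belong to the current expert.
            if global_row < expert_end_row and global_col < num_cols:
                smem[idx] = gmem[global_row * num_cols + global_col]
    return smem


# A 6 x 4 matrix holding 1..24; the expert's rows end at row 5 (exclusive).
gmem = list(range(1, 25))
tile = cooperative_copy_partial(gmem, num_cols=4, m_tile_global_start=4,
                                expert_end_row=5, tile_rows=2, tile_cols=4)
print(tile)  # [17, 18, 19, 20, 0, 0, 0, 0]
```

In this model the tile starting at row 4 straddles the end of the expert's row range, so only its first row is copied. A whole-tile bulk transfer cannot express that kind of per-row bound, which is presumably why A and a_scales take the element-wise path while B still uses TMA.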