
Mojo function

load_AB_partial

def load_AB_partial[a_type: DType, b_type: DType, a_scales_type: DType, b_tile_rank: Int, b_tile_shape: IndexList[b_tile_rank], b_desc_shape: IndexList[b_tile_rank], num_pipeline_stages: Int, expert_ids_layout: Layout, a_gmem_layout: Layout, a_scales_gmem_layout: Layout, /, *, a_smem_layout: Layout, b_smem_layout: Layout, a_scales_smem_layout: Layout, block_tile_shape: IndexList[3], cta_group: Int = 1, a_swizzle: TensorMapSwizzle = TensorMapSwizzle.SWIZZLE_NONE](a_gmem: LayoutTensor[a_type, a_gmem_layout, ImmutAnyOrigin], a_scales_gmem: LayoutTensor[a_scales_type, a_scales_gmem_layout, ImmutAnyOrigin], b_tma_op: TMATensorTile[b_type, b_tile_rank, b_tile_shape, b_desc_shape], a_smem_base: UnsafePointer[Scalar[a_type], address_space=AddressSpace.SHARED], b_smem_base: UnsafePointer[Scalar[b_type], address_space=AddressSpace.SHARED], a_scales_smem_base: UnsafePointer[Scalar[a_scales_type], address_space=AddressSpace.SHARED], load_mma_pipeline: ProducerConsumerPipeline[num_pipeline_stages], peer_cta_coord: Tuple[Int, Int, Int], work_tile_coord: Tuple[Int, Int], b_multicast_mask: UInt16, iter_idx: Int, elect_one_cta: Bool, scheduler: TileScheduler[static_MN=scheduler.static_MN, tile_shape=scheduler.tile_shape, cluster=scheduler.cluster, cta_group=scheduler.cta_group, swizzle=scheduler.swizzle, swapAB=scheduler.swapAB], expert_ids: LayoutTensor[DType.int32, expert_ids_layout, ImmutAnyOrigin], expert_end_row: Int, m_tile_global_start: Int)

Sibling of load_AB for tiles that the full-TMA path cannot handle. It fills the A and a_scales shared-memory (SMEM) buffers with a cooperative warp copy from global memory (gmem) and issues TMA loads only for B.