
Mojo struct

PreshuffledBLoader

struct PreshuffledBLoader[N: Int, K_BYTES: Int]

Per-lane B fragment loader from preshuffled GMEM (DRAM -> VGPR direct).

The 5D layout places each lane's 16-byte fragment at a contiguous DRAM offset, so a single buffer_load_dwordx4 per lane delivers the MFMA's B operand with no LDS staging. Out-of-bounds (OOB) lanes read zeros, because the buffer resource's bounds check returns zero for out-of-range loads.
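The zero-fill behavior for out-of-bounds lanes can be pictured with a small host-side model. This is a simplified sketch in plain Python, not the Mojo API: `buffer_load_16` is a hypothetical helper, and the per-byte bounds check is an assumption (the hardware's actual check granularity may differ).

```python
def buffer_load_16(buf: bytes, byte_offset: int) -> bytes:
    """Model of a bounds-checked 16-byte load (hypothetical helper).

    Bytes that fall outside `buf` read as zero instead of faulting,
    mimicking the zero result of an out-of-range buffer load.
    """
    out = bytearray(16)  # starts as 16 zero bytes
    for i in range(16):
        pos = byte_offset + i
        if 0 <= pos < len(buf):
            out[i] = buf[pos]
    return bytes(out)


# A 32-byte buffer holding the values 1..32.
buf = bytes(range(1, 33))

in_bounds = buffer_load_16(buf, 16)      # fully inside the buffer
out_of_bounds = buffer_load_16(buf, 32)  # fully outside: all zeros
```

Because out-of-range loads yield zeros, a kernel can issue the same load from every lane without a per-lane branch, and the zero operand contributes nothing to the accumulated product.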

Parameters

N (Int): Per-expert N dimension (rows of the logical [N, K_BYTES] tile).
K_BYTES (Int): Per-expert FP4-packed K dimension (= K // 2).
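The relation K_BYTES = K // 2 follows from FP4 packing: two 4-bit values share one byte. The sketch below, in plain Python rather than Mojo, works through the arithmetic; `packed_k_bytes` and the example value of `k` are hypothetical and chosen only for illustration.

```python
def packed_k_bytes(k: int) -> int:
    """Bytes needed to store k FP4 values at two values per byte."""
    if k % 2 != 0:
        raise ValueError("K must be even to pack two FP4 values per byte")
    return k // 2


FRAGMENT_BYTES = 16                    # size of one per-lane fragment
FP4_PER_FRAGMENT = FRAGMENT_BYTES * 2  # 32 FP4 values per fragment

k = 256                                # hypothetical logical K
k_bytes = packed_k_bytes(k)            # packed K dimension in bytes
fragments_per_row = k_bytes // FRAGMENT_BYTES
```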

Fields

bc (AMDBufferResource): Buffer resource descriptor (V#) for the preshuffled B buffer.

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, Movable, RegisterPassable, TrivialRegisterPassable

Methods

`__init__`

def __init__(b_gmem_tile: TileTensor[DType.uint8, address_space=b_gmem_tile.address_space, linear_idx_type=b_gmem_tile.linear_idx_type, element_size=b_gmem_tile.element_size]) -> Self

Builds the V# from a preshuffled per-expert B byte buffer.

`load_fragment`

def load_fragment(self, n: Int, k_byte: Int) -> SIMD[DType.uint8, 16]

Loads the 16-byte B fragment at logical (n, k_byte).

For one MFMA dispatch, each lane calls this with n = warp_n_off + n_mma * 16 + lane % 16 and k_byte = k_tile * 64 + (lane // 16) * 16.

Returns:

SIMD[DType.uint8, 16]
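The per-lane coordinates passed to load_fragment can be checked with a plain-Python model of the index formula. This is only a sketch of the arithmetic, not the Mojo API: `fragment_coords` is a hypothetical helper, and the 64-lane wavefront size is an assumption inferred from the 64-byte K step and the four 16-byte lane groups.

```python
WAVE_SIZE = 64  # assumed wavefront width (4 groups of 16 lanes)


def fragment_coords(lane: int, warp_n_off: int, n_mma: int, k_tile: int):
    """Return the logical (n, k_byte) one lane passes to load_fragment."""
    n = warp_n_off + n_mma * 16 + lane % 16        # row within the N tile
    k_byte = k_tile * 64 + (lane // 16) * 16       # 16-byte slot in the K tile
    return n, k_byte


# Coordinates for every lane of one MFMA dispatch at the tile origin.
coords = [fragment_coords(lane, 0, 0, 0) for lane in range(WAVE_SIZE)]
rows = sorted({n for n, _ in coords})
k_offsets = sorted({kb for _, kb in coords})
```

Under these assumptions the 64 lanes cover 16 rows by four 16-byte slots, so one dispatch reads a 16 x 64-byte block of B with no two lanes sharing a fragment.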
