Python class

GGUFWeights

class max.graph.weights.GGUFWeights(source, tensors=None, prefix='', allocated=None)

Bases: Weights

Implementation for loading weights from GGUF (GPT-Generated Unified Format) files.

GGUFWeights provides an interface for loading model weights from GGUF files, a format optimized for quantized large language models. GGUF is the successor to the GGML format and is commonly used in the llama.cpp ecosystem for efficient storage and loading of quantized models.

from pathlib import Path
from max.dtype import DType
from max.graph import DeviceRef  # needed for DeviceRef.CPU() / DeviceRef.GPU() below
from max.graph.weights import GGUFWeights
from max.graph.quantization import QuantizationEncoding

gguf_path = Path("model-q4_k.gguf")
weights = GGUFWeights(gguf_path)

# Check if a weight exists
if weights.model.layers[0].attention.wq.exists():
    # Allocate quantized attention weight
    wq_weight = weights.model.layers[0].attention.wq.allocate(
        dtype=DType.uint8,  # GGUF quantized weights use uint8
        device=DeviceRef.CPU()
    )

# Access weight data with quantization info
weight_data = weights.model.layers[0].attention.wq.data()
print(f"Quantization: {weight_data.quantization_encoding}")
print(f"Shape: {weight_data.shape}")

# Allocate with quantization validation
ffn_weight = weights.model.layers[0].feed_forward.w1.allocate(
    quantization_encoding=QuantizationEncoding.Q4_K,
    device=DeviceRef.GPU(0)
)

# Iterate through all weights in a layer
for name, weight in weights.model.layers[0].items():
    if weight.exists():
        print(f"Found weight: {name}")

Creates a GGUF weights reader.

Parameters:

  • source (PathLike[str] | gguf.GGUFReader) – Path to a GGUF file or a GGUFReader object.
  • tensors (dict[str, gguf.ReaderTensor] | None) – Dictionary mapping tensor names to the tensors in the GGUF checkpoint.
  • prefix (str) – Weight name or prefix.
  • allocated (dict[str, DLPackArray] | None) – Dictionary of allocated values.
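
The prefix parameter is what lets an expression such as weights.model.layers[0].attention.wq resolve to a single tensor name. The sketch below illustrates that idea with a hypothetical stand-in class, PrefixedView. It is not the MAX implementation, and the dot-joined naming scheme is an assumption inferred from the example above.

```python
from __future__ import annotations


class PrefixedView:
    """Hypothetical stand-in for a prefix-based weights view (not the MAX class)."""

    def __init__(self, tensors: dict[str, object], prefix: str = "") -> None:
        self._tensors = tensors
        self._prefix = prefix

    def _child(self, part: str) -> PrefixedView:
        # Assumption: name parts are joined with "." to form the tensor name.
        joined = f"{self._prefix}.{part}" if self._prefix else part
        return PrefixedView(self._tensors, joined)

    def __getattr__(self, attr: str) -> PrefixedView:
        # Only reached for attributes that are not defined on the class.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return self._child(attr)

    def __getitem__(self, index: int | str) -> PrefixedView:
        return self._child(str(index))

    @property
    def name(self) -> str:
        return self._prefix


view = PrefixedView(tensors={})
leaf = view.model.layers[0].attention.wq
print(leaf.name)
```

Each attribute or index access returns a new view over the same tensor dictionary, so navigating the hierarchy never copies tensor data.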

allocate()

allocate(dtype=None, shape=None, quantization_encoding=None, device=cpu:0)

Creates and optionally validates a new Weight.

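Parameters:

  • dtype – Expected data type of the weight, if given.
  • shape – Expected shape of the weight, if given.
  • quantization_encoding – Expected quantization encoding of the weight, if given.
  • device – Device on which to allocate the weight. Defaults to cpu:0.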

Return type:

Weight
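
To make "optionally validates" concrete, here is a minimal sketch of validation that runs only for the expectations a caller actually supplies. TensorInfo and check_expected are hypothetical names introduced for illustration. The real allocate() returns a Weight, and its internal checks may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical metadata record for one stored tensor."""

    dtype: str
    shape: tuple


def check_expected(
    info: TensorInfo,
    dtype: Optional[str] = None,
    shape: Optional[Sequence[int]] = None,
) -> TensorInfo:
    """Compare stored metadata against any expectation that was provided."""
    if dtype is not None and dtype != info.dtype:
        raise ValueError(f"expected dtype {dtype}, found {info.dtype}")
    if shape is not None and tuple(shape) != info.shape:
        raise ValueError(f"expected shape {tuple(shape)}, found {info.shape}")
    return info


stored = TensorInfo(dtype="uint8", shape=(4096, 2304))
check_expected(stored)                 # no expectations: nothing is checked
check_expected(stored, dtype="uint8")  # matching expectation passes

try:
    check_expected(stored, shape=(4096, 4096))
except ValueError as err:
    print(f"Validation failed: {err}")
```

Leaving every argument at None skips validation entirely, which mirrors how each allocate() parameter defaults to None.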

allocated_weights

property allocated_weights: dict[str, DLPackArray]

Gets the values of all weights that were allocated previously.
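
One way to picture this property is as a dictionary shared between a parent view and every child view derived from it, so that allocations made deep in the hierarchy remain visible from the root. The sketch below assumes that sharing behavior, which the allocated constructor parameter suggests but the text does not confirm. RecordingView is a hypothetical class, and bytes stands in for real tensor data.

```python
from __future__ import annotations


class RecordingView:
    """Hypothetical view that records every allocation in a shared dict."""

    def __init__(
        self, prefix: str = "", allocated: dict[str, bytes] | None = None
    ) -> None:
        self._prefix = prefix
        # Reuse the caller's dict so parent and child views see the same entries.
        self._allocated = {} if allocated is None else allocated

    def child(self, part: str) -> RecordingView:
        name = f"{self._prefix}.{part}" if self._prefix else part
        return RecordingView(name, self._allocated)

    def allocate(self, value: bytes) -> None:
        # Record the value under this view's full name.
        self._allocated[self._prefix] = value

    @property
    def allocated_weights(self) -> dict[str, bytes]:
        return self._allocated


root = RecordingView()
root.child("layers").child("0").child("wq").allocate(b"\x00\x01")
print(sorted(root.allocated_weights))
```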

data()

data()

Loads and returns the weight data for this tensor.

Return type:

WeightData

exists()

exists()

Returns True if a tensor exists for the current prefix.

Return type:

bool

items()

items()

Iterates through all allocable weights whose names start with the current prefix.
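
The difference between exists() and items() is easiest to see side by side: one asks whether the current prefix is itself a tensor name, the other walks every name beneath the prefix. The following sketch models both over a plain dictionary. tensor_exists and iter_prefixed are hypothetical helpers, and yielding full tensor names is an assumption rather than documented behavior.

```python
from __future__ import annotations

from typing import Iterator


def tensor_exists(tensors: dict[str, object], prefix: str) -> bool:
    """True only when `prefix` is itself the full name of a stored tensor."""
    return prefix in tensors


def iter_prefixed(
    tensors: dict[str, object], prefix: str
) -> Iterator[tuple[str, object]]:
    """Yield (name, tensor) pairs for every stored name starting with `prefix`."""
    for name, tensor in tensors.items():
        if name.startswith(prefix):
            yield name, tensor


tensors = {
    "layers.0.attention.wq": "wq-data",
    "layers.0.attention.wk": "wk-data",
    "layers.1.attention.wq": "other-data",
}

names = [name for name, _ in iter_prefixed(tensors, "layers.0")]
print(names)
print(tensor_exists(tensors, "layers.0"))               # a prefix, not a tensor
print(tensor_exists(tensors, "layers.0.attention.wq"))  # an exact tensor name
```

In this sketch a plain startswith match means the prefix layers.1 would also match names under layers.10. Whether the library guards against that is not stated here.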

name

property name: str

The current weight name or prefix.