Python class

GGUFWeights

class max.graph.weights.GGUFWeights(source, tensors=None, prefix='', allocated=None)

Bases: Weights

Implementation for loading weights from GGUF (GPT-Generated Unified Format) files.

GGUFWeights provides an interface for loading model weights from GGUF files, a format optimized for quantized large language models. GGUF is the successor to the GGML format and is commonly used in the llama.cpp ecosystem to store and load quantized models efficiently.
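To make the file format a little more concrete, here is a minimal sketch that parses the fixed-size header at the start of a GGUF file using only the standard library. It is not how GGUFWeights reads files. The helper name `read_gguf_header` is hypothetical, and the layout is an assumption based on the GGUF specification for version 2 and later (little-endian, 4-byte magic, 32-bit version, 64-bit tensor and metadata counts).

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)


def read_gguf_header(buf: bytes) -> dict[str, int]:
    """Parse the fixed GGUF header (hypothetical helper, v2+ layout assumed)."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short for a GGUF header")
    if buf[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={buf[:4]!r}")
    # "<" selects little-endian with no padding: uint32, uint64, uint64.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Build a synthetic header in memory instead of reading a real file.
example = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(example))
```

For a real file, the first 24 bytes read in binary mode could be passed to the same function.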

from pathlib import Path

from max.dtype import DType
from max.graph import DeviceRef
from max.graph.quantization import QuantizationEncoding
from max.graph.weights import GGUFWeights

gguf_path = Path("model-q4_k.gguf")
weights = GGUFWeights(gguf_path)

# Check if a weight exists
if weights.model.layers[0].attention.wq.exists():
    # Allocate quantized attention weight
    wq_weight = weights.model.layers[0].attention.wq.allocate(
        dtype=DType.uint8,  # GGUF quantized weights use uint8
        device=DeviceRef.CPU()
    )

# Access weight data with quantization info
weight_data = weights.model.layers[0].attention.wq.data()
print(f"Quantization: {weight_data.quantization_encoding}")
print(f"Shape: {weight_data.shape}")

# Allocate with quantization validation
ffn_weight = weights.model.layers[0].feed_forward.w1.allocate(
    quantization_encoding=QuantizationEncoding.Q4_K,
    device=DeviceRef.GPU(0)
)

# Iterate through all weights in a layer
for name, weight in weights.model.layers[0].items():
    if weight.exists():
        print(f"Found weight: {name}")
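The `uint8` dtype in the example reflects that quantized tensors are stored as packed bytes rather than as one value per weight. As a rough sketch of the arithmetic, the snippet below computes the packed size of a Q4_K tensor. The constants are assumptions taken from the llama.cpp Q4_K block layout (256 weights per super-block, 144 bytes per super-block), and `q4_k_packed_bytes` is a hypothetical helper, not part of the MAX API.

```python
# Assumed Q4_K layout (from llama.cpp): each super-block packs 256 weights
# into 144 bytes: 4 bytes of fp16 scale/min, 12 bytes of sub-block scales,
# and 128 bytes of 4-bit values.
Q4_K_ELEMENTS_PER_BLOCK = 256
Q4_K_BYTES_PER_BLOCK = 144


def q4_k_packed_bytes(num_elements: int) -> int:
    """Return the number of bytes needed to store num_elements Q4_K weights."""
    if num_elements <= 0 or num_elements % Q4_K_ELEMENTS_PER_BLOCK != 0:
        raise ValueError("element count must be a positive multiple of 256")
    return num_elements // Q4_K_ELEMENTS_PER_BLOCK * Q4_K_BYTES_PER_BLOCK


row_bytes = q4_k_packed_bytes(4096)
print(row_bytes)                 # 2304 bytes for a 4096-element row
print(row_bytes * 8 / 4096)      # 4.5 bits per weight
```

Under these assumptions, a logical row of 4096 weights occupies 2304 bytes, so the byte-level shape of a quantized tensor differs from its logical shape.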

Creates a GGUF weights reader.

Parameters:

  • source (PathLike[str] | gguf.GGUFReader) – Path to a GGUF file or a GGUFReader object.
  • tensors (dict[str, gguf.ReaderTensor] | None) – Dictionary mapping tensor names to the tensors in the GGUF checkpoint.
  • prefix (str) – Weight name or prefix.
  • allocated (dict[str, DLPackArray] | None) – Dictionary of allocated values.
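The `prefix` parameter is what makes chained access such as `weights.model.layers[0].attention.wq` work: each attribute or index step extends the current name. The following is a simplified, hypothetical stand-in (`PrefixView`) that illustrates the idea with a plain dictionary. It is a sketch, not the GGUFWeights implementation, and it assumes name parts are joined with dots.

```python
class PrefixView:
    """Toy model of prefix-style navigation over tensor names (hypothetical)."""

    def __init__(self, tensors: dict, prefix: str = "") -> None:
        self._tensors = tensors
        self._prefix = prefix

    def _child(self, part: str) -> "PrefixView":
        # Extend the current prefix with one more dotted component.
        name = f"{self._prefix}.{part}" if self._prefix else part
        return PrefixView(self._tensors, name)

    def __getattr__(self, attr: str) -> "PrefixView":
        # Only called when normal lookup fails; leave private names alone.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return self._child(attr)

    def __getitem__(self, index: int) -> "PrefixView":
        return self._child(str(index))

    @property
    def name(self) -> str:
        return self._prefix

    def exists(self) -> bool:
        # A tensor exists only if the full name is a key in the mapping.
        return self._prefix in self._tensors


tensors = {"model.layers.0.attention.wq": b"\x00" * 4}
root = PrefixView(tensors)
wq = root.model.layers[0].attention.wq
print(wq.name, wq.exists())
```

Sharing one dictionary between every view keeps navigation cheap, since each step only builds a new string.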

allocate()

allocate(dtype=None, shape=None, quantization_encoding=None, device=cpu:0)

Creates and optionally validates a new Weight.

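Parameters:

  • dtype – Expected data type of the weight. Optional; used for validation when given.
  • shape – Expected shape of the weight. Optional; used for validation when given.
  • quantization_encoding – Expected quantization encoding of the weight. Optional; used for validation when given.
  • device – Device on which to place the weight. Defaults to cpu:0.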

Return type:

Weight

allocated_weights

property allocated_weights: dict[str, DLPackArray]

Gets the values of all weights that were allocated previously.

data()

data()

Loads and returns the weight data for this tensor.

Return type:

WeightData

exists()

exists()

Returns True if a tensor exists for the current prefix.

Return type:

bool

items()

items()

Iterates through all allocatable weights whose names start with the current prefix.
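Prefix-based iteration can be pictured as a filter over tensor names. The sketch below uses a hypothetical `items_with_prefix` function over a plain dictionary and assumes full names are yielded. It is an illustration of the idea, not the library's code.

```python
from collections.abc import Iterator


def items_with_prefix(tensors: dict, prefix: str) -> Iterator[tuple]:
    """Yield (name, value) pairs whose name starts with prefix (hypothetical)."""
    for name, value in tensors.items():
        if name.startswith(prefix):
            yield name, value


tensors = {
    "model.layers.0.attention.wq": "wq0",
    "model.layers.0.feed_forward.w1": "w1_0",
    "model.layers.1.attention.wq": "wq1",
    "model.layers.10.attention.wq": "wq10",
}

for name, _ in items_with_prefix(tensors, "model.layers.0"):
    print(f"Found weight: {name}")
```

With plain string matching like this, the prefix `model.layers.1` would also match `model.layers.10.attention.wq`. Appending a trailing dot to the prefix avoids that in the sketch.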

name

property name: str

The current weight name or prefix.