Python class
MAXModelConfig
class max.pipelines.MAXModelConfig(*, config_file=None, section_name=None, use_subgraphs=True, data_parallel_degree=1, pool_embeddings=True, max_length=None, model_path='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, subfolder=None, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, enable_echo=False, chat_template=None, kv_cache=<factory>)
Bases: MAXModelConfigBase
Configuration for a pipeline model.
Initialize config, allowing tests/internal callers to seed private attributes.
Pydantic private attributes (PrivateAttr) are not regular model fields,
so they are not accepted as constructor kwargs by default. Some tests (and debugging
utilities) intentionally seed _huggingface_config to avoid network
access and to validate config override plumbing. Hence, we need to
explicitly define this __init__ method to seed the private attributes.
Parameters:
- config_file (str | None)
- section_name (str | None)
- use_subgraphs (bool)
- data_parallel_degree (int)
- pool_embeddings (bool)
- max_length (int | None)
- model_path (str)
- served_model_name (str | None)
- weight_path (list[Path])
- quantization_encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'] | None)
- huggingface_model_revision (str)
- huggingface_weight_revision (str)
- trust_remote_code (bool)
- subfolder (str | None)
- device_specs (list[DeviceSpec])
- force_download (bool)
- vision_config_overrides (dict[str, Any])
- rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'] | None)
- enable_echo (bool)
- chat_template (Path | None)
- kv_cache (KVCacheConfig)
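The private-attribute seeding described in the class docstring can be illustrated with plain Pydantic. `DemoConfig` below is a hypothetical stand-in for the real class, kept minimal to show only the `__init__` pattern:

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, PrivateAttr


class DemoConfig(BaseModel):
    # Hypothetical stand-in for MAXModelConfig, not the real class.
    model_config = ConfigDict(protected_namespaces=())  # allow a "model_" field name

    model_path: str = ""
    # Private attributes are not model fields, so Pydantic does not accept
    # them as constructor kwargs by default.
    _huggingface_config: Optional[dict] = PrivateAttr(default=None)

    def __init__(self, **kwargs: Any) -> None:
        # Pop the private attribute before field validation, then seed it
        # explicitly, mirroring the pattern described in the docstring.
        hf_config = kwargs.pop("_huggingface_config", None)
        super().__init__(**kwargs)
        if hf_config is not None:
            self._huggingface_config = hf_config


# Seeding the config up front avoids a network fetch in tests:
cfg = DemoConfig(
    model_path="my-org/my-model",
    _huggingface_config={"architectures": ["LlamaForCausalLM"]},
)
```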
architecture_name
Returns the architecture class name from the HuggingFace config.
For transformers models, returns architectures[0] from the
HuggingFace config.
chat_template
An optional custom chat template to override the one shipped with the model.
create_kv_cache_config()
create_kv_cache_config(**kv_cache_kwargs)
Creates and sets the KV cache configuration with the given parameters.
Creates a new KVCacheConfig from the provided keyword arguments
and automatically sets the cache_dtype based on the model’s quantization
encoding (or any explicit override in kv_cache_kwargs).
Parameters:
**kv_cache_kwargs – Keyword arguments to pass to the KVCacheConfig constructor. Common options include:
- kv_cache_page_size: Number of tokens per page for paged cache
- enable_prefix_caching: Whether to enable prefix caching
- device_memory_utilization: Fraction of device memory to use
- cache_dtype: Override for the cache data type
Return type:
None
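As a rough sketch of this behaviour (not the real implementation), with `KVCacheConfigSketch` standing in for `KVCacheConfig` and illustrative default values:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVCacheConfigSketch:
    # Hypothetical stand-in for KVCacheConfig; the defaults here are
    # illustrative assumptions, not the library's real defaults.
    kv_cache_page_size: int = 128
    enable_prefix_caching: bool = False
    device_memory_utilization: float = 0.9
    cache_dtype: Optional[str] = None


def create_kv_cache_config_sketch(
    quantization_encoding: Optional[str], **kv_cache_kwargs: object
) -> KVCacheConfigSketch:
    # Forward the keyword arguments to the config constructor...
    cfg = KVCacheConfigSketch(**kv_cache_kwargs)
    # ...then fill in cache_dtype unless an explicit override was given.
    # (The real encoding-to-dtype mapping is more involved; this simply
    # passes the encoding through for illustration.)
    if cfg.cache_dtype is None:
        cfg.cache_dtype = quantization_encoding or "float32"
    return cfg
```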
data_parallel_degree
data_parallel_degree: int
The degree of data parallelism for replicating the model.
default_device_spec
property default_device_spec: DeviceSpec
Returns the default device spec for the model.
This is the first device spec in the list, used for device spec checks throughout config validation.
Returns:
The default device spec for the model.
device_specs
device_specs: list[DeviceSpec]
The devices to run inference on.
enable_echo
enable_echo: bool
Whether the model should be built with echo capabilities.
force_download
force_download: bool
Whether to force download a file even if it’s already in the local cache.
generation_config
property generation_config: GenerationConfig
Retrieves the Hugging Face GenerationConfig for this model.
Lazily loads the GenerationConfig from the model repository
and caches it to avoid repeated remote fetches.
Returns:
The GenerationConfig for the model, containing generation parameters including max_length, temperature, and top_p. If loading fails, returns a default GenerationConfig.
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX Graph quantization encoding.
Returns:
The graph quantization encoding corresponding to the CLI encoding.
Raises:
ValueError – If no CLI encoding was specified.
huggingface_config
property huggingface_config: PreTrainedConfig
Returns the Hugging Face model config (loaded on first access).
For transformers models, returns the AutoConfig subclass. For
non-transformers models (e.g. diffusers components), falls back to
loading the raw config.json and wrapping it in a
PretrainedConfig.
Raises:
FileNotFoundError – If no config.json can be found for the model repo/subfolder.
huggingface_model_repo
property huggingface_model_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for the model.
The result is cached in a PrivateAttr to avoid recreating
HuggingFaceRepo instances on every access. The cache is
invalidated when the underlying config fields change.
huggingface_model_revision
huggingface_model_revision: str
The branch or Git revision of the Hugging Face model repository.
huggingface_weight_repo
property huggingface_weight_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for weight files.
The result is cached in a PrivateAttr to avoid recreating
HuggingFaceRepo instances (and triggering redundant HF API
calls for file listing, encoding detection, etc.) on every
access. The cache is invalidated when the underlying config
fields change (e.g. after model_copy()).
huggingface_weight_repo_id
property huggingface_weight_repo_id: str
Returns the Hugging Face repo ID used for weight files.
huggingface_weight_revision
huggingface_weight_revision: str
The branch or Git revision of the Hugging Face weights repository.
kv_cache
kv_cache: KVCacheConfig
The KV cache configuration.
log_model_info()
log_model_info(role)
Logs model configuration information for this config.
Parameters:
role (str) – The semantic role of this model (e.g. "main", "draft", "vae").
Return type:
None
max_length
The maximum sequence length the model can process.
model_config
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}
Configuration for the model, which should be a dictionary conforming to Pydantic's ConfigDict.
model_name
property model_name: str
Returns the served model name or model path.
model_path
model_path: str
The repository ID of a Hugging Face model to use.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
Parameters:
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
Return type:
None
pool_embeddings
pool_embeddings: bool
Whether to pool embedding outputs.
quantization_encoding
quantization_encoding: SupportedEncoding | None
The weight encoding type.
resolve()
resolve()
Validates and resolves the config.
Called after initialization to ensure all fields are in a valid state and to set fields that can’t be determined in the default factory.
Resolves fields in this order:
- Resolves chat_template if it's a path.
- Validates that the provided device_specs are available.
- Parses the weight path and initializes _weights_repo_id.
Return type:
None
resolved_weight_paths()
resolved_weight_paths()
Resolve weight paths to absolute local paths, downloading if needed.
For online repos, downloads weight files from HuggingFace Hub. For local repos, constructs absolute paths from the repo root.
retrieve_chat_template()
retrieve_chat_template()
Returns the chat template string, or None if not set.
Return type:
str | None
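A minimal sketch of this behaviour, assuming chat_template holds a file path once resolved (`retrieve_chat_template_sketch` is a hypothetical helper, not the real method):

```python
from pathlib import Path
from typing import Optional, Union


def retrieve_chat_template_sketch(
    chat_template: Optional[Union[str, Path]],
) -> Optional[str]:
    # No custom template configured: the model's own template applies.
    if chat_template is None:
        return None
    # Otherwise read the override template from disk.
    return Path(chat_template).read_text()
```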
rope_type
The RoPE type to use, forced regardless of model defaults.
sampling_params_defaults
property sampling_params_defaults: SamplingParamsGenerationConfigDefaults
Returns sampling defaults derived from the generation config.
served_model_name
An optional override for the client-facing model name.
set_cache_dtype_given_quantization_encoding()
set_cache_dtype_given_quantization_encoding()
Determines the KV cache dtype based on quantization encoding configuration.
The dtype is determined in the following priority order:
- Explicit override from kv_cache.kv_cache_format (if set).
- Derived from the model's quantization_encoding.
- Falls back to float32 if no encoding is specified.
Return type:
None
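The priority order above can be sketched as a small helper. The encoding-to-dtype table here is an illustrative assumption, not the library's real mapping:

```python
from typing import Optional

# Assumption: an illustrative encoding -> dtype table for the sketch only.
ENCODING_TO_CACHE_DTYPE = {
    "bfloat16": "bfloat16",
    "float32": "float32",
    "q4_k": "float32",
}


def resolve_cache_dtype(
    kv_cache_format: Optional[str],
    quantization_encoding: Optional[str],
) -> str:
    # 1. An explicit kv_cache.kv_cache_format override wins.
    if kv_cache_format is not None:
        return kv_cache_format
    # 2. Otherwise derive the dtype from the quantization encoding.
    if quantization_encoding is not None:
        return ENCODING_TO_CACHE_DTYPE.get(quantization_encoding, "float32")
    # 3. Fall back to float32 when no encoding is specified.
    return "float32"
```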
subfolder
Subdirectory within the HuggingFace repo to load config and weights from.
trust_remote_code
trust_remote_code: bool
Whether to allow custom modelling files from Hugging Face.
use_subgraphs
use_subgraphs: bool
Whether to use subgraphs for the model.
validate_and_resolve_quantization_encoding_weight_path()
validate_and_resolve_quantization_encoding_weight_path(default_encoding)
Verifies that the quantization encoding and weight path are consistent.
Parameters:
- default_encoding (max.pipelines.lib.config.SupportedEncoding) – The default encoding to use if no encoding is provided.
Return type:
None
validate_and_resolve_rope_type()
validate_and_resolve_rope_type(arch_rope_type)
Resolves rope_type from architecture default if not set.
Parameters:
arch_rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn']) – The architecture's default RoPE type.
Return type:
None
validate_and_resolve_with_resolved_quantization_encoding()
validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)
Validates model path and weight path against resolved quantization encoding.
Also finalizes the encoding config.
Parameters:
- supported_encodings (set[max.pipelines.lib.config.SupportedEncoding]) – The set of encodings supported by the model architecture.
- default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.
Return type:
None
validate_lora_compatibility()
validate_lora_compatibility()
Validates that LoRA configuration is compatible with model settings.
Raises:
ValueError – If LoRA is enabled but incompatible with current model configuration.
Return type:
None
validate_max_length()
classmethod validate_max_length(v)
Validate that max_length is non-negative if provided.
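A sketch of the validator's contract (the helper name is hypothetical):

```python
from typing import Optional


def validate_max_length_sketch(v: Optional[int]) -> Optional[int]:
    # None means "use the model default"; otherwise the value must be
    # non-negative.
    if v is not None and v < 0:
        raise ValueError("max_length must be non-negative")
    return v
```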
validate_multi_gpu_supported()
validate_multi_gpu_supported(multi_gpu_supported)
Validates that the model architecture supports multi-GPU inference.
Parameters:
multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.
Return type:
None
vision_config_overrides
Model-specific vision configuration overrides.
weight_path
The path or URL of the model weights to use.
weights_size()
weights_size()
Calculates the total size in bytes of all weight files in weight_path.
Attempts to find the weights locally first to avoid network calls, checking in the following order:
- If repo_type is "local", it checks whether each path in weight_path exists directly as a local file path.
- Otherwise, if repo_type is "online", it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().
Returns:
The total size of all weight files in bytes.
Raises:
- FileNotFoundError – If repo_type is "local" and a file specified in weight_path is not found within the local repo directory.
- ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (for example, file metadata not available or API error).
- RuntimeError – If the determined repo_type is unexpected.
Return type:
int
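The lookup order can be sketched as follows; `try_cache` and `hub_size_of` are injected stand-ins for `huggingface_hub.try_to_load_from_cache()` and `HuggingFaceRepo.size_of()`, and the helper itself is hypothetical:

```python
import os
from typing import Callable, Iterable, Optional


def total_weights_size_sketch(
    weight_paths: Iterable[str],
    repo_type: str,
    repo_root: str = "",
    try_cache: Callable[[str], Optional[str]] = lambda rel: None,
    hub_size_of: Optional[Callable[[str], int]] = None,
) -> int:
    total = 0
    for rel in weight_paths:
        if repo_type == "local":
            # Local repos: the path must exist under the repo root.
            full = os.path.join(repo_root, rel)
            if not os.path.isfile(full):
                raise FileNotFoundError(full)
            total += os.path.getsize(full)
        elif repo_type == "online":
            # Online repos: check the local cache first, then fall back
            # to asking the Hub API for the file size.
            cached = try_cache(rel)
            if cached is not None:
                total += os.path.getsize(cached)
            elif hub_size_of is not None:
                total += hub_size_of(rel)
            else:
                raise ValueError(f"could not determine size of {rel}")
        else:
            raise RuntimeError(f"unexpected repo_type: {repo_type!r}")
    return total
```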