Python module

model_config

MAX model config classes.

MAXModelConfig

class max.pipelines.lib.model_config.MAXModelConfig(*, config_file=None, section_name=None, use_subgraphs=True, data_parallel_degree=1, model_path='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, allow_safetensors_weights_fp32_bf6_bidirectional_cast=False, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, kv_cache=<factory>)

Bases: MAXModelConfigBase

Initialize config, allowing tests/internal callers to seed PrivateAttrs.

Pydantic PrivateAttrs are not regular model fields, so they are not accepted as constructor kwargs by default. Some tests (and debugging utilities) intentionally seed _huggingface_config to avoid network access and to validate config override plumbing, so this __init__ is defined explicitly to seed the PrivateAttr(s).
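The seeding pattern can be sketched without Pydantic. The class and attribute handling below are illustrative stand-ins for the real implementation, which uses a Pydantic PrivateAttr:

```python
class ModelConfigSketch:
    """Minimal sketch of seeding a private attribute via __init__.

    In the real class, _huggingface_config is a Pydantic PrivateAttr;
    this stand-in only demonstrates the seeding pattern.
    """

    def __init__(self, model_path="", huggingface_config=None):
        self.model_path = model_path
        # Seed the private attribute so tests can inject a preloaded
        # config and avoid any network access.
        self._huggingface_config = huggingface_config

    @property
    def huggingface_config(self):
        if self._huggingface_config is None:
            raise RuntimeError("config not loaded; would fetch from the Hub")
        return self._huggingface_config


# A test can inject a stub config instead of hitting the network:
cfg = ModelConfigSketch(
    model_path="my/model",
    huggingface_config={"model_type": "llama"},
)
print(cfg.huggingface_config["model_type"])  # llama
```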

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • use_subgraphs (bool)
  • data_parallel_degree (int)
  • model_path (str)
  • served_model_name (str | None)
  • weight_path (list[Path])
  • quantization_encoding (SupportedEncoding | None)
  • allow_safetensors_weights_fp32_bf6_bidirectional_cast (bool)
  • huggingface_model_revision (str)
  • huggingface_weight_revision (str)
  • trust_remote_code (bool)
  • device_specs (list[DeviceSpec])
  • force_download (bool)
  • vision_config_overrides (dict[str, Any])
  • rope_type (RopeType | None)
  • kv_cache (KVCacheConfig)

allow_safetensors_weights_fp32_bf6_bidirectional_cast

allow_safetensors_weights_fp32_bf6_bidirectional_cast: bool

data_parallel_degree

data_parallel_degree: int

default_device_spec

property default_device_spec: DeviceSpec

Returns the default device spec for the model. This is the first device spec in the list and is mostly used for device spec checks throughout config validation.

Returns:

The default device spec for the model.
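As described above, the property simply returns the first entry of device_specs. A minimal sketch, with illustrative dataclasses standing in for the real DeviceSpec and config types:

```python
from dataclasses import dataclass, field


@dataclass
class DeviceSpecSketch:
    device_type: str  # e.g. "cpu" or "gpu" (illustrative fields)
    id: int = 0


@dataclass
class ConfigSketch:
    device_specs: list = field(default_factory=lambda: [DeviceSpecSketch("cpu")])

    @property
    def default_device_spec(self):
        # The default device spec is simply the first entry in device_specs.
        return self.device_specs[0]


cfg = ConfigSketch(device_specs=[DeviceSpecSketch("gpu", 0), DeviceSpecSketch("gpu", 1)])
print(cfg.default_device_spec.device_type)  # gpu
```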

device_specs

device_specs: list[DeviceSpec]

force_download

force_download: bool

generation_config

property generation_config: GenerationConfig

Retrieve the HuggingFace GenerationConfig for this model.

This property lazily loads the GenerationConfig from the model repository and caches it to avoid repeated remote fetches.

Returns:

The GenerationConfig for the model, containing generation parameters like max_length, temperature, top_p, etc. If loading fails, returns a default GenerationConfig.
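The lazy-load-with-fallback behavior can be sketched with functools.cached_property. The loader and config class below are illustrative placeholders, not the real HuggingFace API:

```python
import functools


class GenerationConfigSketch:
    """Stand-in for a generation config with default parameters."""

    def __init__(self, max_length=20, temperature=1.0):
        self.max_length = max_length
        self.temperature = temperature


def load_from_repo(model_path):
    # Placeholder for the remote fetch; raises to simulate a failed load.
    raise OSError(f"generation_config.json not found for {model_path}")


class ConfigSketch:
    def __init__(self, model_path):
        self.model_path = model_path

    @functools.cached_property
    def generation_config(self):
        # Lazily load once and cache the result to avoid repeated
        # remote fetches; fall back to defaults if loading fails.
        try:
            return load_from_repo(self.model_path)
        except OSError:
            return GenerationConfigSketch()


cfg = ConfigSketch("my/model")
print(cfg.generation_config.max_length)  # 20 (default, since loading failed)
assert cfg.generation_config is cfg.generation_config  # cached after first access
```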

graph_quantization_encoding

property graph_quantization_encoding: QuantizationEncoding | None

Converts the CLI encoding to a MAX Graph quantization encoding.

Returns:

The graph quantization encoding corresponding to the CLI encoding.

Raises:

ValueError – If no CLI encoding was specified.
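The conversion amounts to a lookup from the CLI-level encoding to its graph-level counterpart, raising when no encoding was specified. The mapping and names below are hypothetical, for illustration only:

```python
# Hypothetical mapping from CLI encoding names to MAX Graph encoding names.
# None means the encoding needs no graph-level quantization.
_CLI_TO_GRAPH = {"q4_k": "Q4_K", "bfloat16": None}


def graph_quantization_encoding(cli_encoding):
    """Sketch: convert a CLI encoding to a graph quantization encoding."""
    if cli_encoding is None:
        raise ValueError("no CLI quantization encoding was specified")
    return _CLI_TO_GRAPH.get(cli_encoding)
```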

huggingface_config

property huggingface_config: AutoConfig

huggingface_model_repo

property huggingface_model_repo: HuggingFaceRepo

huggingface_model_revision

huggingface_model_revision: str

huggingface_weight_repo

property huggingface_weight_repo: HuggingFaceRepo

huggingface_weight_repo_id

property huggingface_weight_repo_id: str

huggingface_weight_revision

huggingface_weight_revision: str

kv_cache

kv_cache: KVCacheConfig

model_config

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name

property model_name: str

model_path

model_path: str

model_post_init()

model_post_init(context, /)

This method behaves like a BaseModel hook for initialising private attributes.

It takes context as an argument because that is what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

quantization_encoding

quantization_encoding: SupportedEncoding | None

resolve()

resolve()

Validates and resolves the config.

This method is called after the model config is initialized to ensure that all config fields are in a valid state. It also sets and updates fields that may not be determined or initialized by the default factory.

In order:

  1. Validate that the device_specs provided are available
  2. Parse the weight path(s) and initialize the _weights_repo_id

Return type:

None
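The two resolution steps above can be sketched as a plain function over a dict-shaped config. The field names mirror the real class, but the availability set and repo-id parsing are illustrative assumptions:

```python
def resolve(config):
    """Sketch of resolve(): validate devices, then resolve weight paths."""
    # 1. Validate that the requested device specs are available.
    available = {"cpu", "gpu"}  # illustrative; real checks probe hardware
    for spec in config["device_specs"]:
        if spec not in available:
            raise ValueError(f"device {spec!r} is not available")
    # 2. Parse the weight path(s) and initialize the weights repo id.
    #    Assume a path like "org/repo/model.safetensors" implies repo "org/repo".
    if config["weight_path"]:
        parts = config["weight_path"][0].split("/")
        if len(parts) >= 3:
            config["_weights_repo_id"] = "/".join(parts[:2])
    return config


cfg = resolve({"device_specs": ["cpu"], "weight_path": ["org/repo/model.safetensors"]})
print(cfg["_weights_repo_id"])  # org/repo
```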

rope_type

rope_type: RopeType | None

sampling_params_defaults

property sampling_params_defaults: SamplingParamsGenerationConfigDefaults

served_model_name

served_model_name: str | None

trust_remote_code

trust_remote_code: bool

use_subgraphs

use_subgraphs: bool

validate_and_resolve_quantization_encoding_weight_path()

validate_and_resolve_quantization_encoding_weight_path(default_encoding)

Verifies that the quantization encoding and the configured weight path are consistent.

Parameters:

  • default_encoding (SupportedEncoding) – The default encoding to use if no encoding is provided.

Return type:

None
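The consistency check can be sketched as below. The suffix-to-encoding map is a hypothetical simplification; real validation inspects weight formats in more detail:

```python
def validate_quantization_encoding(encoding, weight_paths, default_encoding):
    """Sketch: check that the encoding is consistent with the weight paths.

    Falls back to default_encoding when no encoding was provided.
    """
    # Illustrative assumption: a file suffix implies one encoding.
    suffix_encodings = {".gguf": "q4_k", ".safetensors": "bfloat16"}
    if encoding is None:
        # No explicit encoding: use the provided default.
        return default_encoding
    for path in weight_paths:
        for suffix, implied in suffix_encodings.items():
            if path.endswith(suffix) and implied != encoding:
                raise ValueError(
                    f"weight file {path!r} implies encoding {implied!r}, "
                    f"which conflicts with {encoding!r}"
                )
    return encoding
```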

validate_and_resolve_rope_type()

validate_and_resolve_rope_type(arch_rope_type)

Parameters:

arch_rope_type (RopeType)

Return type:

None

validate_and_resolve_with_resolved_quantization_encoding()

validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)

Validates that the model path and weight path provided are consistent with a resolved quantization encoding. Also resolves the KV cache strategy and finalizes the encoding config.

Parameters:

  • supported_encodings (dict[SupportedEncoding, list[KVCacheStrategy]]) – A dictionary of supported encodings and their corresponding KV cache strategies.
  • default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.

Return type:

None

validate_lora_compatibility()

validate_lora_compatibility()

Validates that LoRA configuration is compatible with model settings.

Raises:

ValueError – If LoRA is enabled but incompatible with current model configuration.

Return type:

None

validate_multi_gpu_supported()

validate_multi_gpu_supported(multi_gpu_supported)

Validates that the model architecture supports multi-GPU inference.

Parameters:

multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.

Return type:

None

vision_config_overrides

vision_config_overrides: dict[str, Any]

weight_path

weight_path: list[Path]

weights_size()

weights_size()

Calculates the total size in bytes of all weight files specified in weight_path.

This method attempts to find the weights locally first to avoid network calls, checking in the following order:

  1. If repo_type is RepoType.local, it checks if the path in weight_path exists directly as a local file path.
  2. Otherwise, if repo_type is RepoType.online, it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().

Returns:

The total size of all weight files in bytes.

Raises:

  • FileNotFoundError – If repo_type is RepoType.local and a file specified in weight_path is not found within the local repo directory.
  • ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (e.g., file metadata not available or API error).
  • RuntimeError – If the determined repo_type is unexpected.

Return type:

int
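The local-first lookup order can be sketched as follows. The repo_is_local flag and remote_size_of callback stand in for the RepoType check and HuggingFaceRepo.size_of(); both are illustrative:

```python
import os
import tempfile


def weights_size(weight_paths, repo_is_local=True, remote_size_of=None):
    """Sketch of the local-first weight size lookup, in bytes."""
    total = 0
    for path in weight_paths:
        if os.path.exists(path):
            # Local file (or cached copy): read the size without network calls.
            total += os.path.getsize(path)
        elif repo_is_local:
            raise FileNotFoundError(f"{path} not found in local repo")
        else:
            # Online repo, not cached: fall back to querying the Hub API.
            total += remote_size_of(path)
    return total


# Usage with a temporary file standing in for a local weight shard:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 1024)
    shard = f.name
print(weights_size([shard]))  # 1024
```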

MAXModelConfigBase

class max.pipelines.lib.model_config.MAXModelConfigBase(*, config_file=None, section_name=None)

Bases: ConfigFileModel

Abstract base class for all (required) MAX model configs.

This base class configures the model that a pipeline uses; it is also handy for sidestepping the need to pass in optional fields when subclassing MAXModelConfig.

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

  • config_file (str | None)
  • section_name (str | None)

model_config

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
