Python class

MAXModelConfig

class max.pipelines.MAXModelConfig(*, config_file=None, section_name=None, use_subgraphs=True, data_parallel_degree=1, pool_embeddings=True, max_length=None, model_path='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, subfolder=None, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, enable_echo=False, chat_template=None, kv_cache=<factory>)

Bases: MAXModelConfigBase

Configuration for a pipeline model.

Initialize config, allowing tests/internal callers to seed private attributes.

Pydantic private attributes (PrivateAttr) are not regular model fields, so they are not accepted as constructor kwargs by default. Some tests (and debugging utilities) intentionally seed _huggingface_config to avoid network access and to validate config override plumbing, so this __init__ is defined explicitly to allow seeding those private attributes.

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • use_subgraphs (bool)
  • data_parallel_degree (int)
  • pool_embeddings (bool)
  • max_length (int | None)
  • model_path (str)
  • served_model_name (str | None)
  • weight_path (list[Path])
  • quantization_encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'] | None)
  • huggingface_model_revision (str)
  • huggingface_weight_revision (str)
  • trust_remote_code (bool)
  • subfolder (str | None)
  • device_specs (list[DeviceSpec])
  • force_download (bool)
  • vision_config_overrides (dict[str, Any])
  • rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'] | None)
  • enable_echo (bool)
  • chat_template (Path | None)
  • kv_cache (KVCacheConfig)

architecture_name

property architecture_name: str | None

Returns the architecture class name from the HuggingFace config.

For transformers models, returns architectures[0] from the HuggingFace config.

chat_template

chat_template: Path | None

An optional custom chat template to override the one shipped with the model.

create_kv_cache_config()

create_kv_cache_config(**kv_cache_kwargs)

Creates and sets the KV cache configuration with the given parameters.

Creates a new KVCacheConfig from the provided keyword arguments and automatically sets the cache_dtype based on the model’s quantization encoding (or any explicit override in kv_cache_kwargs).

Parameters:

**kv_cache_kwargs – Keyword arguments to pass to the KVCacheConfig constructor. Common options include:

  • kv_cache_page_size: Number of tokens per page for paged cache
  • enable_prefix_caching: Whether to enable prefix caching
  • device_memory_utilization: Fraction of device memory to use
  • cache_dtype: Override for the cache data type

Return type:

None

data_parallel_degree

data_parallel_degree: int

The degree of data parallelism for replicating the model.

default_device_spec

property default_device_spec: DeviceSpec

Returns the default device spec for the model.

This is the first device spec in the list, used for device spec checks throughout config validation.

Returns:

The default device spec for the model.

device_specs

device_specs: list[DeviceSpec]

The devices to run inference on.

enable_echo

enable_echo: bool

Whether the model should be built with echo capabilities.

force_download

force_download: bool

Whether to force download a file even if it’s already in the local cache.

generation_config

property generation_config: GenerationConfig

Retrieves the Hugging Face GenerationConfig for this model.

Lazily loads the GenerationConfig from the model repository and caches it to avoid repeated remote fetches.

Returns:

The GenerationConfig for the model, containing generation parameters including max_length, temperature, and top_p. If loading fails, returns a default GenerationConfig.

graph_quantization_encoding

property graph_quantization_encoding: QuantizationEncoding | None

Converts the CLI encoding to a MAX Graph quantization encoding.

Returns:

The graph quantization encoding corresponding to the CLI encoding.

Raises:

ValueError – If no CLI encoding was specified.

huggingface_config

property huggingface_config: PretrainedConfig

Returns the Hugging Face model config (loaded on first access).

For transformers models, returns the AutoConfig subclass. For non-transformers models (e.g. diffusers components), falls back to loading the raw config.json and wrapping it in a PretrainedConfig.

Raises:

FileNotFoundError – If no config.json can be found for the model repo/subfolder.

huggingface_model_repo

property huggingface_model_repo: HuggingFaceRepo

Returns the Hugging Face repo handle for the model.

The result is cached in a PrivateAttr to avoid recreating HuggingFaceRepo instances on every access. The cache is invalidated when the underlying config fields change.

huggingface_model_revision

huggingface_model_revision: str

The branch or Git revision of the Hugging Face model repository.

huggingface_weight_repo

property huggingface_weight_repo: HuggingFaceRepo

Returns the Hugging Face repo handle for weight files.

The result is cached in a PrivateAttr to avoid recreating HuggingFaceRepo instances (and triggering redundant HF API calls for file listing, encoding detection, etc.) on every access. The cache is invalidated when the underlying config fields change (e.g. after model_copy()).

huggingface_weight_repo_id

property huggingface_weight_repo_id: str

Returns the Hugging Face repo ID used for weight files.

huggingface_weight_revision

huggingface_weight_revision: str

The branch or Git revision of the Hugging Face weights repository.

kv_cache

kv_cache: KVCacheConfig

The KV cache configuration.

log_model_info()

log_model_info(role)

Logs model configuration information for this config.

Parameters:

role (str) – The semantic role of this model (e.g. "main", "draft", "vae").

Return type:

None

max_length

max_length: int | None

The maximum sequence length the model can process.

model_config

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}

Configuration for the model; should be a dictionary conforming to Pydantic's ConfigDict.

model_name

property model_name: str

Returns the served model name or model path.

model_path

model_path: str

The repository ID of a Hugging Face model to use.

model_post_init()

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

pool_embeddings

pool_embeddings: bool

Whether to pool embedding outputs.

quantization_encoding

quantization_encoding: SupportedEncoding | None

The weight encoding type.

resolve()

resolve()

Validates and resolves the config.

Called after initialization to ensure all fields are in a valid state and to set fields that can’t be determined in the default factory.

Resolves fields in this order:

  1. Resolves chat_template if it’s a path.
  2. Validates that the provided device_specs are available.
  3. Parses the weight path and initializes _weights_repo_id.

Return type:

None

resolved_weight_paths()

resolved_weight_paths()

Resolve weight paths to absolute local paths, downloading if needed.

For online repos, downloads weight files from HuggingFace Hub. For local repos, constructs absolute paths from the repo root.

Returns:

Absolute paths to weight files on disk.

Return type:

list[Path]

retrieve_chat_template()

retrieve_chat_template()

Returns the chat template string, or None if not set.

Return type:

str | None

rope_type

rope_type: RopeType | None

The RoPE type to use, forced regardless of model defaults.

sampling_params_defaults

property sampling_params_defaults: SamplingParamsGenerationConfigDefaults

Returns sampling defaults derived from the generation config.

served_model_name

served_model_name: str | None

An optional override for the client-facing model name.

set_cache_dtype_given_quantization_encoding()

set_cache_dtype_given_quantization_encoding()

Determines the KV cache dtype based on quantization encoding configuration.

The dtype is determined in the following priority order:

  1. Explicit override from kv_cache.kv_cache_format (if set).
  2. Derived from the model’s quantization_encoding.
  3. Falls back to float32 if no encoding is specified.

Return type:

None
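The priority order above can be sketched as a standalone function. The names and the encoding-to-dtype table are illustrative stand-ins, not the library's internals:

```python
def resolve_cache_dtype(kv_cache_format, quantization_encoding, encoding_dtypes):
    """Resolve the KV cache dtype in priority order:
    explicit override > derived from encoding > float32 fallback."""
    if kv_cache_format is not None:           # 1. explicit override wins
        return kv_cache_format
    if quantization_encoding is not None:     # 2. derive from the encoding
        return encoding_dtypes[quantization_encoding]
    return "float32"                          # 3. fallback

# Hypothetical encoding-to-dtype mapping for illustration.
dtypes = {"bfloat16": "bfloat16", "q4_k": "float32"}

resolve_cache_dtype(None, "bfloat16", dtypes)       # -> "bfloat16"
resolve_cache_dtype("float32", "bfloat16", dtypes)  # -> "float32"
resolve_cache_dtype(None, None, dtypes)             # -> "float32"
```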

subfolder

subfolder: str | None

Subdirectory within the HuggingFace repo to load config and weights from.

trust_remote_code

trust_remote_code: bool

Whether to allow custom modeling files from Hugging Face.

use_subgraphs

use_subgraphs: bool

Whether to use subgraphs for the model.

validate_and_resolve_quantization_encoding_weight_path()

validate_and_resolve_quantization_encoding_weight_path(default_encoding)

Verifies that the quantization encoding and weight path are consistent.

Parameters:

default_encoding (max.pipelines.lib.config.SupportedEncoding) – The default encoding to use if no encoding is provided.

Return type:

None

validate_and_resolve_rope_type()

validate_and_resolve_rope_type(arch_rope_type)

Resolves rope_type from architecture default if not set.

Parameters:

arch_rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'])

Return type:

None

validate_and_resolve_with_resolved_quantization_encoding()

validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)

Validates model path and weight path against resolved quantization encoding.

Also finalizes the encoding config.

Parameters:

  • supported_encodings (set[max.pipelines.lib.config.SupportedEncoding]) – The set of encodings supported by the model architecture.
  • default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.

Return type:

None

validate_lora_compatibility()

validate_lora_compatibility()

Validates that LoRA configuration is compatible with model settings.

Raises:

ValueError – If LoRA is enabled but incompatible with current model configuration.

Return type:

None

validate_max_length()

classmethod validate_max_length(v)

Validate that max_length is non-negative if provided.

Parameters:

v (int | None)

Return type:

int | None

validate_multi_gpu_supported()

validate_multi_gpu_supported(multi_gpu_supported)

Validates that the model architecture supports multi-GPU inference.

Parameters:

multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.

Return type:

None

vision_config_overrides

vision_config_overrides: dict[str, Any]

Model-specific vision configuration overrides.

weight_path

weight_path: list[Path]

The path or URL of the model weights to use.

weights_size()

weights_size()

Calculates the total size in bytes of all weight files in weight_path.

Attempts to find the weights locally first to avoid network calls, checking in the following order:

  1. If repo_type is "local", it checks if the path in weight_path exists directly as a local file path.
  2. Otherwise, if repo_type is "online", it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().

Returns:

The total size of all weight files in bytes.

Raises:

  • FileNotFoundError – If repo_type is "local" and a file specified in weight_path is not found within the local repo directory.
  • ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (for example, file metadata not available or API error).
  • RuntimeError – If the determined repo_type is unexpected.

Return type:

int
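The local branch of this lookup can be sketched as follows (a simplified stand-in; the Hugging Face cache and Hub API fallbacks for online repos are omitted):

```python
import tempfile
from pathlib import Path

def weights_size_local(repo_root: Path, weight_paths: list[Path]) -> int:
    """Sum the sizes of weight files under a local repo root, raising
    FileNotFoundError for any file that is missing on disk."""
    total = 0
    for rel in weight_paths:
        f = repo_root / rel
        if not f.is_file():
            raise FileNotFoundError(f"weight file not found: {f}")
        total += f.stat().st_size
    return total

# Example with a throwaway repo directory containing one 1 KiB file.
root = Path(tempfile.mkdtemp())
(root / "model.safetensors").write_bytes(b"\0" * 1024)
size = weights_size_local(root, [Path("model.safetensors")])  # 1024
```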