Python class
MAXModelConfig
class max.pipelines.MAXModelConfig(*, config_file=None, section_name=None, use_subgraphs=True, data_parallel_degree=1, pool_embeddings=True, max_length=None, model_path='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, subfolder=None, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, enable_echo=False, chat_template=None, kv_cache=<factory>)
Bases: MAXModelConfigBase
Configuration for a pipeline model.
Initialize config, allowing tests/internal callers to seed private attributes.
Pydantic private attributes (PrivateAttr) are not regular model fields,
so they are not accepted as constructor kwargs by default. Some tests (and debugging
utilities) intentionally seed _huggingface_config to avoid network
access and to validate config override plumbing. Hence, we need to
explicitly define this __init__ method to seed the private attributes.
Parameters:
- config_file (str | None)
- section_name (str | None)
- use_subgraphs (bool)
- data_parallel_degree (int)
- pool_embeddings (bool)
- max_length (int | None)
- model_path (str)
- served_model_name (str | None)
- weight_path (list[Path])
- quantization_encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'] | None)
- huggingface_model_revision (str)
- huggingface_weight_revision (str)
- trust_remote_code (bool)
- subfolder (str | None)
- device_specs (list[DeviceSpec])
- force_download (bool)
- vision_config_overrides (dict[str, Any])
- rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'] | None)
- enable_echo (bool)
- chat_template (Path | None)
- kv_cache (KVCacheConfig)
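The private-attribute seeding described in the class docstring can be illustrated with plain Pydantic. `DemoConfig` below is a hypothetical stand-in for the real class, kept minimal to show only the `__init__` pattern:

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, PrivateAttr


class DemoConfig(BaseModel):
    # Hypothetical stand-in for MAXModelConfig, not the real class.
    model_config = ConfigDict(protected_namespaces=())  # allow a "model_" field name

    model_path: str = ""
    # Private attributes are not model fields, so Pydantic does not accept
    # them as constructor kwargs by default.
    _huggingface_config: Optional[dict] = PrivateAttr(default=None)

    def __init__(self, **kwargs: Any) -> None:
        # Pop the private attribute before field validation, then seed it
        # explicitly, mirroring the pattern described in the docstring.
        hf_config = kwargs.pop("_huggingface_config", None)
        super().__init__(**kwargs)
        if hf_config is not None:
            self._huggingface_config = hf_config


# Seeding the config up front avoids a network fetch in tests:
cfg = DemoConfig(
    model_path="my-org/my-model",
    _huggingface_config={"architectures": ["LlamaForCausalLM"]},
)
```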
architecture_name
Returns the architecture class name from the HuggingFace config.
For transformers models, returns architectures[0] from the
HuggingFace config.
chat_template
An optional custom chat template to override the one shipped with the model.
create_kv_cache_config()
create_kv_cache_config(**kv_cache_kwargs)
Creates and sets the KV cache configuration with the given parameters.
Creates a new KVCacheConfig from the provided keyword arguments
and automatically sets the cache_dtype based on the model’s quantization
encoding (or any explicit override in kv_cache_kwargs).
Parameters:
**kv_cache_kwargs – Keyword arguments to pass to the KVCacheConfig constructor. Common options include:
- kv_cache_page_size: Number of tokens per page for paged cache
- enable_prefix_caching: Whether to enable prefix caching
- device_memory_utilization: Fraction of device memory to use
- cache_dtype: Override for the cache data type
Return type:
None
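As a rough sketch of this behaviour (not the real implementation), with `KVCacheConfigSketch` standing in for `KVCacheConfig` and illustrative default values:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVCacheConfigSketch:
    # Hypothetical stand-in for KVCacheConfig; the defaults here are
    # illustrative assumptions, not the library's real defaults.
    kv_cache_page_size: int = 128
    enable_prefix_caching: bool = False
    device_memory_utilization: float = 0.9
    cache_dtype: Optional[str] = None


def create_kv_cache_config_sketch(
    quantization_encoding: Optional[str], **kv_cache_kwargs: object
) -> KVCacheConfigSketch:
    # Forward the keyword arguments to the config constructor...
    cfg = KVCacheConfigSketch(**kv_cache_kwargs)
    # ...then fill in cache_dtype unless an explicit override was given.
    # (The real encoding-to-dtype mapping is more involved; this simply
    # passes the encoding through for illustration.)
    if cfg.cache_dtype is None:
        cfg.cache_dtype = quantization_encoding or "float32"
    return cfg
```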
data_parallel_degree
data_parallel_degree: int
The degree of data parallelism for replicating the model.
default_device_spec
property default_device_spec: DeviceSpec
Returns the default device spec for the model.
This is the first device spec in the list, used for device spec checks throughout config validation.
Returns:
The default device spec for the model.
device_specs
device_specs: list[DeviceSpec]
The devices to run inference on.
enable_echo
enable_echo: bool
Whether the model should be built with echo capabilities.
force_download
force_download: bool
Whether to force download a file even if it’s already in the local cache.
generation_config
property generation_config: GenerationConfig
Retrieves the Hugging Face GenerationConfig for this model.
Lazily loads the GenerationConfig from the model repository
and caches it to avoid repeated remote fetches.
Returns:
The GenerationConfig for the model, containing generation parameters including max_length, temperature, and top_p. If loading fails, returns a default GenerationConfig.
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX Graph quantization encoding.
Returns:
The graph quantization encoding corresponding to the CLI encoding.
Raises:
ValueError – If no CLI encoding was specified.
huggingface_config
property huggingface_config: PreTrainedConfig
Returns the Hugging Face model config (loaded on first access).
For transformers models, returns the AutoConfig subclass. For
non-transformers models (e.g. diffusers components), falls back to
loading the raw config.json and wrapping it in a
PretrainedConfig.
Raises:
FileNotFoundError – If no config.json can be found for the model repo/subfolder.
huggingface_model_repo
property huggingface_model_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for the model.
The result is cached in a PrivateAttr to avoid recreating
HuggingFaceRepo instances on every access. The cache is
invalidated when the underlying config fields change.
huggingface_model_revision
huggingface_model_revision: str
The branch or Git revision of the Hugging Face model repository.
huggingface_weight_repo
property huggingface_weight_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for weight files.
The result is cached in a PrivateAttr to avoid recreating
HuggingFaceRepo instances (and triggering redundant HF API
calls for file listing, encoding detection, etc.) on every
access. The cache is invalidated when the underlying config
fields change (e.g. after model_copy()).
huggingface_weight_repo_id
property huggingface_weight_repo_id: str
Returns the Hugging Face repo ID used for weight files.
huggingface_weight_revision
huggingface_weight_revision: str
The branch or Git revision of the Hugging Face weights repository.
kv_cache
kv_cache: KVCacheConfig
The KV cache configuration.
log_model_info()
log_model_info(role)
Logs model configuration information for this config.
Parameters:
role (str) – The semantic role of this model (e.g. "main", "draft", "vae").
Return type:
None
max_length
The maximum sequence length the model can process.
model_config
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}
Configuration for the model, which should be a dictionary conforming to Pydantic's ConfigDict.
model_name
property model_name: str
Returns the served model name or model path.
model_path
model_path: str
The repository ID of a Hugging Face model to use.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
Parameters:
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
Return type:
None
pool_embeddings
pool_embeddings: bool
Whether to pool embedding outputs.
quantization_encoding
quantization_encoding: SupportedEncoding | None
The weight encoding type.
resolve()
resolve()
Validates and resolves the config.
Called after initialization to ensure all fields are in a valid state and to set fields that can’t be determined in the default factory.
Resolves fields in this order:
- Resolves chat_template if it's a path.
- Validates that the provided device_specs are available.
- Parses the weight path and initializes _weights_repo_id.
Return type:
None
resolved_weight_paths()
resolved_weight_paths()
Resolve weight paths to absolute local paths, downloading if needed.
For online repos, downloads weight files from HuggingFace Hub. For local repos, constructs absolute paths from the repo root.
retrieve_chat_template()
retrieve_chat_template()
Returns the chat template string, or None if not set.
Return type:
str | None
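A minimal sketch of this behaviour, assuming chat_template holds a file path once resolved (`retrieve_chat_template_sketch` is a hypothetical helper, not the real method):

```python
from pathlib import Path
from typing import Optional, Union


def retrieve_chat_template_sketch(
    chat_template: Optional[Union[str, Path]],
) -> Optional[str]:
    # No custom template configured: the model's own template applies.
    if chat_template is None:
        return None
    # Otherwise read the override template from disk.
    return Path(chat_template).read_text()
```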
rope_type
The RoPE type to use, forced regardless of model defaults.
sampling_params_defaults
property sampling_params_defaults: SamplingParamsGenerationConfigDefaults
Returns sampling defaults derived from the generation config.
served_model_name
An optional override for the client-facing model name.
set_cache_dtype_given_quantization_encoding()
set_cache_dtype_given_quantization_encoding()
Determines the KV cache dtype based on quantization encoding configuration.
The dtype is determined in the following priority order:
- Explicit override from kv_cache.kv_cache_format (if set).
- Derived from the model's quantization_encoding.
- Falls back to float32 if no encoding is specified.
Return type:
None
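The priority order above can be sketched as a small helper. The encoding-to-dtype table here is an illustrative assumption, not the library's real mapping:

```python
from typing import Optional

# Assumption: an illustrative encoding -> dtype table for the sketch only.
ENCODING_TO_CACHE_DTYPE = {
    "bfloat16": "bfloat16",
    "float32": "float32",
    "q4_k": "float32",
}


def resolve_cache_dtype(
    kv_cache_format: Optional[str],
    quantization_encoding: Optional[str],
) -> str:
    # 1. An explicit kv_cache.kv_cache_format override wins.
    if kv_cache_format is not None:
        return kv_cache_format
    # 2. Otherwise derive the dtype from the quantization encoding.
    if quantization_encoding is not None:
        return ENCODING_TO_CACHE_DTYPE.get(quantization_encoding, "float32")
    # 3. Fall back to float32 when no encoding is specified.
    return "float32"
```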
subfolder
Subdirectory within the HuggingFace repo to load config and weights from.
trust_remote_code
trust_remote_code: bool
Whether to allow custom modelling files from Hugging Face.
use_subgraphs
use_subgraphs: bool
Whether to use subgraphs for the model.
validate_and_resolve_quantization_encoding_weight_path()
validate_and_resolve_quantization_encoding_weight_path(default_encoding)
Verifies that the quantization encoding and weight path are consistent.
Parameters:
- default_encoding (max.pipelines.lib.config.SupportedEncoding) – The default encoding to use if no encoding is provided.
Return type:
None
validate_and_resolve_rope_type()
validate_and_resolve_rope_type(arch_rope_type)
Resolves rope_type from architecture default if not set.
Parameters:
arch_rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn']) – The architecture's default RoPE type.
Return type:
None
validate_and_resolve_with_resolved_quantization_encoding()
validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)
Validates model path and weight path against resolved quantization encoding.
Also finalizes the encoding config.
Parameters:
- supported_encodings (set[max.pipelines.lib.config.SupportedEncoding]) – The set of encodings supported by the model architecture.
- default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.
Return type:
None
validate_lora_compatibility()
validate_lora_compatibility()
Validates that LoRA configuration is compatible with model settings.
Raises:
ValueError – If LoRA is enabled but incompatible with current model configuration.
Return type:
None
validate_max_length()
classmethod validate_max_length(v)
Validate that max_length is non-negative if provided.
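A sketch of the validator's contract (the helper name is hypothetical):

```python
from typing import Optional


def validate_max_length_sketch(v: Optional[int]) -> Optional[int]:
    # None means "use the model default"; otherwise the value must be
    # non-negative.
    if v is not None and v < 0:
        raise ValueError("max_length must be non-negative")
    return v
```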
validate_multi_gpu_supported()
validate_multi_gpu_supported(multi_gpu_supported)
Validates that the model architecture supports multi-GPU inference.
Parameters:
multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.
Return type:
None
vision_config_overrides
Model-specific vision configuration overrides.
weight_path
The path or URL of the model weights to use.
weights_size()
weights_size()
Calculates the total size in bytes of all weight files in weight_path.
Attempts to find the weights locally first to avoid network calls, checking in the following order:
- If repo_type is "local", it checks whether each path in weight_path exists directly as a local file path.
- Otherwise, if repo_type is "online", it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().
Returns:
The total size of all weight files in bytes.
Raises:
- FileNotFoundError – If repo_type is "local" and a file specified in weight_path is not found within the local repo directory.
- ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (for example, file metadata not available or API error).
- RuntimeError – If the determined repo_type is unexpected.
Return type:
int
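The lookup order can be sketched as follows; `try_cache` and `hub_size_of` are injected stand-ins for `huggingface_hub.try_to_load_from_cache()` and `HuggingFaceRepo.size_of()`, and the helper itself is hypothetical:

```python
import os
from typing import Callable, Iterable, Optional


def total_weights_size_sketch(
    weight_paths: Iterable[str],
    repo_type: str,
    repo_root: str = "",
    try_cache: Callable[[str], Optional[str]] = lambda rel: None,
    hub_size_of: Optional[Callable[[str], int]] = None,
) -> int:
    total = 0
    for rel in weight_paths:
        if repo_type == "local":
            # Local repos: the path must exist under the repo root.
            full = os.path.join(repo_root, rel)
            if not os.path.isfile(full):
                raise FileNotFoundError(full)
            total += os.path.getsize(full)
        elif repo_type == "online":
            # Online repos: check the local cache first, then fall back
            # to asking the Hub API for the file size.
            cached = try_cache(rel)
            if cached is not None:
                total += os.path.getsize(cached)
            elif hub_size_of is not None:
                total += hub_size_of(rel)
            else:
                raise ValueError(f"could not determine size of {rel}")
        else:
            raise RuntimeError(f"unexpected repo_type: {repo_type!r}")
    return total
```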