MAXModelConfig
class max.pipelines.MAXModelConfig(*, config_file=None, section_name=None, use_subgraphs=True, data_parallel_degree=1, pool_embeddings=True, max_length=None, model_path='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, subfolder=None, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, enable_echo=False, chat_template=None, kv_cache=<factory>)
Bases: MAXModelConfigBase
Configuration for a pipeline model.
Initialize the config, allowing tests and internal callers to seed private attributes.
Pydantic private attributes (PrivateAttr) are not regular model fields, so they are not accepted as constructor kwargs by default. Some tests and debugging utilities intentionally seed _huggingface_config to avoid network access and to validate config override plumbing, so this __init__ is defined explicitly to seed the private attributes.
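The pattern described above can be sketched in plain Python. The class below is a simplified stand-in, not the real MAXModelConfig; it only illustrates why the constructor must pop and seed the private attribute explicitly:

```python
# Simplified stand-in: "private" attributes are not ordinary fields,
# so the constructor pops and seeds them before handling regular kwargs.
class ConfigSketch:
    def __init__(self, **kwargs):
        # Seed the private attribute if a caller (e.g. a test) provides it,
        # so no network fetch is needed to populate it later.
        self._huggingface_config = kwargs.pop("_huggingface_config", None)
        # Remaining kwargs are treated as regular fields.
        for name, value in kwargs.items():
            setattr(self, name, value)

cfg = ConfigSketch(
    model_path="some/repo",
    _huggingface_config={"architectures": ["LlamaForCausalLM"]},
)
```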
Parameters:
- config_file (str | None)
- section_name (str | None)
- use_subgraphs (bool)
- data_parallel_degree (int)
- pool_embeddings (bool)
- max_length (int | None)
- model_path (str)
- served_model_name (str | None)
- weight_path (list[Path])
- quantization_encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'] | None)
- huggingface_model_revision (str)
- huggingface_weight_revision (str)
- trust_remote_code (bool)
- subfolder (str | None)
- device_specs (list[DeviceSpec])
- force_download (bool)
- vision_config_overrides (dict[str, Any])
- rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'] | None)
- enable_echo (bool)
- chat_template (Path | None)
- kv_cache (KVCacheConfig)
architecture_name
Returns the architecture class name from the HuggingFace config.
For transformers models, returns architectures[0] from the
HuggingFace config.
chat_template
chat_template: Path | None
An optional custom chat template to override the one shipped with the model.
create_kv_cache_config()
create_kv_cache_config(**kv_cache_kwargs)
Creates and sets the KV cache configuration with the given parameters.
Creates a new KVCacheConfig from the provided keyword arguments and automatically sets the cache_dtype based on the model's quantization encoding (or any explicit override in kv_cache_kwargs).
Parameters:

- **kv_cache_kwargs – Keyword arguments to pass to the KVCacheConfig constructor. Common options include:
  - kv_cache_page_size: Number of tokens per page for paged cache
  - enable_prefix_caching: Whether to enable prefix caching
  - device_memory_utilization: Fraction of device memory to use
  - cache_dtype: Override for the cache data type

Return type: None
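The create-and-set behavior can be sketched as follows. The dataclass fields mirror the common options listed above, and the encoding-to-dtype rule is illustrative; the real KVCacheConfig and dtype derivation may differ:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for KVCacheConfig, with fields named after the
# common options documented above.
@dataclass
class KVCacheConfigSketch:
    kv_cache_page_size: int = 128
    enable_prefix_caching: bool = False
    device_memory_utilization: float = 0.9
    cache_dtype: Optional[str] = None

def create_kv_cache_config(quantization_encoding, **kv_cache_kwargs):
    # Build the config from the caller's kwargs.
    config = KVCacheConfigSketch(**kv_cache_kwargs)
    # Derive cache_dtype from the model's encoding unless explicitly overridden.
    if config.cache_dtype is None:
        config.cache_dtype = (
            "bfloat16" if quantization_encoding == "bfloat16" else "float32"
        )
    return config

cfg = create_kv_cache_config("bfloat16", kv_cache_page_size=256)
```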
data_parallel_degree
data_parallel_degree: int
The degree of data parallelism for replicating the model.
default_device_spec
property default_device_spec: DeviceSpec
Returns the default device spec for the model.
This is the first device spec in the list, used for device spec checks throughout config validation.
Returns: The default device spec for the model.
device_specs
device_specs: list[DeviceSpec]
The devices to run inference on.
enable_echo
enable_echo: bool
Whether the model should be built with echo capabilities.
force_download
force_download: bool
Whether to force download a file even if it's already in the local cache.
generation_config
property generation_config: GenerationConfig
Retrieves the Hugging Face GenerationConfig for this model.
Lazily loads the GenerationConfig from the model repository
and caches it to avoid repeated remote fetches.
Returns: The GenerationConfig for the model, containing generation parameters including max_length, temperature, and top_p. If loading fails, returns a default GenerationConfig.
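The lazy-load-and-cache behavior described above can be sketched with a stand-in loader. The class and loader here are illustrative, not the real implementation, which fetches a GenerationConfig from the model repository:

```python
from functools import cached_property

class ModelConfigSketch:
    def __init__(self, loader):
        self._loader = loader  # stand-in for the remote fetch
        self.load_count = 0

    @cached_property
    def generation_config(self):
        # Loaded once on first access; subsequent accesses hit the cache.
        self.load_count += 1
        try:
            return self._loader()
        except Exception:
            # On failure, fall back to defaults rather than raising.
            return {"max_length": None, "temperature": 1.0, "top_p": 1.0}

cfg = ModelConfigSketch(
    loader=lambda: {"max_length": 4096, "temperature": 0.7, "top_p": 0.9}
)
first = cfg.generation_config
second = cfg.generation_config  # served from cache; loader not called again
```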
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX Graph quantization encoding.
Returns: The graph quantization encoding corresponding to the CLI encoding.

Raises: ValueError – If no CLI encoding was specified.
huggingface_config
property huggingface_config: PreTrainedConfig
Returns the Hugging Face model config (loaded on first access).
For transformers models, returns the AutoConfig subclass. For
non-transformers models (e.g. diffusers components), falls back to
loading the raw config.json and wrapping it in a
PretrainedConfig.
Raises: FileNotFoundError – If no config.json can be found for the model repo/subfolder.
huggingface_model_repo
property huggingface_model_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for the model.
The result is cached in a PrivateAttr to avoid recreating
HuggingFaceRepo instances on every access. The cache is
invalidated when the underlying config fields change.
huggingface_model_revision
huggingface_model_revision: str
The branch or Git revision of the Hugging Face model repository.
huggingface_weight_repo
property huggingface_weight_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for weight files.
The result is cached in a PrivateAttr to avoid recreating
HuggingFaceRepo instances (and triggering redundant HF API
calls for file listing, encoding detection, etc.) on every
access. The cache is invalidated when the underlying config
fields change (e.g. after model_copy()).
huggingface_weight_repo_id
property huggingface_weight_repo_id: str
Returns the Hugging Face repo ID used for weight files.
huggingface_weight_revision
huggingface_weight_revision: str
The branch or Git revision of the Hugging Face weights repository.
kv_cache
kv_cache: KVCacheConfig
The KV cache configuration.
log_model_info()
log_model_info(role)
Logs model configuration information for this config.
Parameters:

- role (str) – The semantic role of this model (e.g. "main", "draft", "vae").

Return type: None
max_length
max_length: int | None
The maximum sequence length the model can process.
model_config
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}
Configuration for the model, should be a dictionary conforming to ConfigDict.
model_name
property model_name: str
Returns the served model name or model path.
model_path
model_path: str
The repository ID of a Hugging Face model to use.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes. It takes context as an argument since that's what pydantic-core passes when calling it.

Parameters:

- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.

Return type: None
pool_embeddings
pool_embeddings: bool
Whether to pool embedding outputs.
quantization_encoding
quantization_encoding: SupportedEncoding | None
The weight encoding type.
resolve()
resolve()
Validates and resolves the config.
Called after initialization to ensure all fields are in a valid state and to set fields that can't be determined in the default factory.
Resolves fields in this order:

- Resolves chat_template if it's a path.
- Validates that the provided device_specs are available.
- Parses the weight path and initializes _weights_repo_id.

Return type: None
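The three resolution steps can be sketched in order as follows. The dict-based config, the device-availability check, and the repo-ID parsing rule are all illustrative stand-ins for the real DeviceSpec validation and Hugging Face repo handling:

```python
from pathlib import Path

def resolve(config, available_devices):
    # 1. Resolve chat_template if it's a path: read the file contents.
    if isinstance(config.get("chat_template"), Path):
        config["chat_template"] = config["chat_template"].read_text()
    # 2. Validate that the provided device specs are available.
    for spec in config["device_specs"]:
        if spec not in available_devices:
            raise ValueError(f"device {spec!r} is not available")
    # 3. Parse the weight path and initialize the weights repo ID,
    #    e.g. "org/repo/model.safetensors" -> "org/repo".
    if config.get("weight_path"):
        config["_weights_repo_id"] = "/".join(
            config["weight_path"][0].split("/")[:2]
        )
    return config
```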
resolved_weight_paths()
resolved_weight_paths()
Resolve weight paths to absolute local paths, downloading if needed.
For online repos, downloads weight files from HuggingFace Hub. For local repos, constructs absolute paths from the repo root.
retrieve_chat_template()
retrieve_chat_template()
Returns the chat template string, or None if not set.
Return type: str | None
rope_type
rope_type: Literal['none', 'normal', 'neox', 'longrope', 'yarn'] | None
The RoPE type to use, forced regardless of model defaults.
sampling_params_defaults
property sampling_params_defaults: SamplingParamsGenerationConfigDefaults
Returns sampling defaults derived from the generation config.
served_model_name
served_model_name: str | None
An optional override for the client-facing model name.
set_cache_dtype_given_quantization_encoding()
set_cache_dtype_given_quantization_encoding()
Determines the KV cache dtype based on quantization encoding configuration.
The dtype is determined in the following priority order:

- Explicit override from kv_cache.kv_cache_format (if set).
- Derived from the model's quantization_encoding.
- Falls back to float32 if no encoding is specified.

Return type: None
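The three-level priority above can be sketched as a small resolution function. The encoding-to-dtype mapping is illustrative, not the library's actual table:

```python
# Illustrative mapping from quantization encoding to KV cache dtype.
ENCODING_TO_CACHE_DTYPE = {
    "bfloat16": "bfloat16",
    "float32": "float32",
    "q4_k": "float32",
}

def resolve_cache_dtype(kv_cache_format=None, quantization_encoding=None):
    # 1. An explicit override always wins.
    if kv_cache_format is not None:
        return kv_cache_format
    # 2. Otherwise, derive from the model's quantization encoding.
    if quantization_encoding is not None:
        return ENCODING_TO_CACHE_DTYPE.get(quantization_encoding, "float32")
    # 3. Fall back to float32 when no encoding is specified.
    return "float32"
```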
subfolder
subfolder: str | None
Subdirectory within the HuggingFace repo to load config and weights from.
trust_remote_code
trust_remote_code: bool
Whether to allow custom modeling files from Hugging Face.
use_subgraphs
use_subgraphs: bool
Whether to use subgraphs for the model.
validate_and_resolve_quantization_encoding_weight_path()
validate_and_resolve_quantization_encoding_weight_path(default_encoding)
Verifies that the quantization encoding and weight path are consistent.

Parameters:

- default_encoding (max.pipelines.lib.config.SupportedEncoding) – The default encoding to use if no encoding is provided.

Return type: None
validate_and_resolve_rope_type()
validate_and_resolve_rope_type(arch_rope_type)
Resolves rope_type from the architecture default if not set.

Parameters:

- arch_rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'])

Return type: None
validate_and_resolve_with_resolved_quantization_encoding()
validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)
Validates the model path and weight path against the resolved quantization encoding, and finalizes the encoding config.

Parameters:

- supported_encodings (set[max.pipelines.lib.config.SupportedEncoding]) – The set of supported encodings.
- default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.

Return type: None
validate_lora_compatibility()
validate_lora_compatibility()
Validates that the LoRA configuration is compatible with model settings.

Raises: ValueError – If LoRA is enabled but incompatible with the current model configuration.

Return type: None
validate_max_length()
classmethod validate_max_length(v)
Validate that max_length is non-negative if provided.
validate_multi_gpu_supported()
validate_multi_gpu_supported(multi_gpu_supported)
Validates that the model architecture supports multi-GPU inference.

Parameters:

- multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.

Return type: None
vision_config_overrides
vision_config_overrides: dict[str, Any]
Model-specific vision configuration overrides.
weight_path
weight_path: list[Path]
The path or URL of the model weights to use.
weights_size()
weights_size()
Calculates the total size in bytes of all weight files in weight_path.
Attempts to find the weights locally first to avoid network calls, checking in the following order:

- If repo_type is "local", it checks whether the path in weight_path exists directly as a local file path.
- Otherwise, if repo_type is "online", it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().

Returns: The total size of all weight files in bytes.

Raises:

- FileNotFoundError – If repo_type is "local" and a file specified in weight_path is not found within the local repo directory.
- ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (for example, file metadata not available or API error).
- RuntimeError – If the determined repo_type is unexpected.
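The local-first lookup order can be sketched as follows. The cache and Hub lookups are stand-in callables passed as parameters; the real code uses huggingface_hub.try_to_load_from_cache() and HuggingFaceRepo.size_of():

```python
import os

def weights_size(repo_type, weight_paths, cache_lookup, hub_size_of):
    total = 0
    for path in weight_paths:
        if repo_type == "local":
            # Local repo: the path must exist on disk.
            if not os.path.exists(path):
                raise FileNotFoundError(path)
            total += os.path.getsize(path)
        elif repo_type == "online":
            # Check the local HF cache first to avoid a network call.
            cached = cache_lookup(path)
            if cached is not None:
                total += os.path.getsize(cached)
            else:
                # Fall back to the Hub API for the file size.
                size = hub_size_of(path)
                if size is None:
                    raise ValueError(f"could not determine size of {path}")
                total += size
        else:
            raise RuntimeError(f"unexpected repo_type: {repo_type}")
    return total
```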