Python module
model_config
MAX model config classes.
MAXModelConfig
class max.pipelines.lib.model_config.MAXModelConfig(model_path='', model='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, allow_safetensors_weights_fp32_bf6_bidirectional_cast=False, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, use_subgraphs=True, data_parallel_degree=1, _applied_dtype_cast_from=None, _applied_dtype_cast_to=None, _huggingface_config=None, _weights_repo_id=None, _quant_config=None, _kv_cache_config=<factory>, _config_file_section_name='model_config')
Bases: MAXModelConfigBase
Abstract base class for all MAX model configs.
This class is used to configure a model to use for a pipeline.
Parameters:
- model_path (str)
- model (str)
- served_model_name (str | None)
- weight_path (list[Path])
- quantization_encoding (SupportedEncoding | None)
- allow_safetensors_weights_fp32_bf6_bidirectional_cast (bool)
- huggingface_model_revision (str)
- huggingface_weight_revision (str)
- trust_remote_code (bool)
- device_specs (list[DeviceSpec])
- force_download (bool)
- vision_config_overrides (dict[str, Any])
- rope_type (RopeType | None)
- use_subgraphs (bool)
- data_parallel_degree (int)
- _applied_dtype_cast_from (SupportedEncoding | None)
- _applied_dtype_cast_to (SupportedEncoding | None)
- _huggingface_config (AutoConfig | None)
- _weights_repo_id (str | None)
- _quant_config (QuantizationConfig | None)
- _kv_cache_config (KVCacheConfig)
- _config_file_section_name (str)
allow_safetensors_weights_fp32_bf6_bidirectional_cast
allow_safetensors_weights_fp32_bf6_bidirectional_cast: bool = False
Whether to allow automatic casting of safetensors weights between float32 and bfloat16, if needed. Currently supported only in Llama3 models.
data_parallel_degree
data_parallel_degree: int = 1
Data-parallelism parameter. The degree to which the model is replicated is dependent on the model type.
default_device_spec
property default_device_spec: DeviceSpec
Returns the default device spec for the model. This is the first device spec in the list and is mostly used for device spec checks throughout config validation.
Returns:
The default device spec for the model.
device_specs
device_specs: list[DeviceSpec]
Devices to run inference upon. This option is not documented in help() as it shouldn’t be used directly via the CLI entrypoint.
force_download
force_download: bool = False
Whether to force download a given file if it’s already present in the local cache.
generation_config
property generation_config: GenerationConfig
Retrieve the HuggingFace GenerationConfig for this model.
This property lazily loads the GenerationConfig from the model repository and caches it to avoid repeated remote fetches.
Returns:
The GenerationConfig for the model, containing generation parameters like max_length, temperature, top_p, etc. If loading fails, returns a default GenerationConfig.
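The lazy-load-and-cache behavior described above can be sketched with `functools.cached_property`. The `RepoBackedConfig` class and its return values below are hypothetical stand-ins, not MAX's implementation:

```python
from functools import cached_property

class RepoBackedConfig:
    """Hypothetical sketch of a config that lazily loads and caches
    generation parameters, as the property above describes."""

    def __init__(self, repo_id: str):
        self.repo_id = repo_id
        self.fetch_count = 0  # counts simulated remote fetches

    @cached_property
    def generation_config(self) -> dict:
        # In the real class this would fetch GenerationConfig from the
        # model repository; here we simulate the remote load.
        self.fetch_count += 1
        return {"max_length": 2048, "temperature": 1.0}

c = RepoBackedConfig("some/repo")
c.generation_config  # first access triggers the (simulated) fetch
c.generation_config  # second access hits the cache
print(c.fetch_count)  # the remote fetch happens only once
```

`cached_property` stores the result in the instance's `__dict__` on first access, so repeated reads avoid the remote round trip.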
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX Graph quantization encoding.
Returns:
The graph quantization encoding corresponding to the CLI encoding.
Raises:
ValueError – If no CLI encoding was specified.
help()
static help()
Documentation for this config class. Returns a dictionary of config options and their descriptions.
huggingface_config
property huggingface_config: AutoConfig
huggingface_model_repo
property huggingface_model_repo: HuggingFaceRepo
huggingface_model_revision
huggingface_model_revision: str = 'main'
Branch or Git revision of Hugging Face model repository to use.
huggingface_weight_repo
property huggingface_weight_repo: HuggingFaceRepo
huggingface_weight_repo_id
property huggingface_weight_repo_id: str
huggingface_weight_revision
huggingface_weight_revision: str = 'main'
Branch or Git revision of the Hugging Face weight repository to use.
kv_cache_config
property kv_cache_config: KVCacheConfig
model
model: str = ''
repo_id of a Hugging Face model repository to use.
The only entrypoint for this attribute is the --model flag of the max CLI. After this MAXModelConfig is initialized, everything under the hood should be handled via model_path, for now.
See post_init for more details on how this is done.
model_name
property model_name: str
model_path
model_path: str = ''
repo_id of a Hugging Face model repository to use. This is functionally equivalent to the model flag.
quantization_encoding
quantization_encoding: SupportedEncoding | None = None
Weight encoding type.
resolve()
resolve()
Validates and resolves the config.
This method is called after the model config is initialized, to ensure that all config fields have been initialized to a valid state. It will also set and update other fields which may not be determined / initialized in the default factory.
In order:
- Validate that the device_specs provided are available
- Parse the weight path(s) and initialize the _weights_repo_id
Return type:
None
rope_type
rope_type: RopeType | None = None
Force using a specific rope type: none | normal | neox. Only matters for GGUF weights.
sampling_params_defaults
property sampling_params_defaults: SamplingParamsGenerationConfigDefaults
served_model_name
served_model_name: str | None = None
Optional override for the client-facing model name. Defaults to model_path.
trust_remote_code
trust_remote_code: bool = False
Whether to allow custom modeling files on Hugging Face.
use_subgraphs
use_subgraphs: bool = True
Whether to use subgraphs for the model. This could significantly reduce compile time especially for a large model with several identical blocks. Default is true.
validate_and_resolve_quantization_encoding_weight_path()
validate_and_resolve_quantization_encoding_weight_path(default_encoding)
Verifies that the quantization encoding and weight path provided are consistent.
Parameters:
- weight_path – The path to the weight file.
- default_encoding (SupportedEncoding) – The default encoding to use if no encoding is provided.
Return type:
None
validate_and_resolve_rope_type()
validate_and_resolve_rope_type(arch_rope_type)
Parameters:
arch_rope_type (RopeType)
Return type:
None
validate_and_resolve_with_resolved_quantization_encoding()
validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)
Validates that the model path and weight path provided are consistent with a resolved quantization encoding. Also resolves the KV cache strategy and finalizes the encoding config.
Parameters:
- supported_encodings (dict[SupportedEncoding, list[KVCacheStrategy]]) – A dictionary of supported encodings and their corresponding KV cache strategies.
- default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.
Return type:
None
validate_lora_compatibility()
validate_lora_compatibility()
Validates that LoRA configuration is compatible with model settings.
Raises:
ValueError – If LoRA is enabled but incompatible with current model configuration.
Return type:
None
validate_multi_gpu_supported()
validate_multi_gpu_supported(multi_gpu_supported)
Validates that the model architecture supports multi-GPU inference.
Parameters:
multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.
Return type:
None
vision_config_overrides
vision_config_overrides: dict[str, Any]
Model-specific vision configuration overrides. For example, for InternVL: {"max_dynamic_patch": 24}.
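One plausible way such overrides are applied is a shallow merge of the user-supplied dict onto the model's default vision config. The `defaults` dict and the merge itself are hypothetical illustrations; only the InternVL override value comes from the docs above:

```python
# Hypothetical default vision config for a model (invented values).
defaults = {"max_dynamic_patch": 12, "image_size": 448}

# The InternVL example from the docs above.
vision_config_overrides = {"max_dynamic_patch": 24}

# Shallow merge: keys in the overrides dict win over the defaults.
effective = {**defaults, **vision_config_overrides}
print(effective["max_dynamic_patch"])  # overrides win: 24
```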
weight_path
weight_path: list[Path]
Optional path or URL of the model weights to use.
weights_size()
weights_size()
Calculates the total size in bytes of all weight files specified in weight_path.
This method attempts to find the weights locally first to avoid network calls, checking in the following order:
- If repo_type is RepoType.local, it checks whether the path in weight_path exists directly as a local file path.
- Otherwise, if repo_type is RepoType.online, it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().

Returns:
The total size of all weight files in bytes.

Raises:
- FileNotFoundError – If repo_type is RepoType.local and a file specified in weight_path is not found within the local repo directory.
- ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (e.g., file metadata not available or API error).
- RuntimeError – If the determined repo_type is unexpected.

Return type:
int
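The local-file branch of this lookup order can be sketched with the standard library alone. The `local_weights_size` helper below is a hypothetical stand-in for the RepoType.local path of the real method, summing on-disk file sizes and raising FileNotFoundError for missing files:

```python
import os
import tempfile

def local_weights_size(weight_paths: list[str]) -> int:
    """Sum the byte sizes of weight files on disk (hypothetical sketch)."""
    total = 0
    for path in weight_paths:
        if not os.path.isfile(path):
            raise FileNotFoundError(f"weight file not found: {path}")
        total += os.path.getsize(path)
    return total

# Demonstrate with a throwaway 1 KiB "weight" file.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "model-00001-of-00001.safetensors")
    with open(p, "wb") as f:
        f.write(b"\0" * 1024)
    print(local_weights_size([p]))  # 1024
```

Checking local paths first avoids a network round trip; only when files are not found locally would a real implementation fall back to the Hub API.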
MAXModelConfigBase
class max.pipelines.lib.model_config.MAXModelConfigBase
Bases: MAXConfig
Abstract base class for all (required) MAX model configs.
This base class is used to configure a model to use for a pipeline, but also handy to sidestep the need to pass in optional fields when subclassing MAXModelConfig.
help()
static help()
Documentation for this config class. Returns a dictionary of config options and their descriptions.