Python module

model_config

MAX model config classes.

`MAXModelConfig`

class max.pipelines.lib.model_config.MAXModelConfig(model_path='', model='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, allow_safetensors_weights_fp32_bf6_bidirectional_cast=False, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, use_subgraphs=True, data_parallel_degree=1, _applied_dtype_cast_from=None, _applied_dtype_cast_to=None, _huggingface_config=None, _weights_repo_id=None, _quant_config=None, _kv_cache_config=<factory>, _config_file_section_name='model_config')

Bases: MAXModelConfigBase

Abstract base class for all MAX model configs.

This class is used to configure a model to use for a pipeline.

Parameters:

model_path (str)
model (str)
served_model_name (str | None)
weight_path (list[Path])
quantization_encoding (SupportedEncoding | None)
allow_safetensors_weights_fp32_bf6_bidirectional_cast (bool)
huggingface_model_revision (str)
huggingface_weight_revision (str)
trust_remote_code (bool)
device_specs (list[DeviceSpec])
force_download (bool)
vision_config_overrides (dict[str, Any])
rope_type (RopeType | None)
use_subgraphs (bool)
data_parallel_degree (int)
_applied_dtype_cast_from (SupportedEncoding | None)
_applied_dtype_cast_to (SupportedEncoding | None)
_huggingface_config (AutoConfig | None)
_weights_repo_id (str | None)
_quant_config (QuantizationConfig | None)
_kv_cache_config (KVCacheConfig)
_config_file_section_name (str)

`allow_safetensors_weights_fp32_bf6_bidirectional_cast`

allow_safetensors_weights_fp32_bf6_bidirectional_cast: bool = False

Whether to allow automatic float32 to/from bfloat16 safetensors weight type casting, if needed. Currently only supported in Llama3 models.

`data_parallel_degree`

data_parallel_degree: int = 1

Data-parallelism parameter. The degree to which the model is replicated is dependent on the model type.

`default_device_spec`

property default_device_spec: DeviceSpec

Returns the default device spec for the model. This is the first device spec in the list and is mostly used for device spec checks throughout config validation.

Returns:: The default device spec for the model.

`device_specs`

device_specs: list[DeviceSpec]

Devices to run inference upon. This option is not documented in help() as it shouldn’t be used directly via the CLI entrypoint.

`force_download`

force_download: bool = False

Whether to force download a given file if it’s already present in the local cache.

`generation_config`

property generation_config: GenerationConfig

Retrieve the HuggingFace GenerationConfig for this model.

This property lazily loads the GenerationConfig from the model repository and caches it to avoid repeated remote fetches.

Returns:: The GenerationConfig for the model, containing generation parameters like max_length, temperature, top_p, etc. If loading fails, returns a default GenerationConfig.

`graph_quantization_encoding`

property graph_quantization_encoding: QuantizationEncoding | None

Converts the CLI encoding to a MAX Graph quantization encoding.

Returns:: The graph quantization encoding corresponding to the CLI encoding.
Raises:: ValueError – If no CLI encoding was specified.

`help()`

static help()

Documentation for this config class. Return a dictionary of config options and their descriptions.

Return type:: dict[str, str]

`huggingface_config`

property huggingface_config: AutoConfig

`huggingface_model_repo`

property huggingface_model_repo: HuggingFaceRepo

`huggingface_model_revision`

huggingface_model_revision: str = 'main'

Branch or Git revision of Hugging Face model repository to use.

`huggingface_weight_repo`

property huggingface_weight_repo: HuggingFaceRepo

`huggingface_weight_repo_id`

property huggingface_weight_repo_id: str

`huggingface_weight_revision`

huggingface_weight_revision: str = 'main'

Branch or Git revision of Hugging Face model repository to use.

`kv_cache_config`

property kv_cache_config: KVCacheConfig

`model`

model: str = ''

repo_id of a Hugging Face model repository to use. The only entrypoint for this model attribute is via –model max cli flag. Everything under the hood after this MAXModelConfig is initialized should be handled via model_path, for now. See post_init for more details on how this is done.

`model_name`

property model_name: str

`model_path`

model_path: str = ''

repo_id of a Hugging Face model repository to use. This is functionally equivalent to model flag.

`quantization_encoding`

quantization_encoding: SupportedEncoding | None = None

Weight encoding type.

`resolve()`

resolve()

Validates and resolves the config.

This method is called after the model config is initialized, to ensure that all config fields have been initialized to a valid state. It will also set and update other fields which may not be determined / initialized in the default factory.

In order:

Validate that the device_specs provided are available
Parse the weight path(s) and initialize the _weights_repo_id

Return type:: None

`rope_type`

rope_type: RopeType | None = None

none | normal | neox. Only matters for GGUF weights.

Type:: Force using a specific rope type

`sampling_params_defaults`

property sampling_params_defaults: SamplingParamsGenerationConfigDefaults

`served_model_name`

served_model_name: str | None = None

Optional override for client-facing model name. Defaults to model_path.

`trust_remote_code`

trust_remote_code: bool = False

Whether or not to allow for custom modelling files on Hugging Face.

`use_subgraphs`

use_subgraphs: bool = True

Whether to use subgraphs for the model. This could significantly reduce compile time especially for a large model with several identical blocks. Default is true.

`validate_and_resolve_quantization_encoding_weight_path()`

validate_and_resolve_quantization_encoding_weight_path(default_encoding)

Verifies that the quantization encoding and weight path provided are consistent.

Parameters:

weight_path – The path to the weight file.
default_encoding (SupportedEncoding) – The default encoding to use if no encoding is provided.

Return type:

None

`validate_and_resolve_rope_type()`

validate_and_resolve_rope_type(arch_rope_type)

Parameters:: arch_rope_type (RopeType)
Return type:: None

`validate_and_resolve_with_resolved_quantization_encoding()`

validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)

Validates that the model path, and weight path provided are consistent with a resolved quantization encoding. Also resolves the KV cache strategy and finalizes the encoding config.

Parameters:

supported_encodings (dict[SupportedEncoding, list[KVCacheStrategy]]) – A dictionary of supported encodings and their corresponding KV cache strategies.
default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.

Return type:

None

`validate_lora_compatibility()`

validate_lora_compatibility()

Validates that LoRA configuration is compatible with model settings.

Raises:: ValueError – If LoRA is enabled but incompatible with current model configuration.
Return type:: None

`validate_multi_gpu_supported()`

validate_multi_gpu_supported(multi_gpu_supported)

Validates that the model architecture supports multi-GPU inference.

Parameters:: multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.
Return type:: None

`vision_config_overrides`

vision_config_overrides: dict[str, Any]

24}

Type:: Model-specific vision configuration overrides. For example, for InternVL
Type:: {“max_dynamic_patch”

`weight_path`

weight_path: list[Path]

Optional path or url of the model weights to use.

`weights_size()`

weights_size()

Calculates the total size in bytes of all weight files specified in weight_path.

This method attempts to find the weights locally first to avoid network calls, checking in the following order:

If repo_type is RepoType.local, it checks if the path in weight_path exists directly as a local file path.
Otherwise, if repo_type is RepoType.online, it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().

Returns:

The total size of all weight files in bytes.

Raises:

FileNotFoundError – If repo_type is RepoType.local and a file specified in weight_path is not found within the local repo directory.
ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (e.g., file metadata not available or API error).
RuntimeError – If the determined repo_type is unexpected.

Return type:

int

`MAXModelConfigBase`

class max.pipelines.lib.model_config.MAXModelConfigBase

Bases: MAXConfig

Abstract base class for all (required) MAX model configs.

This base class is used to configure a model to use for a pipeline, but also handy to sidestep the need to pass in optional fields when subclassing MAXModelConfig.

`help()`

static help()

Documentation for this config class. Return a dictionary of config options and their descriptions.

Return type:: dict[str, str]

MAXModelConfig​

allow_safetensors_weights_fp32_bf6_bidirectional_cast​

data_parallel_degree​

default_device_spec​

device_specs​

force_download​

generation_config​

graph_quantization_encoding​

help()​

huggingface_config​

huggingface_model_repo​

huggingface_model_revision​

huggingface_weight_repo​

huggingface_weight_repo_id​

huggingface_weight_revision​

kv_cache_config​

model​

model_name​

model_path​

quantization_encoding​

resolve()​

rope_type​

sampling_params_defaults​

served_model_name​

trust_remote_code​

use_subgraphs​

validate_and_resolve_quantization_encoding_weight_path()​

validate_and_resolve_rope_type()​

validate_and_resolve_with_resolved_quantization_encoding()​

validate_lora_compatibility()​

validate_multi_gpu_supported()​

vision_config_overrides​

weight_path​

weights_size()​

MAXModelConfigBase​

help()​

`MAXModelConfig`

`allow_safetensors_weights_fp32_bf6_bidirectional_cast`

`data_parallel_degree`

`default_device_spec`

`device_specs`

`force_download`

`generation_config`

`graph_quantization_encoding`

`help()`

`huggingface_config`

`huggingface_model_repo`

`huggingface_model_revision`

`huggingface_weight_repo`

`huggingface_weight_repo_id`

`huggingface_weight_revision`

`kv_cache_config`

`model`

`model_name`

`model_path`

`quantization_encoding`

`resolve()`

`rope_type`

`sampling_params_defaults`

`served_model_name`

`trust_remote_code`

`use_subgraphs`

`validate_and_resolve_quantization_encoding_weight_path()`

`validate_and_resolve_rope_type()`

`validate_and_resolve_with_resolved_quantization_encoding()`

`validate_lora_compatibility()`

`validate_multi_gpu_supported()`

`vision_config_overrides`

`weight_path`

`weights_size()`

`MAXModelConfigBase`

`help()`