Python module

model_config

MAX model config classes.

MAXModelConfig

class max.pipelines.lib.model_config.MAXModelConfig(model_path='', model='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, allow_safetensors_weights_fp32_bf6_bidirectional_cast=False, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, use_subgraphs=True, data_parallel_degree=1, _applied_dtype_cast_from=None, _applied_dtype_cast_to=None, _huggingface_config=None, _weights_repo_id=None, _quant_config=None, _kv_cache_config=<factory>, _config_file_section_name='model_config')

Bases: MAXModelConfigBase

Abstract base class for all MAX model configs.

This class is used to configure a model to use for a pipeline.

Parameters:

  • model_path (str)
  • model (str)
  • served_model_name (str | None)
  • weight_path (list[Path])
  • quantization_encoding (SupportedEncoding | None)
  • allow_safetensors_weights_fp32_bf6_bidirectional_cast (bool)
  • huggingface_model_revision (str)
  • huggingface_weight_revision (str)
  • trust_remote_code (bool)
  • device_specs (list[DeviceSpec])
  • force_download (bool)
  • vision_config_overrides (dict[str, Any])
  • rope_type (RopeType | None)
  • use_subgraphs (bool)
  • data_parallel_degree (int)
  • _applied_dtype_cast_from (SupportedEncoding | None)
  • _applied_dtype_cast_to (SupportedEncoding | None)
  • _huggingface_config (AutoConfig | None)
  • _weights_repo_id (str | None)
  • _quant_config (QuantizationConfig | None)
  • _kv_cache_config (KVCacheConfig)
  • _config_file_section_name (str)

allow_safetensors_weights_fp32_bf6_bidirectional_cast

allow_safetensors_weights_fp32_bf6_bidirectional_cast: bool = False

Whether to allow automatic float32 to/from bfloat16 safetensors weight type casting, if needed. Currently only supported in Llama3 models.

data_parallel_degree

data_parallel_degree: int = 1

Data parallelism degree. How widely the model is replicated depends on the model type.

default_device_spec

property default_device_spec: DeviceSpec

Returns the default device spec for the model. This is the first device spec in the list and is mostly used for device spec checks throughout config validation.

Returns:

The default device spec for the model.

device_specs

device_specs: list[DeviceSpec]

Devices to run inference upon. This option is not documented in help() as it shouldn’t be used directly via the CLI entrypoint.

force_download

force_download: bool = False

Whether to force download of a given file even if it's already present in the local cache.

generation_config

property generation_config: GenerationConfig

Retrieve the HuggingFace GenerationConfig for this model.

This property lazily loads the GenerationConfig from the model repository and caches it to avoid repeated remote fetches.

Returns:

The GenerationConfig for the model, containing generation parameters like max_length, temperature, top_p, etc. If loading fails, returns a default GenerationConfig.
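
The lazy-load-and-cache behavior described here can be sketched in plain Python (a simplified stand-in, not the actual implementation; the loader callable is hypothetical):

```python
from functools import cached_property

class _ConfigHolder:
    """Illustrative stand-in for a lazily loaded, cached generation config."""

    def __init__(self, loader):
        self._loader = loader  # e.g. a remote fetch of generation_config.json

    @cached_property
    def generation_config(self):
        try:
            return self._loader()          # fetched once, then cached
        except Exception:
            return {"max_length": 20}      # fall back to defaults if loading fails

calls = []
def loader():
    calls.append(1)
    return {"max_length": 4096, "temperature": 0.7}

holder = _ConfigHolder(loader)
cfg1 = holder.generation_config
cfg2 = holder.generation_config  # cached: loader is not called again
```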

graph_quantization_encoding

property graph_quantization_encoding: QuantizationEncoding | None

Converts the CLI encoding to a MAX Graph quantization encoding.

Returns:

The graph quantization encoding corresponding to the CLI encoding.

Raises:

ValueError – If no CLI encoding was specified.

help()

static help()

Documentation for this config class. Returns a dictionary of config options and their descriptions.

Return type:

dict[str, str]

huggingface_config

property huggingface_config: AutoConfig

huggingface_model_repo

property huggingface_model_repo: HuggingFaceRepo

huggingface_model_revision

huggingface_model_revision: str = 'main'

Branch or Git revision of the Hugging Face model repository to use.

huggingface_weight_repo

property huggingface_weight_repo: HuggingFaceRepo

huggingface_weight_repo_id

property huggingface_weight_repo_id: str

huggingface_weight_revision

huggingface_weight_revision: str = 'main'

Branch or Git revision of the Hugging Face weight repository to use.

kv_cache_config

property kv_cache_config: KVCacheConfig

model

model: str = ''

repo_id of a Hugging Face model repository to use. The only entrypoint for this attribute is the --model MAX CLI flag. After this MAXModelConfig is initialized, everything under the hood should be handled via model_path, for now. See post_init for more details on how this is done.

model_name

property model_name: str

model_path

model_path: str = ''

repo_id of a Hugging Face model repository to use. This is functionally equivalent to the --model flag.

quantization_encoding

quantization_encoding: SupportedEncoding | None = None

Weight encoding type.

resolve()

resolve()

Validates and resolves the config.

This method is called after the model config is initialized to ensure that all config fields are in a valid state. It also sets and updates fields that may not be determined or initialized by the default factory.

In order:

  1. Validate that the device_specs provided are available
  2. Parse the weight path(s) and initialize the _weights_repo_id

Return type:

None
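
The ordering above can be sketched in plain Python (a simplified stand-in; the device-spec check and repo-id parsing shown here are hypothetical, not the real logic):

```python
def resolve(device_specs, weight_paths, available_devices):
    """Illustrative stand-in for the two resolve() steps described above."""
    # 1. Validate that the requested device specs are available.
    for spec in device_specs:
        if spec not in available_devices:
            raise ValueError(f"device spec {spec!r} is not available")
    # 2. Parse the weight path(s) to derive a weights repo id
    #    (here: the "org/repo" prefix of a hub-style path).
    weights_repo_id = None
    for path in weight_paths:
        parts = path.split("/")
        if len(parts) > 2:  # looks like org/repo/filename
            weights_repo_id = "/".join(parts[:2])
            break
    return weights_repo_id

repo = resolve(["gpu:0"], ["org/model/weights.safetensors"], {"gpu:0", "cpu"})
```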

rope_type

rope_type: RopeType | None = None

Force using a specific rope type: none | normal | neox. Only matters for GGUF weights.

sampling_params_defaults

property sampling_params_defaults: SamplingParamsGenerationConfigDefaults

served_model_name

served_model_name: str | None = None

Optional override for client-facing model name. Defaults to model_path.

trust_remote_code

trust_remote_code: bool = False

Whether to allow custom modeling files from Hugging Face.

use_subgraphs

use_subgraphs: bool = True

Whether to use subgraphs for the model. This can significantly reduce compile time, especially for a large model with several identical blocks. Default is true.

validate_and_resolve_quantization_encoding_weight_path()

validate_and_resolve_quantization_encoding_weight_path(default_encoding)

Verifies that the quantization encoding and weight path provided are consistent.

Parameters:

  • weight_path – The path to the weight file.
  • default_encoding (SupportedEncoding) – The default encoding to use if no encoding is provided.

Return type:

None

validate_and_resolve_rope_type()

validate_and_resolve_rope_type(arch_rope_type)

Parameters:

arch_rope_type (RopeType)

Return type:

None

validate_and_resolve_with_resolved_quantization_encoding()

validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)

Validates that the model path and weight path provided are consistent with a resolved quantization encoding. Also resolves the KV cache strategy and finalizes the encoding config.

Parameters:

  • supported_encodings (dict[SupportedEncoding, list[KVCacheStrategy]]) – A dictionary of supported encodings and their corresponding KV cache strategies.
  • default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.

Return type:

None

validate_lora_compatibility()

validate_lora_compatibility()

Validates that LoRA configuration is compatible with model settings.

Raises:

ValueError – If LoRA is enabled but incompatible with current model configuration.

Return type:

None

validate_multi_gpu_supported()

validate_multi_gpu_supported(multi_gpu_supported)

Validates that the model architecture supports multi-GPU inference.

Parameters:

multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.

Return type:

None

vision_config_overrides

vision_config_overrides: dict[str, Any]

Model-specific vision configuration overrides. For example, for InternVL: {"max_dynamic_patch": 24}

weight_path

weight_path: list[Path]

Optional path or URL of the model weights to use.

weights_size()

weights_size()

Calculates the total size in bytes of all weight files specified in weight_path.

This method attempts to find the weights locally first to avoid network calls, checking in the following order:

  1. If repo_type is RepoType.local, it checks if the path in weight_path exists directly as a local file path.
  2. Otherwise, if repo_type is RepoType.online, it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().

Returns:

The total size of all weight files in bytes.

Raises:

  • FileNotFoundError – If repo_type is RepoType.local and a file specified in weight_path is not found within the local repo directory.
  • ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (e.g., file metadata not available or API error).
  • RuntimeError – If the determined repo_type is unexpected.

Return type:

int
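
The lookup order described above can be sketched as follows (a simplified, pure-Python stand-in; the real implementation uses huggingface_hub.try_to_load_from_cache() and HuggingFaceRepo.size_of(), which are stubbed here as injected callables):

```python
import os

def weights_size(repo_type, weight_paths, cache_lookup, hub_size_of):
    """Illustrative stand-in for the documented weights_size() lookup order."""
    total = 0
    for path in weight_paths:
        if repo_type == "local":
            # 1. Local repos: the path must exist directly on disk.
            if not os.path.exists(path):
                raise FileNotFoundError(path)
            total += os.path.getsize(path)
        elif repo_type == "online":
            # 2. Online repos: try the local cache first, then the Hub API.
            cached = cache_lookup(path)
            if cached is not None:
                total += os.path.getsize(cached)
            else:
                total += hub_size_of(path)  # network fall-back
        else:
            raise RuntimeError(f"unexpected repo type: {repo_type!r}")
    return total

# Example: nothing cached, so both files are sized via the (stubbed) Hub API.
total = weights_size(
    "online",
    ["a.safetensors", "b.safetensors"],
    cache_lookup=lambda p: None,
    hub_size_of=lambda p: 100,
)
```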

MAXModelConfigBase

class max.pipelines.lib.model_config.MAXModelConfigBase

Bases: MAXConfig

Abstract base class for all (required) MAX model configs.

This base class is used to configure a model to use for a pipeline, but also handy to sidestep the need to pass in optional fields when subclassing MAXModelConfig.

help()

static help()

Documentation for this config class. Returns a dictionary of config options and their descriptions.

Return type:

dict[str, str]
