Python module
config
Configuration classes for MAX pipelines.
AudioGenerationConfig
class max.pipelines.lib.config.AudioGenerationConfig(audio_decoder, audio_decoder_weights='', chunk_size=None, buffer=0, block_causal=False, prepend_prompt_speech_tokens='never', prepend_prompt_speech_tokens_causal=False, run_model_test_mode=False, prometheus_metrics_mode='instrument_only', *, config_file=None, section_name=None, pipeline_role='prefill_and_decode', max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, debug_verify_replay=False, enable_overlap_scheduler=False, prefer_module_v3=False, model=<factory>, draft_model=None, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, runtime=<factory>, audio_decoder_config=<factory>)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
-
Parameters:
-
- audio_decoder (str)
- audio_decoder_weights (str)
- chunk_size (list[int] | None)
- buffer (int)
- block_causal (bool)
- prepend_prompt_speech_tokens (Literal['never', 'once', 'rolling'])
- prepend_prompt_speech_tokens_causal (bool)
- run_model_test_mode (bool)
- prometheus_metrics_mode (Literal['instrument_only', 'launch_server', 'launch_multiproc_server'])
- config_file (str | None)
- section_name (str | None)
- pipeline_role (Literal['prefill_and_decode', 'prefill_only', 'decode_only'])
- max_batch_size (int | None)
- max_queue_size_tg (int | None)
- min_batch_size_tg (int | None)
- ep_size (int)
- ce_delay_ms (float)
- enable_prioritize_first_decode (bool)
- enable_chunked_prefill (bool)
- enable_in_flight_batching (bool)
- max_num_steps (int)
- max_batch_input_tokens (int)
- zmq_endpoint_base (str)
- execute_empty_batches (bool)
- max_batch_total_tokens (int | None)
- debug_verify_replay (bool)
- enable_overlap_scheduler (bool)
- prefer_module_v3 (bool)
- model (MAXModelConfig)
- draft_model (MAXModelConfig | None)
- sampling (SamplingConfig)
- profiling (ProfilingConfig)
- lora (LoRAConfig | None)
- speculative (SpeculativeConfig | None)
- runtime (PipelineRuntimeConfig)
- audio_decoder_config (dict[str, Any])
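A minimal construction sketch, shown for orientation: the decoder name is hypothetical, audio_decoder is the only field without a default, and every keyword below maps to a parameter documented above.

```python
from max.pipelines.lib.config import AudioGenerationConfig

# Hypothetical decoder name, purely for illustration.
audio_config = AudioGenerationConfig(
    audio_decoder="example-audio-decoder",
    buffer=0,
    prepend_prompt_speech_tokens="never",
    max_batch_size=8,  # keyword-only pipeline field from the signature above
)
```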
audio_decoder
audio_decoder: str
audio_decoder_config
audio_decoder_config: dict[str, Any]
audio_decoder_weights
audio_decoder_weights: str
block_causal
block_causal: bool
buffer
buffer: int
chunk_size
chunk_size: list[int] | None
from_flags()
classmethod from_flags(audio_flags, **config_flags)
Builds an AudioGenerationConfig from audio CLI flags and config kwargs.
-
Parameters:
-
Return type:
-
AudioGenerationConfig
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
-
Parameters:
-
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
-
Return type:
-
None
prepend_prompt_speech_tokens
prepend_prompt_speech_tokens: PrependPromptSpeechTokens
prepend_prompt_speech_tokens_causal
prepend_prompt_speech_tokens_causal: bool
prometheus_metrics_mode
prometheus_metrics_mode: PrometheusMetricsMode
KVCacheConfig
class max.pipelines.lib.config.KVCacheConfig(*, config_file=None, section_name=None, cache_strategy='model_default', kv_cache_page_size=128, enable_prefix_caching=True, enable_kvcache_swapping_to_host=False, device_memory_utilization=0.9, host_kvcache_swap_space_gb=50.0, kv_cache_format=None, disk_offload_dir=None, disk_offload_max_gb=50.0, disk_offload_direct_io=False, lmcache_config_file=None)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
-
Parameters:
-
- config_file (str | None)
- section_name (str | None)
- cache_strategy (Literal['model_default', 'paged'])
- kv_cache_page_size (int)
- enable_prefix_caching (bool)
- enable_kvcache_swapping_to_host (bool)
- device_memory_utilization (float)
- host_kvcache_swap_space_gb (float)
- kv_cache_format (str | None)
- disk_offload_dir (str | None)
- disk_offload_max_gb (float)
- disk_offload_direct_io (bool)
- lmcache_config_file (str | None)
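A minimal sketch of constructing the cache config directly; all fields are keyword-only per the signature above, and the values are illustrative rather than recommended settings.

```python
from max.pipelines.lib.config import KVCacheConfig

kv_cache = KVCacheConfig(
    cache_strategy="paged",
    kv_cache_page_size=128,
    enable_prefix_caching=True,
    enable_kvcache_swapping_to_host=True,
    host_kvcache_swap_space_gb=20.0,   # illustrative swap budget
    device_memory_utilization=0.85,
)
```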
cache_dtype
property cache_dtype: DType
Returns the data type used for KV cache storage.
cache_strategy
cache_strategy: Literal['model_default', 'paged']
device_memory_utilization
device_memory_utilization: float
disk_offload_dir
disk_offload_dir: str | None
disk_offload_direct_io
disk_offload_direct_io: bool
disk_offload_max_gb
disk_offload_max_gb: float
enable_kvcache_swapping_to_host
enable_kvcache_swapping_to_host: bool
enable_prefix_caching
enable_prefix_caching: bool
host_kvcache_swap_space_gb
host_kvcache_swap_space_gb: float
kv_cache_format
kv_cache_format: str | None
kv_cache_page_size
kv_cache_page_size: int
lmcache_config_file
lmcache_config_file: str | None
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
-
Parameters:
-
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
-
Return type:
-
None
to_params()
to_params(dtype, n_kv_heads, head_dim, num_layers, devices, data_parallel_degree=1, is_mla=False, kvcache_quant_config=None)
Return KVCacheParams built from this config.
-
Parameters:
-
- dtype (DType) – Data type for KV cache storage.
- n_kv_heads (int) – Total number of KV heads across all devices.
- head_dim (int) – Dimension of each attention head.
- num_layers (int) – Number of model layers.
- devices (Sequence[DeviceRef]) – Devices that host the KV cache.
- data_parallel_degree (int) – Degree of data parallelism.
- is_mla (bool) – Whether the model uses Multi-Latent Attention.
- kvcache_quant_config (KVCacheQuantizationConfig | None) – KV cache quantization configuration.
-
Returns:
-
The constructed KV cache parameters.
-
Return type:
-
KVCacheParams
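A hedged sketch of calling to_params(); the DType and DeviceRef import paths and the DeviceRef.GPU constructor are assumptions about the wider MAX API rather than something stated on this page.

```python
from max.dtype import DType            # assumed import path
from max.graph import DeviceRef        # assumed import path
from max.pipelines.lib.config import KVCacheConfig

kv_cache = KVCacheConfig(cache_strategy="paged")

# Hypothetical model geometry: 32 layers, 8 KV heads of dimension 128.
params = kv_cache.to_params(
    dtype=DType.bfloat16,
    n_kv_heads=8,
    head_dim=128,
    num_layers=32,
    devices=[DeviceRef.GPU(0)],        # assumed DeviceRef constructor
    data_parallel_degree=1,
)
```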
LoRAConfig
class max.pipelines.lib.config.LoRAConfig(*, config_file=None, section_name=None, enable_lora=False, lora_paths=<factory>, max_lora_rank=16, max_num_loras=1)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
-
Parameters:
enable_lora
enable_lora: bool
lora_paths
max_lora_rank
max_lora_rank: int
max_num_loras
max_num_loras: int
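A minimal sketch, assuming lora_paths takes local adapter paths (the element type is not shown on this page); the path below is hypothetical.

```python
from max.pipelines.lib.config import LoRAConfig

lora = LoRAConfig(
    enable_lora=True,
    lora_paths=["/path/to/lora-adapter"],  # hypothetical local adapter path
    max_lora_rank=16,
    max_num_loras=2,
)
```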
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
-
Parameters:
-
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
-
Return type:
-
None
MAXModelConfig
class max.pipelines.lib.config.MAXModelConfig(*, config_file=None, section_name=None, use_subgraphs=True, data_parallel_degree=1, pool_embeddings=True, max_length=None, model_path='', served_model_name=None, weight_path=<factory>, quantization_encoding=None, allow_safetensors_weights_fp32_bf6_bidirectional_cast=False, huggingface_model_revision='main', huggingface_weight_revision='main', trust_remote_code=False, device_specs=<factory>, force_download=False, vision_config_overrides=<factory>, rope_type=None, enable_echo=False, chat_template=None, kv_cache=<factory>)
Initialize config, allowing tests/internal callers to seed PrivateAttrs.
Pydantic PrivateAttrs are not regular model fields, so they are not accepted as constructor kwargs by default. Some tests (and debugging utilities) intentionally seed _huggingface_config to avoid network access and to validate config override plumbing. Hence, we need to explicitly define this __init__ method to seed the PrivateAttr(s).
-
Parameters:
-
- config_file (str | None)
- section_name (str | None)
- use_subgraphs (bool)
- data_parallel_degree (int)
- pool_embeddings (bool)
- max_length (int | None)
- model_path (str)
- served_model_name (str | None)
- weight_path (list[Path])
- quantization_encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'] | None)
- allow_safetensors_weights_fp32_bf6_bidirectional_cast (bool)
- huggingface_model_revision (str)
- huggingface_weight_revision (str)
- trust_remote_code (bool)
- device_specs (list[DeviceSpec])
- force_download (bool)
- vision_config_overrides (dict[str, Any])
- rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'] | None)
- enable_echo (bool)
- chat_template (Path | None)
- kv_cache (KVCacheConfig)
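A minimal construction sketch using a hypothetical Hugging Face repo ID; every keyword maps to a parameter documented above, and omitted fields keep their defaults.

```python
from max.pipelines.lib.config import MAXModelConfig

model_config = MAXModelConfig(
    model_path="org-name/example-model",   # hypothetical repo ID
    quantization_encoding="bfloat16",
    max_length=4096,
    trust_remote_code=False,
)
```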
allow_safetensors_weights_fp32_bf6_bidirectional_cast
allow_safetensors_weights_fp32_bf6_bidirectional_cast: bool
chat_template
chat_template: Path | None
create_kv_cache_config()
create_kv_cache_config(**kv_cache_kwargs)
Create and set the KV cache configuration with the given parameters.
This method creates a new KVCacheConfig from the provided keyword arguments and automatically sets the cache_dtype based on the model’s quantization encoding (or any explicit override in kv_cache_kwargs).
-
Parameters:
-
**kv_cache_kwargs – Keyword arguments to pass to KVCacheConfig constructor. Common options include:
- cache_strategy: The KV cache strategy (continuous, paged, etc.)
- kv_cache_page_size: Number of tokens per page for paged cache
- enable_prefix_caching: Whether to enable prefix caching
- device_memory_utilization: Fraction of device memory to use
- cache_dtype: Override for the cache data type
-
Return type:
-
None
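A short usage sketch (with a hypothetical repo ID): the keyword arguments are forwarded to the KVCacheConfig constructor, and cache_dtype is then derived from the model's quantization encoding unless explicitly overridden.

```python
from max.pipelines.lib.config import MAXModelConfig

model_config = MAXModelConfig(model_path="org-name/example-model")  # hypothetical repo ID
model_config.create_kv_cache_config(
    cache_strategy="paged",
    kv_cache_page_size=128,
    enable_prefix_caching=True,
    device_memory_utilization=0.9,
)
```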
data_parallel_degree
data_parallel_degree: int
default_device_spec
property default_device_spec: DeviceSpec
Returns the default device spec for the model.
This is the first device spec in the list, used for device spec checks throughout config validation.
-
Returns:
-
The default device spec for the model.
device_specs
device_specs: list[DeviceSpec]
diffusers_config
Retrieve the diffusers config for diffusion pipelines.
Note: For multiprocessing, __getstate__ clears _diffusers_config before pickling. Each worker process will reload the config fresh.
-
Returns:
-
The diffusers config dict if this is a diffusion pipeline, None otherwise. The dict will have a structure with “_class_name” and “components” keys, where each component includes “class_name” and “config_dict” fields.
enable_echo
enable_echo: bool
force_download
force_download: bool
generation_config
property generation_config: GenerationConfig
Retrieve the Hugging Face GenerationConfig for this model.
This property lazily loads the GenerationConfig from the model repository and caches it to avoid repeated remote fetches.
-
Returns:
-
The GenerationConfig for the model, containing generation parameters like max_length, temperature, top_p, etc. If loading fails, returns a default GenerationConfig.
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX Graph quantization encoding.
-
Returns:
-
The graph quantization encoding corresponding to the CLI encoding.
-
Raises:
-
ValueError – If no CLI encoding was specified.
huggingface_config
property huggingface_config: AutoConfig | None
Returns the Hugging Face model config (loaded on first access).
huggingface_model_repo
property huggingface_model_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for the model.
huggingface_model_revision
huggingface_model_revision: str
huggingface_weight_repo
property huggingface_weight_repo: HuggingFaceRepo
Returns the Hugging Face repo handle for weight files.
huggingface_weight_repo_id
property huggingface_weight_repo_id: str
Returns the Hugging Face repo ID used for weight files.
huggingface_weight_revision
huggingface_weight_revision: str
kv_cache
kv_cache: KVCacheConfig
max_length
max_length: int | None
model_config
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_name
property model_name: str
Returns the served model name or model path.
model_path
model_path: str
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
-
Parameters:
-
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
-
Return type:
-
None
pool_embeddings
pool_embeddings: bool
quantization_encoding
quantization_encoding: SupportedEncoding | None
resolve()
resolve()
Validates and resolves the config.
This method is called after the model config is initialized, to ensure that all config fields have been initialized to a valid state. It will also set and update other fields which may not be determined / initialized in the default factory.
In order:
- Resolve chat_template if it’s a Path
- Validate that the device_specs provided are available
- Parse the weight path(s) and initialize the _weights_repo_id
-
Return type:
-
None
retrieve_chat_template()
retrieve_chat_template()
Returns the chat template string, or None if not set.
-
Return type:
-
str | None
rope_type
rope_type: RopeType | None
sampling_params_defaults
property sampling_params_defaults: SamplingParamsGenerationConfigDefaults
Returns sampling defaults derived from the generation config.
served_model_name
served_model_name: str | None
set_cache_dtype_given_quantization_encoding()
set_cache_dtype_given_quantization_encoding()
Determine the KV cache dtype based on quantization encoding configuration.
The dtype is determined in the following priority order:
- Explicit override from kv_cache.kv_cache_format (if set)
- Derived from the model’s quantization_encoding
- Falls back to float32 if no encoding is specified
-
Returns:
-
The DType to use for the KV cache. Typical values are:
- DType.float32 for float32, q4_k, q4_0, q6_k encodings
- DType.bfloat16 for bfloat16, float8_e4m3fn, float4_e2m1fnx2, gptq encodings
-
Return type:
-
DType
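An illustrative sketch of the documented priority order, not the actual implementation; it reuses this module's supported_encoding_dtype() helper and assumes the DType import path and that an explicit kv_cache_format names a DType member.

```python
from max.dtype import DType  # assumed import path
from max.pipelines.lib.config import supported_encoding_dtype

def resolve_cache_dtype(kv_cache_format, quantization_encoding):
    """Illustrative only: mirrors the documented priority order."""
    if kv_cache_format is not None:        # 1. explicit kv_cache.kv_cache_format override
        return DType[kv_cache_format]      # assumes the override names a DType member
    if quantization_encoding is not None:  # 2. derived from the model's quantization_encoding
        return supported_encoding_dtype(quantization_encoding)
    return DType.float32                   # 3. fallback when no encoding is specified
```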
trust_remote_code
trust_remote_code: bool
use_subgraphs
use_subgraphs: bool
validate_and_resolve_quantization_encoding_weight_path()
validate_and_resolve_quantization_encoding_weight_path(default_encoding)
Verifies that the quantization encoding and weight path are consistent.
-
Parameters:
-
- weight_path – The path to the weight file.
- default_encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq']) – The default encoding to use if no encoding is provided.
-
Return type:
-
None
validate_and_resolve_rope_type()
validate_and_resolve_rope_type(arch_rope_type)
Resolves rope_type from architecture default if not set.
-
Parameters:
-
arch_rope_type (Literal['none', 'normal', 'neox', 'longrope', 'yarn'])
-
Return type:
-
None
validate_and_resolve_with_resolved_quantization_encoding()
validate_and_resolve_with_resolved_quantization_encoding(supported_encodings, default_weights_format)
Validates model path and weight path against resolved quantization encoding.
Also resolves the KV cache strategy and finalizes the encoding config.
-
Parameters:
-
- supported_encodings (dict[Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'], list[Literal['model_default', 'paged']]]) – A dictionary of supported encodings and their corresponding KV cache strategies.
- default_weights_format (WeightsFormat) – The default weights format to use if no weights format is provided.
-
Return type:
-
None
validate_lora_compatibility()
validate_lora_compatibility()
Validates that LoRA configuration is compatible with model settings.
-
Raises:
-
ValueError – If LoRA is enabled but incompatible with current model configuration.
-
Return type:
-
None
validate_max_length()
classmethod validate_max_length(v)
Validate that max_length is non-negative if provided.
validate_multi_gpu_supported()
validate_multi_gpu_supported(multi_gpu_supported)
Validates that the model architecture supports multi-GPU inference.
-
Parameters:
-
multi_gpu_supported (bool) – Whether the model architecture supports multi-GPU inference.
-
Return type:
-
None
vision_config_overrides
vision_config_overrides: dict[str, Any]
weight_path
weight_path: list[Path]
weights_size()
weights_size()
Calculates the total size in bytes of all weight files in weight_path.
Attempts to find the weights locally first to avoid network calls, checking in the following order:
- If repo_type is "local", it checks whether the path in weight_path exists directly as a local file path.
- Otherwise, if repo_type is "online", it first checks the local Hugging Face cache using huggingface_hub.try_to_load_from_cache(). If not found in the cache, it falls back to querying the Hugging Face Hub API via HuggingFaceRepo.size_of().
-
Returns:
-
The total size of all weight files in bytes.
-
Raises:
-
- FileNotFoundError – If repo_type is "local" and a file specified in weight_path is not found within the local repo directory.
- ValueError – If HuggingFaceRepo.size_of() fails to retrieve the file size from the Hugging Face Hub API (e.g., file metadata not available or API error).
- RuntimeError – If the determined repo_type is unexpected.
-
Return type:
-
int
MAXModelConfigBase
class max.pipelines.lib.config.MAXModelConfigBase(*, config_file=None, section_name=None)
Abstract base class for all (required) MAX model configs.
This base class is used to configure a model to use for a pipeline, but also handy to sidestep the need to pass in optional fields when subclassing MAXModelConfig.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
model_config
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
PipelineConfig
class max.pipelines.lib.config.PipelineConfig(*, config_file=None, section_name=None, pipeline_role='prefill_and_decode', max_batch_size=None, max_queue_size_tg=None, min_batch_size_tg=None, ep_size=1, ce_delay_ms=0.0, enable_prioritize_first_decode=False, enable_chunked_prefill=True, enable_in_flight_batching=False, max_num_steps=-1, max_batch_input_tokens=8192, zmq_endpoint_base=<factory>, execute_empty_batches=False, max_batch_total_tokens=None, debug_verify_replay=False, enable_overlap_scheduler=False, prefer_module_v3=False, model=<factory>, draft_model=None, sampling=<factory>, profiling=<factory>, lora=None, speculative=None, runtime=<factory>)
Configuration for a pipeline.
WIP - Once a PipelineConfig is fully initialized, it should be as immutable as possible (frozen=True). All underlying dataclass fields should have been initialized to their default values, whether specified by the user via a CLI flag, config file, or environment variable, or set internally to a reasonable default.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
-
Parameters:
-
- config_file (str | None)
- section_name (str | None)
- pipeline_role (Literal['prefill_and_decode', 'prefill_only', 'decode_only'])
- max_batch_size (int | None)
- max_queue_size_tg (int | None)
- min_batch_size_tg (int | None)
- ep_size (int)
- ce_delay_ms (float)
- enable_prioritize_first_decode (bool)
- enable_chunked_prefill (bool)
- enable_in_flight_batching (bool)
- max_num_steps (int)
- max_batch_input_tokens (int)
- zmq_endpoint_base (str)
- execute_empty_batches (bool)
- max_batch_total_tokens (int | None)
- debug_verify_replay (bool)
- enable_overlap_scheduler (bool)
- prefer_module_v3 (bool)
- model (MAXModelConfig)
- draft_model (MAXModelConfig | None)
- sampling (SamplingConfig)
- profiling (ProfilingConfig)
- lora (LoRAConfig | None)
- speculative (SpeculativeConfig | None)
- runtime (PipelineRuntimeConfig)
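A minimal construction sketch with a hypothetical repo ID; the nested model field is a MAXModelConfig, and the remaining keywords map to the parameters documented above.

```python
from max.pipelines.lib.config import MAXModelConfig, PipelineConfig

pipeline_config = PipelineConfig(
    model=MAXModelConfig(model_path="org-name/example-model"),  # hypothetical repo ID
    max_batch_size=32,
    max_batch_input_tokens=8192,
    enable_chunked_prefill=True,
)
```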
ce_delay_ms
ce_delay_ms: float
configure_session()
configure_session(session)
Configure an InferenceSession with standard pipeline settings.
-
Parameters:
-
session (InferenceSession)
-
Return type:
-
None
debug_verify_replay
debug_verify_replay: bool
draft_model
draft_model: MAXModelConfig | None
enable_chunked_prefill
enable_chunked_prefill: bool
enable_in_flight_batching
enable_in_flight_batching: bool
enable_overlap_scheduler
enable_overlap_scheduler: bool
enable_prioritize_first_decode
enable_prioritize_first_decode: bool
ep_size
ep_size: int
execute_empty_batches
execute_empty_batches: bool
graph_quantization_encoding
property graph_quantization_encoding: QuantizationEncoding | None
Converts the CLI encoding to a MAX graph quantization encoding.
-
Returns:
-
The graph quantization encoding corresponding to the CLI encoding.
log_basic_config()
log_basic_config()
Log minimal pipeline configuration information.
Logs basic PipelineConfig options including model name, pipeline task, weight path, max_batch_size, max_seq_len, and reserved memory.
-
Return type:
-
None
log_pipeline_info()
log_pipeline_info()
Logs comprehensive pipeline and KVCache configuration information.
Retrieves all necessary information from self and the PIPELINE_REGISTRY. Raises an error if the architecture is not found (which should not happen after config resolution).
-
Return type:
-
None
lora
lora: LoRAConfig | None
max_batch_input_tokens
max_batch_input_tokens: int
max_batch_size
max_batch_size: int | None
max_batch_total_tokens
max_batch_total_tokens: int | None
max_num_steps
max_num_steps: int
max_queue_size_tg
max_queue_size_tg: int | None
min_batch_size_tg
min_batch_size_tg: int | None
model
model: MAXModelConfig
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
-
Parameters:
-
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
-
Return type:
-
None
pipeline_role
pipeline_role: PipelineRole
prefer_module_v3
prefer_module_v3: bool
profiling
profiling: ProfilingConfig
resolve()
resolve()
Validates and resolves the config.
Called after the config is initialized to ensure all config fields are in a valid state.
-
Return type:
-
None
runtime
runtime: PipelineRuntimeConfig
sampling
sampling: SamplingConfig
speculative
speculative: SpeculativeConfig | None
zmq_endpoint_base
zmq_endpoint_base: str
ProfilingConfig
class max.pipelines.lib.config.ProfilingConfig(*, config_file=None, section_name=None, gpu_profiling='off')
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
-
Parameters:
gpu_profiling
gpu_profiling: GPUProfilingMode
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
-
Parameters:
-
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
-
Return type:
-
None
SpeculativeConfig
class max.pipelines.lib.config.SpeculativeConfig(*, config_file=None, section_name=None, speculative_method=None, num_speculative_tokens=5)
Configuration for speculative decoding.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
-
Parameters:
is_eagle()
is_eagle()
Returns whether the speculative method is EAGLE (shared embedding/lm_head).
-
Return type:
-
bool
is_mtp()
is_mtp()
Returns whether the speculative method is MTP.
-
Return type:
-
bool
is_standalone()
is_standalone()
Returns whether the speculative method is a standalone model.
-
Return type:
-
bool
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
-
Parameters:
-
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
-
Return type:
-
None
num_speculative_tokens
num_speculative_tokens: int
speculative_method
speculative_method: SpeculativeMethod | None
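A small sketch of the predicate methods; speculative_method is left at its default because the valid SpeculativeMethod values are not listed on this page.

```python
from max.pipelines.lib.config import SpeculativeConfig

spec = SpeculativeConfig(num_speculative_tokens=4)

# Dispatch on the configured speculative style.
if spec.is_eagle():
    print("EAGLE-style speculation (shared embedding/lm_head)")
elif spec.is_mtp():
    print("MTP speculation")
elif spec.is_standalone():
    print("standalone draft model")
else:
    print("speculative decoding not configured")
```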
is_float4_encoding()
max.pipelines.lib.config.is_float4_encoding(encoding)
Returns whether the given encoding is a float4 type.
parse_supported_encoding_from_file_name()
max.pipelines.lib.config.parse_supported_encoding_from_file_name(name)
Infers a SupportedEncoding from a file name string.
supported_encoding_dtype()
max.pipelines.lib.config.supported_encoding_dtype(encoding)
Returns the underlying model dtype for the given encoding.
supported_encoding_quantization()
max.pipelines.lib.config.supported_encoding_quantization(encoding)
Returns the QuantizationEncoding for the given encoding.
-
Parameters:
-
encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'])
-
Return type:
-
QuantizationEncoding | None
supported_encoding_supported_devices()
max.pipelines.lib.config.supported_encoding_supported_devices(encoding)
Returns the devices that the given encoding is supported on.
supported_encoding_supported_on()
max.pipelines.lib.config.supported_encoding_supported_on(encoding, device_spec)
Returns whether the given encoding is supported on a device.
-
Parameters:
-
- encoding (Literal['float32', 'bfloat16', 'q4_k', 'q4_0', 'q6_k', 'float8_e4m3fn', 'float4_e2m1fnx2', 'gptq'])
- device_spec (DeviceSpec)
-
Return type:
-
bool
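A short sketch exercising the module-level encoding helpers; the encoding arguments are the string literals documented above.

```python
from max.pipelines.lib.config import (
    is_float4_encoding,
    supported_encoding_dtype,
    supported_encoding_quantization,
)

print(supported_encoding_dtype("bfloat16"))       # underlying model dtype
print(supported_encoding_quantization("q4_k"))    # QuantizationEncoding or None
print(is_float4_encoding("float4_e2m1fnx2"))      # True only for float4 encodings
```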