Python module
config
Standardized configuration for pipeline inference.
PipelineConfig
class max.pipelines.config.PipelineConfig(engine: Optional[max.pipelines.config.PipelineEngine] = None, architecture: Optional[str] = None, version: Optional[str] = None, weight_path: list[pathlib.Path] = <factory>, huggingface_repo_id: Optional[str] = None, device_spec: max.driver.driver.DeviceSpec = DeviceSpec(id=-1, device_type='cpu'), quantization_encoding: Optional[max.pipelines.config.SupportedEncoding] = None, serialized_model_path: Optional[str] = None, save_to_serialized_model_path: Optional[str] = None, max_length: int = 512, max_new_tokens: int = -1, max_cache_batch_size: int = 1, max_ce_batch_size: int = 32, cache_strategy: max.pipelines.kv_cache.cache_params.KVCacheStrategy = continuous, max_num_steps: int = 1, pad_to_multiple_of: int = 2, top_k: Optional[int] = None, trust_remote_code: bool = False, force_download: bool = False, _huggingface_config: Optional[transformers.models.auto.configuration_auto.AutoConfig] = None, _device: Optional[max.driver.driver.Device] = None, _weights_converter: Optional[type[max.graph.weights.weights.WeightsConverter]] = None, enable_echo: bool = False)
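For orientation, a minimal construction might look like the following sketch. The repo id is a hypothetical placeholder; every field left unset falls back to the defaults shown in the signature above (CPU device, max_length=512, continuous KV caching, and so on).

    from max.pipelines.config import PipelineConfig

    # Hypothetical repo id for illustration; all other fields keep
    # their defaults from the signature above.
    config = PipelineConfig(
        huggingface_repo_id="org/model",
        max_length=1024,
        max_new_tokens=256,
    )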
architecture
architecture: Optional[str] = None
Model architecture to run.
cache_strategy
cache_strategy: KVCacheStrategy = 'continuous'
Forces use of a specific cache strategy: 'naive' or 'continuous'.
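For example, to force the naive strategy, a sketch: the import path follows the fully qualified name in the class signature, and the NAIVE member is assumed from the 'naive' string value above.

    from max.pipelines.config import PipelineConfig
    from max.pipelines.kv_cache.cache_params import KVCacheStrategy

    # KVCacheStrategy.NAIVE is assumed to correspond to the 'naive'
    # string value; the default is the continuous strategy.
    config = PipelineConfig(cache_strategy=KVCacheStrategy.NAIVE)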
device
property device: Device
Initialize and return a device, given the provided device spec.
device_spec
device_spec: DeviceSpec = DeviceSpec(id=-1, device_type='cpu')
Device to run inference on.
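Taken together with the device property above, a sketch of pinning inference to a device spec and then materializing the device. DeviceSpec(id=-1, device_type='cpu') mirrors the default; the import path follows the fully qualified name in the class signature.

    from max.driver.driver import DeviceSpec
    from max.pipelines.config import PipelineConfig

    # Mirrors the default CPU spec shown above.
    config = PipelineConfig(device_spec=DeviceSpec(id=-1, device_type="cpu"))

    # The device property lazily initializes a Device for this spec.
    device = config.device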
download_weights()
download_weights() → None
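Given the surrounding config fields (huggingface_repo_id, weight_path, force_download), this presumably fetches model weights ahead of inference. A sketch, with a placeholder repo id:

    from max.pipelines.config import PipelineConfig

    config = PipelineConfig(huggingface_repo_id="org/model")  # placeholder
    config.download_weights()  # returns None; assumed to populate the local weight cache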