Python class
PipelineModelWithKVCache
class max.pipelines.lib.PipelineModelWithKVCache(pipeline_config, session, devices, kv_cache_config, weights, adapter, return_logits, return_hidden_states=ReturnHiddenStates.NONE)
Bases: PipelineModel[BaseContextType]
A pipeline model that supports KV cache.
Parameters:
- pipeline_config (PipelineConfig)
- session (InferenceSession)
- devices (list[Device])
- kv_cache_config (KVCacheConfig)
- weights (Weights)
- adapter (WeightsAdapter | None)
- return_logits (ReturnLogits)
- return_hidden_states (ReturnHiddenStates)
get_kv_params()
abstract classmethod get_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)
Returns the KV cache params for the pipeline model.
Parameters:
- huggingface_config (AutoConfig)
- pipeline_config (PipelineConfig)
- devices (list[DeviceRef])
- kv_cache_config (KVCacheConfig)
- cache_dtype (DType)
Return type:
kv_params
kv_params: KVCacheParamInterface
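The relationship between the abstract classmethod get_kv_params() and the kv_params attribute can be illustrated with a minimal, self-contained sketch. It does not import max. StandInKVParams, StandInModelWithKVCache, and ToyModel are hypothetical stand-ins, the dictionary keys follow common Hugging Face config names, and the fields on the params object are assumptions chosen for illustration rather than the actual KVCacheParamInterface definition. Only the method name and its five parameters come from the signature documented above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInKVParams:
    """Hypothetical stand-in for a KV cache params object (fields are assumed)."""

    dtype: str
    n_kv_heads: int
    head_dim: int
    num_layers: int
    n_devices: int


class StandInModelWithKVCache(ABC):
    """Hypothetical stand-in mirroring the documented shape of the base class."""

    kv_params: StandInKVParams

    @classmethod
    @abstractmethod
    def get_kv_params(
        cls, huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype
    ):
        """Subclasses derive the KV cache params from the model configuration."""


class ToyModel(StandInModelWithKVCache):
    """Hypothetical subclass; a plain dict stands in for the Hugging Face config."""

    def __init__(
        self, huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype
    ):
        # Store the derived params on the instance, as the kv_params attribute suggests.
        self.kv_params = self.get_kv_params(
            huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype
        )

    @classmethod
    def get_kv_params(
        cls, huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype
    ):
        # Head dimension is assumed to be hidden_size / num_attention_heads.
        head_dim = (
            huggingface_config["hidden_size"]
            // huggingface_config["num_attention_heads"]
        )
        return StandInKVParams(
            dtype=cache_dtype,
            n_kv_heads=huggingface_config["num_key_value_heads"],
            head_dim=head_dim,
            num_layers=huggingface_config["num_hidden_layers"],
            n_devices=len(devices),
        )


toy_config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 32,
}

# Being a classmethod, get_kv_params can be called before any model instance exists.
params = ToyModel.get_kv_params(toy_config, None, ["gpu:0"], None, "bfloat16")
model = ToyModel(toy_config, None, ["gpu:0"], None, "bfloat16")
```

Because get_kv_params() is a classmethod, the cache parameters can be computed from configuration alone, for example to size the cache before weights are loaded. Whether the real pipeline uses it this way is an assumption of this sketch.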