Python function
load_kv_manager()
max.kv_cache.load_kv_manager(params, max_batch_size, max_seq_len, session, available_cache_memory)
Loads a KV cache manager from the given params.
Accepts both KVCacheParams (single cache) and MultiKVCacheParams
(multiple caches). The returned PagedKVCacheManager natively handles
all caches with a single BlockManager and KVConnector.
Parameters:
- params (KVCacheParamInterface)
- max_batch_size (int | None)
- max_seq_len (int)
- session (InferenceSession)
- available_cache_memory (int | None)
Return type:
- PagedKVCacheManager
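The signature above can be exercised with a small wrapper. This is a minimal sketch, not part of the library: `build_load_kv_manager_kwargs` and `load_manager` are hypothetical helper names, the `None` defaults for the two optional parameters are this sketch's own choice, and it assumes the function accepts its documented parameter names as keyword arguments. Only `max.kv_cache.load_kv_manager` and the five parameter names come from the reference entry.

```python
from typing import Any, Optional


def build_load_kv_manager_kwargs(
    params: Any,
    session: Any,
    max_seq_len: int,
    max_batch_size: Optional[int] = None,
    available_cache_memory: Optional[int] = None,
) -> dict:
    """Collect the five documented arguments into a keyword dict.

    Hypothetical helper: it only checks the documented types
    (int, or int | None) and does not inspect params or session.
    """
    if not isinstance(max_seq_len, int):
        raise TypeError("max_seq_len must be an int")
    optional_ints = (
        ("max_batch_size", max_batch_size),
        ("available_cache_memory", available_cache_memory),
    )
    for name, value in optional_ints:
        if value is not None and not isinstance(value, int):
            raise TypeError(f"{name} must be an int or None")
    # Keys follow the order of the documented signature.
    return {
        "params": params,
        "max_batch_size": max_batch_size,
        "max_seq_len": max_seq_len,
        "session": session,
        "available_cache_memory": available_cache_memory,
    }


def load_manager(**kwargs: Any) -> Any:
    """Validate the arguments, then call load_kv_manager with them."""
    # Imported lazily so the helper above works without MAX installed.
    from max.kv_cache import load_kv_manager

    # Assumption: the documented names are accepted as keywords.
    return load_kv_manager(**build_load_kv_manager_kwargs(**kwargs))
```

With MAX installed, a call might look like `load_manager(params=kv_params, session=session, max_seq_len=2048, max_batch_size=16)`, where `kv_params` is an existing KVCacheParams or MultiKVCacheParams object and `session` is an existing InferenceSession. Both names, and the numeric values, are placeholders.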