Python module
registry
estimate_kv_cache_size()
max.kv_cache.registry.estimate_kv_cache_size(params, max_batch_size, max_seq_len, available_cache_memory)
Parameters:
- params (KVCacheParamInterface)
- max_batch_size (int)
- max_seq_len (int)
- available_cache_memory (int)
Return type:
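
The page does not describe how the estimate is computed. As a rough illustration only, the sketch below uses the standard dense KV cache formula for transformer models (one key and one value tensor per layer) and compares the result against a memory budget. `ToyKVParams`, `toy_kv_cache_bytes`, and `fits_in_budget` are hypothetical names, not part of `max.kv_cache.registry`, and the real function may account for paging or other details this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKVParams:
    """Hypothetical stand-in for KV cache parameters (not the MAX type)."""

    n_layers: int
    n_kv_heads: int
    head_dim: int
    dtype_bytes: int  # e.g. 2 for float16 / bfloat16


def toy_kv_cache_bytes(params: ToyKVParams, max_batch_size: int, max_seq_len: int) -> int:
    """Dense upper bound on KV cache size in bytes."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * params.n_layers * params.n_kv_heads * params.head_dim * params.dtype_bytes
    return per_token * max_batch_size * max_seq_len


def fits_in_budget(
    params: ToyKVParams,
    max_batch_size: int,
    max_seq_len: int,
    available_cache_memory: int,
) -> bool:
    """Assumed check: does the dense cache fit in the given byte budget?"""
    return toy_kv_cache_bytes(params, max_batch_size, max_seq_len) <= available_cache_memory


params = ToyKVParams(n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2)
size = toy_kv_cache_bytes(params, max_batch_size=4, max_seq_len=2048)
print(size)  # bytes needed for a fully populated cache
```

With these illustrative numbers each token needs 128 KiB of cache, so a batch of 4 sequences at 2048 tokens needs 1 GiB.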
infer_optimal_batch_size()
max.kv_cache.registry.infer_optimal_batch_size(params, max_seq_len, available_cache_memory, devices, **kwargs)
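
No description is given for this function. One plausible reading of the name and parameters is that it picks a batch size whose cache fits in `available_cache_memory`. The sketch below shows that idea under loud assumptions: `toy_max_batch_size` and `bytes_per_token` are hypothetical, the `devices` and `**kwargs` arguments are ignored, and the actual selection logic in MAX may differ.

```python
def toy_max_batch_size(
    bytes_per_token: int,
    max_seq_len: int,
    available_cache_memory: int,
) -> int:
    """Largest batch size whose fully populated cache fits in the budget.

    Assumes every sequence may grow to max_seq_len tokens and that the
    cache is stored densely. Returns 0 if not even one sequence fits.
    """
    if bytes_per_token <= 0 or max_seq_len <= 0:
        raise ValueError("bytes_per_token and max_seq_len must be positive")
    per_sequence = bytes_per_token * max_seq_len
    return available_cache_memory // per_sequence


# 128 KiB per token, 2048-token sequences, 1 GiB budget (illustrative values).
batch = toy_max_batch_size(131072, 2048, 1024 ** 3)
print(batch)
```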
load_kv_manager()
max.kv_cache.registry.load_kv_manager(params, max_batch_size, max_seq_len, session, available_cache_memory)
Loads a single KV cache manager from the given params.
Parameters:
- params (KVCacheParamInterface)
- max_batch_size (int)
- max_seq_len (int)
- session (InferenceSession)
- available_cache_memory (int)
Return type:
load_kv_managers()
max.kv_cache.registry.load_kv_managers(params, max_batch_size, max_seq_len, session, available_cache_memory)
Loads (potentially multiple) KV cache managers from the given params.
Parameters:
- params (KVCacheParamInterface)
- max_batch_size (int)
- max_seq_len (int)
- session (InferenceSession)
- available_cache_memory (int)
Return type: