Python function
compute_max_seq_len_fitting_in_cache
max.nn.kv_cache.compute_max_seq_len_fitting_in_cache(params, available_cache_memory)
Computes the maximum sequence length that can fit in the available cache memory.
Parameters:
- available_cache_memory (int) – The amount of cache memory available across all devices.
- params (KVCacheParamInterface)
Returns:
The maximum sequence length that can fit in the available cache memory.
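To make the idea concrete, the following is a minimal sketch of how such a calculation could work. It is not the MAX implementation and does not use the real KVCacheParamInterface: the ToyKVCacheParams class, its fields, and toy_max_seq_len_fitting_in_cache are hypothetical names introduced only for illustration. It assumes a dense cache in which each token stores one key and one value vector per layer and KV head, and it ignores paging, batching, and per-device sharding, which a real implementation would likely account for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKVCacheParams:
    """Hypothetical stand-in for illustration; NOT the real KVCacheParamInterface."""

    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_element: int  # e.g. 2 for a 16-bit dtype

    def bytes_per_token(self) -> int:
        # Assumption: one key vector and one value vector (hence the factor 2)
        # are cached per token, per layer, per KV head.
        return (
            2
            * self.num_layers
            * self.num_kv_heads
            * self.head_dim
            * self.bytes_per_element
        )


def toy_max_seq_len_fitting_in_cache(
    params: ToyKVCacheParams, available_cache_memory: int
) -> int:
    """Return the largest token count whose cache fits in the given byte budget."""
    if available_cache_memory < 0:
        raise ValueError("available_cache_memory must be non-negative")
    # Floor division: a partially cached token does not count.
    return available_cache_memory // params.bytes_per_token()


# Example: 32 layers, 8 KV heads, head dimension 128, 2-byte elements.
params = ToyKVCacheParams(
    num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_element=2
)
print(params.bytes_per_token())  # 131072 bytes (128 KiB) per token
print(toy_max_seq_len_fitting_in_cache(params, 8 * 1024**3))  # 65536 tokens in 8 GiB
```

Under these assumptions the result scales linearly with the memory budget and inversely with the per-token footprint, so halving the element size or the number of KV heads doubles the sequence length that fits.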
Return type: