
Python function

estimated_memory_size

estimated_memory_size()

max.nn.kv_cache.estimated_memory_size(params, available_cache_memory, max_batch_size, max_seq_len)


Computes the estimated memory size of the KV cache used by all replicas.

Parameters:

  • available_cache_memory (int) – The amount of cache memory available across all devices.
  • max_batch_size (int) – The maximum batch size.
  • max_seq_len (int) – The maximum sequence length.
  • params (KVCacheParamInterface) – The KV cache parameters.

Returns:

The estimated memory usage of the KV cache in bytes.

Return type:

int
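
Example:

The sketch below is a rough, standalone illustration of how a KV cache size estimate of this kind is commonly computed. It is not the MAX implementation and does not import the library. `ToyKVParams`, its fields, and `toy_estimated_memory_size` are hypothetical names, and capping the result at `available_cache_memory` is an assumption about how that argument might be used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKVParams:
    """Hypothetical stand-in for the KV cache parameters; not a MAX type."""

    n_layers: int
    n_kv_heads: int
    head_dim: int
    dtype_bytes: int  # e.g. 2 for bfloat16


def toy_estimated_memory_size(
    params: ToyKVParams,
    available_cache_memory: int,
    max_batch_size: int,
    max_seq_len: int,
) -> int:
    """Return a rough KV cache size estimate in bytes (illustrative only)."""
    # Bytes for one token: one key and one value vector per KV head, per layer.
    bytes_per_token = (
        2 * params.n_layers * params.n_kv_heads * params.head_dim * params.dtype_bytes
    )
    # Worst case: every sequence in the batch reaches the maximum length.
    worst_case = bytes_per_token * max_batch_size * max_seq_len
    # Assumption: the estimate never exceeds the memory that is available.
    return min(worst_case, available_cache_memory)


params = ToyKVParams(n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2)
estimate = toy_estimated_memory_size(
    params,
    available_cache_memory=8 * 1024**3,  # 8 GiB
    max_batch_size=4,
    max_seq_len=4096,
)
print(estimate)  # 2147483648 bytes (2 GiB)
```

With these toy values each token needs 131072 bytes, so 4 sequences of 4096 tokens need 2 GiB, which fits within the 8 GiB budget. The real function may account for paging, replicas, or multiple devices differently.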