
Python function

estimated_memory_size

`estimated_memory_size()`

max.nn.kv_cache.estimated_memory_size(params, available_cache_memory, max_batch_size, max_seq_len)


Computes the estimated memory size of the KV cache used by all replicas.

Parameters:

params (KVCacheParamInterface)
available_cache_memory (int) – The amount of cache memory available across all devices.
max_batch_size (int) – The maximum batch size.
max_seq_len (int) – The maximum sequence length.

Returns:

The estimated memory usage of the KV cache in bytes.

Return type:

int
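
To give a feel for the kind of arithmetic behind such an estimate, here is a minimal, self-contained sketch. It is not MAX's implementation and does not import `max`: `ToyKVParams` and `toy_estimated_memory_size` are hypothetical names, the per-token formula is the common back-of-the-envelope model for a dense attention KV cache, and capping the result at `available_cache_memory` is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKVParams:
    """Hypothetical stand-in for a KV cache parameter object (not the MAX class)."""

    num_layers: int
    n_kv_heads: int
    head_dim: int
    dtype_bytes: int  # e.g. 2 for bfloat16


def toy_estimated_memory_size(
    params: ToyKVParams,
    available_cache_memory: int,
    max_batch_size: int,
    max_seq_len: int,
) -> int:
    """Rough KV cache size in bytes; an illustration, not MAX's implementation."""
    # Each token stores one key and one value vector per layer and KV head.
    bytes_per_token = (
        2 * params.num_layers * params.n_kv_heads * params.head_dim * params.dtype_bytes
    )
    # Worst case: every sequence in the batch reaches the maximum length.
    requested = bytes_per_token * max_batch_size * max_seq_len
    # Assumption: the estimate never exceeds the memory that is available.
    return min(requested, available_cache_memory)


params = ToyKVParams(num_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2)
estimate = toy_estimated_memory_size(
    params,
    available_cache_memory=16 * 1024**3,  # 16 GiB across all devices
    max_batch_size=4,
    max_seq_len=2048,
)
print(estimate)
```

The sketch keeps the same argument order as the documented signature, so swapping in the real function should only require replacing the toy parameter object with an actual `KVCacheParamInterface` instance.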
