
Python function

compute_num_device_blocks

compute_num_device_blocks()

max.nn.kv_cache.compute_num_device_blocks(params, available_cache_memory, max_batch_size, max_seq_len)


Computes the number of blocks that can be allocated based on the available cache memory.

The number of blocks returned is for a single replica. Each replica will have the same number of blocks.

Parameters:

  • available_cache_memory (int) – The amount of cache memory available across all devices.
  • max_batch_size (int | None) – The maximum batch size, or None.
  • max_seq_len (int | None) – The maximum sequence length, or None.
  • params (KVCacheParamInterface)

Returns:

The number of blocks that can be allocated for a single replica.

Return type:

int
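
Example:

The block below is a minimal sketch of the kind of arithmetic the description implies. It is not the library's implementation. The helper `estimate_num_blocks` and its arguments `bytes_per_block`, `num_replicas`, and `tokens_per_block` are hypothetical names introduced for illustration. It assumes the total cache memory is split evenly across replicas, and that the block count is optionally capped by what `max_batch_size` sequences of `max_seq_len` tokens would need. The real function derives its sizing from `params`, and may treat the two optional limits differently.

```python
from typing import Optional


def estimate_num_blocks(
    available_cache_memory: int,
    bytes_per_block: int,  # hypothetical: size of one KV cache block in bytes
    num_replicas: int = 1,  # hypothetical: number of model replicas
    max_batch_size: Optional[int] = None,
    max_seq_len: Optional[int] = None,
    tokens_per_block: int = 128,  # hypothetical: tokens stored per block
) -> int:
    """Estimate how many cache blocks fit in memory for a single replica."""
    if bytes_per_block <= 0 or num_replicas <= 0 or tokens_per_block <= 0:
        raise ValueError("sizes and counts must be positive")

    # Assumption: memory is shared evenly, so each replica gets the same count.
    per_replica_memory = available_cache_memory // num_replicas
    num_blocks = per_replica_memory // bytes_per_block

    # Assumption: when both limits are given, cap the count at the number of
    # blocks needed to hold max_batch_size sequences of max_seq_len tokens.
    if max_batch_size is not None and max_seq_len is not None:
        blocks_per_seq = -(-max_seq_len // tokens_per_block)  # ceiling division
        num_blocks = min(num_blocks, max_batch_size * blocks_per_seq)

    return num_blocks


# 8 GiB of cache memory, 2 replicas, 2 MiB per block.
print(estimate_num_blocks(8 * 1024**3, 2 * 1024**2, num_replicas=2))  # 2048
```

Integer division is used throughout, so any memory that does not fill a whole block is left unused.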