Python function

compute_num_device_blocks()

max.nn.kv_cache.compute_num_device_blocks(params, available_cache_memory, max_batch_size, max_seq_len)

Computes the number of blocks that can be allocated based on the available cache memory.

The number of blocks returned is for a single replica. Each replica will have the same number of blocks.

Parameters:

  • params (KVCacheParamInterface)
  • available_cache_memory (int) – The amount of cache memory available across all devices.
  • max_batch_size (int | None) – The maximum batch size, or None.
  • max_seq_len (int | None) – The maximum sequence length, or None.

Returns:

The number of blocks that can be allocated for a single replica.

Return type:

int
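
Example:

This reference does not spell out the calculation, so the following is only a rough mental model, not the MAX implementation. It assumes that memory is split evenly across replicas, that each block has a fixed byte size, and that the count may be capped at what max_batch_size sequences of max_seq_len tokens would need. The function estimate_num_blocks and the parameters bytes_per_block, tokens_per_block, and num_replicas are hypothetical names introduced for illustration; in the real API, that information would presumably come from params.

```python
import math
from typing import Optional


def estimate_num_blocks(
    available_cache_memory: int,
    bytes_per_block: int,
    tokens_per_block: int,
    num_replicas: int = 1,
    max_batch_size: Optional[int] = None,
    max_seq_len: Optional[int] = None,
) -> int:
    """Hypothetical per-replica block estimate (an assumption, not MAX code)."""
    if bytes_per_block <= 0 or tokens_per_block <= 0 or num_replicas <= 0:
        raise ValueError("block size, tokens per block, and replicas must be positive")

    # Assumption: memory is divided evenly, so every replica gets the same count.
    memory_per_replica = available_cache_memory // num_replicas

    # Only whole blocks can be allocated, hence integer division.
    num_blocks = memory_per_replica // bytes_per_block

    # Assumption: when both limits are given, there is no need for more blocks
    # than a full batch of maximum-length sequences could occupy.
    if max_batch_size is not None and max_seq_len is not None:
        blocks_per_sequence = math.ceil(max_seq_len / tokens_per_block)
        num_blocks = min(num_blocks, max_batch_size * blocks_per_sequence)

    return num_blocks


# 8 GiB of cache, 2 MiB per block, 128 tokens per block, one replica.
print(estimate_num_blocks(8 * 1024**3, 2 * 1024**2, 128))  # 4096

# The same memory, capped by 4 sequences of up to 1000 tokens each.
print(estimate_num_blocks(8 * 1024**3, 2 * 1024**2, 128,
                          max_batch_size=4, max_seq_len=1000))  # 32
```

In this sketch, passing None for either limit leaves the result bounded by memory alone. Whether compute_num_device_blocks() treats None the same way is not stated in this reference.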