Python function
compute_num_device_blocks
compute_num_device_blocks()
max.nn.kv_cache.compute_num_device_blocks(params, available_cache_memory, max_batch_size, max_seq_len)
Computes the number of blocks that can be allocated based on the available cache memory.
The number of blocks returned is for a single replica. Each replica will have the same number of blocks.
Parameters:
- available_cache_memory (int) – The amount of cache memory available across all devices.
- max_batch_size (int | None) – The maximum batch size, or None.
- max_seq_len (int | None) – The maximum sequence length, or None.
- params (KVCacheParamInterface)
Returns:
The number of blocks that can be allocated for a single replica.
Return type:
int
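Example:
The arithmetic behind a block count like this can be sketched in plain Python. This is an illustration under stated assumptions, not the MAX implementation: the helper `estimate_num_blocks` and its parameters `bytes_per_block`, `num_replicas`, and `page_size` are hypothetical names introduced here, and the rule that the two optional limits cap the result is also an assumption.

```python
import math
from typing import Optional


def estimate_num_blocks(
    available_cache_memory: int,
    bytes_per_block: int,  # hypothetical: memory footprint of one block
    num_replicas: int = 1,  # hypothetical: replicas sharing the memory
    page_size: int = 128,  # hypothetical: tokens stored per block
    max_batch_size: Optional[int] = None,
    max_seq_len: Optional[int] = None,
) -> int:
    """Estimate the per-replica block count (illustrative only)."""
    if bytes_per_block <= 0 or num_replicas <= 0 or page_size <= 0:
        raise ValueError("sizes and replica count must be positive")

    # Assumption: the available memory is split evenly, so every
    # replica ends up with the same number of blocks.
    per_replica_memory = available_cache_memory // num_replicas
    num_blocks = per_replica_memory // bytes_per_block

    # Assumption: when both limits are given, no more blocks are needed
    # than a full batch of maximum-length sequences would occupy.
    if max_batch_size is not None and max_seq_len is not None:
        blocks_per_sequence = math.ceil(max_seq_len / page_size)
        num_blocks = min(num_blocks, max_batch_size * blocks_per_sequence)

    return num_blocks


# 8 GiB shared by 2 replicas, 2 MiB per block, no limits supplied.
unbounded = estimate_num_blocks(8 * 1024**3, 2 * 1024**2, num_replicas=2)

# Same memory, capped by 4 sequences of up to 1000 tokens each.
bounded = estimate_num_blocks(
    8 * 1024**3,
    2 * 1024**2,
    num_replicas=2,
    max_batch_size=4,
    max_seq_len=1000,
)
```

In the real function the per-block size would presumably be derived from `params` rather than passed in directly; consult the library source for the exact behavior when `max_batch_size` or `max_seq_len` is None.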