Python class
MultiKVCacheParams
class max.nn.kv_cache.MultiKVCacheParams(params, page_size, data_parallel_degree, n_devices, kv_connector, host_kvcache_swap_space_gb, speculative_method=None, num_draft_tokens=0)
Bases: KVCacheParamInterface
Aggregates multiple KV cache parameter sets.
This class implements KVCacheParamInterface by aggregating multiple KVCacheParamInterface instances. This is useful for models with multiple distinct KV caches (e.g., different cache configurations for different layers).
Parameters:
- params (Sequence[KVCacheParams])
- page_size (int)
- data_parallel_degree (int)
- n_devices (int)
- kv_connector (KVConnectorType | None)
- host_kvcache_swap_space_gb (float | None)
- speculative_method (Literal['eagle', 'mtp', 'dflash'] | None)
- num_draft_tokens (int)
bytes_per_block
property bytes_per_block: int
Total bytes per block across all KV caches.
Since all caches allocate memory for the same sequence, the total memory cost per block is the sum across all param sets.
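The summation described above can be sketched in a few lines. This is only an illustration under stated assumptions: `StubCacheParams` and `total_bytes_per_block` are hypothetical stand-ins written for this example, not part of the MAX API, and the byte counts are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StubCacheParams:
    """Hypothetical stand-in for one KV cache parameter set (not the MAX class)."""

    bytes_per_block: int


def total_bytes_per_block(params: Sequence[StubCacheParams]) -> int:
    # Each cache allocates a block for the same sequence,
    # so the per-block costs of all param sets add up.
    return sum(p.bytes_per_block for p in params)


# Two illustrative caches with different per-block sizes (assumed values).
caches = [StubCacheParams(bytes_per_block=4096), StubCacheParams(bytes_per_block=1024)]
print(total_bytes_per_block(caches))  # 5120
```

In this sketch a model with two caches pays for both blocks whenever one block of sequence is allocated, which is why the aggregate figure is the one to use when sizing memory.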
data_parallel_degree
data_parallel_degree: int
from_params()
classmethod from_params(*params)
Creates a MultiKVCacheParams from one or more KVCacheParams.
Parameters:
- params (KVCacheParams) – One or more KVCacheParams instances to aggregate. All params must share the same page_size, data_parallel_degree, n_devices, enable_kvcache_swapping_to_host, and host_kvcache_swap_space_gb values.
Returns:
A new MultiKVCacheParams aggregating all provided params.
Raises:
ValueError – If no params are provided.
Return type:
MultiKVCacheParams
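The preconditions listed for from_params() can be sketched as a standalone check. This is a minimal illustration, not the library's implementation: `StubParams` and `check_shared_fields` are hypothetical names introduced here, and raising ValueError on a field mismatch is an assumption (the reference only documents ValueError for the empty case).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Fields that, per the description above, must match across all param sets.
SHARED_FIELDS = (
    "page_size",
    "data_parallel_degree",
    "n_devices",
    "enable_kvcache_swapping_to_host",
    "host_kvcache_swap_space_gb",
)


@dataclass(frozen=True)
class StubParams:
    """Hypothetical stand-in for KVCacheParams; only the shared fields are modeled."""

    page_size: int
    data_parallel_degree: int
    n_devices: int
    enable_kvcache_swapping_to_host: bool = False
    host_kvcache_swap_space_gb: Optional[float] = None


def check_shared_fields(*params: StubParams) -> Tuple[StubParams, ...]:
    """Hypothetical helper mirroring the documented preconditions."""
    if not params:
        # Documented behavior: no params is an error.
        raise ValueError("at least one parameter set is required")
    first = params[0]
    for other in params[1:]:
        for name in SHARED_FIELDS:
            if getattr(other, name) != getattr(first, name):
                # Assumption: a mismatch is also reported as ValueError.
                raise ValueError(f"all params must share the same {name}")
    return params
```

With this sketch, `check_shared_fields(a, b)` returns both sets unchanged when their shared fields agree, and fails early when, for example, the two sets use different page sizes.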
get_symbolic_inputs()
get_symbolic_inputs(prefix='')
Returns the symbolic inputs for the KV cache.
Parameters:
- prefix (str)
Return type:
host_kvcache_swap_space_gb
kv_connector
kv_connector: KVConnectorType | None
n_devices
n_devices: int
num_draft_tokens
num_draft_tokens: int = 0
page_size
page_size: int
params
params: Sequence[KVCacheParams]
List of KV cache parameter sets to aggregate.
replicates_kv_across_tp
property replicates_kv_across_tp: bool
Whether every device holds identical KV state.
speculative_method
speculative_method: Literal['eagle', 'mtp', 'dflash'] | None = None
tensor_parallel_degree
property tensor_parallel_degree: int
Returns the tensor parallel degree.