Python class

MultiKVCacheParams

class max.nn.kv_cache.MultiKVCacheParams(params, page_size, data_parallel_degree, n_devices, kv_connector, host_kvcache_swap_space_gb, speculative_method=None, num_draft_tokens=0)

Bases: KVCacheParamInterface

Aggregates multiple KV cache parameter sets.

This class implements KVCacheParamInterface by aggregating multiple KVCacheParamInterface instances. Useful for models with multiple distinct KV caches (e.g., different cache configurations for different layers).

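Parameters:

params (Sequence[KVCacheParams])

page_size (int)

data_parallel_degree (int)

n_devices (int)

kv_connector (KVConnectorType | None)

host_kvcache_swap_space_gb (float | None)

speculative_method (Literal['eagle', 'mtp', 'dflash'] | None)

num_draft_tokens (int)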

bytes_per_block

property bytes_per_block: int

Total bytes per block across all KV caches.

Since all caches allocate memory for the same sequence, the total memory cost per block is the sum across all param sets.
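
The summation described above can be illustrated with a minimal sketch. It does not import the MAX library: `CacheSpec` and `total_bytes_per_block` are hypothetical stand-ins introduced only for this example, and the byte counts are made-up values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CacheSpec:
    """Hypothetical stand-in for one KV cache parameter set."""

    bytes_per_block: int


def total_bytes_per_block(specs: Sequence[CacheSpec]) -> int:
    # Each cache stores a block for the same sequence, so the per-block
    # costs of the individual caches add up.
    return sum(spec.bytes_per_block for spec in specs)


# Illustrative values only: two caches with different per-block footprints.
specs = [CacheSpec(bytes_per_block=4096), CacheSpec(bytes_per_block=1024)]
total = total_bytes_per_block(specs)
```

With these illustrative numbers, one block of the aggregate costs 4096 + 1024 bytes.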

data_parallel_degree

data_parallel_degree: int

from_params()

classmethod from_params(*params)

Creates a MultiKVCacheParams from one or more KVCacheParams.

Parameters:

params (KVCacheParams) – One or more KVCacheParams instances to aggregate. All params must share the same page_size, data_parallel_degree, n_devices, enable_kvcache_swapping_to_host, and host_kvcache_swap_space_gb values.

Returns:

A new MultiKVCacheParams aggregating all provided params.

Raises:

ValueError – If no params are provided.

Return type:

MultiKVCacheParams
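
The constraints on `from_params()` can be sketched as a standalone check. This is not the library's implementation: `ParamSet`, `SHARED_FIELDS`, and `check_shared_fields` are hypothetical names, the field types are assumptions, and raising `ValueError` on a mismatch is also an assumption (the documentation above only confirms `ValueError` for the empty case).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParamSet:
    """Hypothetical stand-in for KVCacheParams; only shared fields are modeled."""

    page_size: int
    data_parallel_degree: int
    n_devices: int
    enable_kvcache_swapping_to_host: bool = False  # assumed type
    host_kvcache_swap_space_gb: Optional[float] = None


# Fields that the documentation says must match across all parameter sets.
SHARED_FIELDS = (
    "page_size",
    "data_parallel_degree",
    "n_devices",
    "enable_kvcache_swapping_to_host",
    "host_kvcache_swap_space_gb",
)


def check_shared_fields(*params: ParamSet) -> ParamSet:
    """Return the first parameter set after checking that all sets agree."""
    if not params:
        # Documented behavior: no params raises ValueError.
        raise ValueError("at least one parameter set is required")
    first = params[0]
    for other in params[1:]:
        for name in SHARED_FIELDS:
            if getattr(other, name) != getattr(first, name):
                # Assumption: the real class may report mismatches differently.
                raise ValueError(f"{name} differs between parameter sets")
    return first


a = ParamSet(page_size=128, data_parallel_degree=1, n_devices=2)
b = ParamSet(page_size=128, data_parallel_degree=1, n_devices=2)
shared = check_shared_fields(a, b)
```

Checking every set against the first one keeps the comparison linear in the number of parameter sets, and the shared values can then be read from a single representative.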

get_symbolic_inputs()

get_symbolic_inputs(prefix='')

Returns the symbolic inputs for the KV cache.

Parameters:

prefix (str)

Return type:

KVCacheInputs[TensorType, BufferType]

host_kvcache_swap_space_gb

host_kvcache_swap_space_gb: float | None

kv_connector

kv_connector: KVConnectorType | None

n_devices

n_devices: int

num_draft_tokens

num_draft_tokens: int = 0

page_size

page_size: int

params

params: Sequence[KVCacheParams]

List of KV cache parameter sets to aggregate.

replicates_kv_across_tp

property replicates_kv_across_tp: bool

Whether every device holds identical KV state.

speculative_method

speculative_method: Literal['eagle', 'mtp', 'dflash'] | None = None

tensor_parallel_degree

property tensor_parallel_degree: int

Returns the tensor parallel degree.