Python class

`MultiKVCacheParams`

class max.nn.kv_cache.MultiKVCacheParams(params, page_size, data_parallel_degree, n_devices, kv_connector, host_kvcache_swap_space_gb, speculative_method=None, num_draft_tokens=0)

Bases: KVCacheParamInterface

Aggregates multiple KV cache parameter sets.

This class implements KVCacheParamInterface by aggregating multiple KVCacheParamInterface instances. It is useful for models with several distinct KV caches, for example when different layers use different cache configurations.

Parameters:

params (Sequence[KVCacheParams])
page_size (int)
data_parallel_degree (int)
n_devices (int)
kv_connector (KVConnectorType | None)
host_kvcache_swap_space_gb (float | None)
speculative_method (Literal['eagle', 'mtp', 'dflash'] | None)
num_draft_tokens (int)

`bytes_per_block`

property bytes_per_block: int

Total bytes per block across all KV caches.

Since all caches allocate memory for the same sequence, the total memory cost per block is the sum across all param sets.
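
The summation described above can be sketched without the MAX library. The `StandInCache` class and both helper functions below are hypothetical stand-ins written for illustration; they are not part of `max.nn.kv_cache`, and the byte counts are made-up values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StandInCache:
    """Hypothetical stand-in for one KV cache parameter set (not a MAX class)."""

    bytes_per_block: int


def total_bytes_per_block(caches: Sequence[StandInCache]) -> int:
    # Each cache stores a block for the same tokens, so the per-block
    # cost of the aggregate is the sum of the individual costs.
    return sum(c.bytes_per_block for c in caches)


def blocks_that_fit(budget_bytes: int, caches: Sequence[StandInCache]) -> int:
    # Whole blocks that fit in a memory budget once every cache is counted.
    return budget_bytes // total_bytes_per_block(caches)


# Illustrative sizes only: a 2 MiB-per-block cache and a 512 KiB-per-block cache.
caches = [StandInCache(2 * 1024 * 1024), StandInCache(512 * 1024)]
combined = total_bytes_per_block(caches)
fit_in_one_gib = blocks_that_fit(1 << 30, caches)
```

Under this model, adding a second cache lowers the number of blocks a fixed memory budget can hold, because every block now pays for both caches.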

`data_parallel_degree`

data_parallel_degree: int

`from_params()`

classmethod from_params(*params)

Creates a MultiKVCacheParams from one or more KVCacheParams.

Parameters: params (KVCacheParams) – One or more KVCacheParams instances to aggregate. All params must share the same page_size, data_parallel_degree, n_devices, enable_kvcache_swapping_to_host, and host_kvcache_swap_space_gb values.
Returns: A new MultiKVCacheParams aggregating all provided params.
Raises: ValueError – If no params are provided.
Return type: MultiKVCacheParams
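
The consistency requirement on the shared fields can be illustrated with a small standalone check. This is a sketch, not the library's implementation: `StandInParams`, `SHARED_FIELDS`, and `check_shared_fields` are hypothetical names, only three of the shared fields are modeled, and raising `ValueError` on a mismatch is an assumption (the reference above only documents `ValueError` for the empty case).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInParams:
    """Hypothetical stand-in for KVCacheParams with a subset of its fields."""

    page_size: int
    data_parallel_degree: int
    n_devices: int


# Fields that every aggregated parameter set is expected to agree on.
SHARED_FIELDS = ("page_size", "data_parallel_degree", "n_devices")


def check_shared_fields(*params: StandInParams) -> StandInParams:
    # Documented behavior: an empty argument list is an error.
    if not params:
        raise ValueError("at least one parameter set is required")
    for name in SHARED_FIELDS:
        values = {getattr(p, name) for p in params}
        # Assumption: a mismatch is also reported as ValueError.
        if len(values) > 1:
            raise ValueError(f"{name} differs across parameter sets: {sorted(values)}")
    # The first set can then supply the shared values for the aggregate.
    return params[0]


a = StandInParams(page_size=128, data_parallel_degree=1, n_devices=2)
b = StandInParams(page_size=128, data_parallel_degree=1, n_devices=2)
shared = check_shared_fields(a, b)
```

Validating up front means the aggregate can expose a single `page_size`, `data_parallel_degree`, and `n_devices` without having to reconcile conflicting values later.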

`get_symbolic_inputs()`

get_symbolic_inputs(prefix='')

Returns the symbolic inputs for the KV cache.

Parameters: prefix (str)
Return type: KVCacheInputs[TensorType, BufferType]

`host_kvcache_swap_space_gb`

host_kvcache_swap_space_gb: float | None

`kv_connector`

kv_connector: KVConnectorType | None

`n_devices`

n_devices: int

`num_draft_tokens`

num_draft_tokens: int = 0

`page_size`

page_size: int

`params`

params: Sequence[KVCacheParams]

List of KV cache parameter sets to aggregate.

`replicates_kv_across_tp`

property replicates_kv_across_tp: bool

Whether every device holds identical KV state.

`speculative_method`

speculative_method: Literal['eagle', 'mtp', 'dflash'] | None = None

`tensor_parallel_degree`

property tensor_parallel_degree: int

Returns the tensor parallel degree.