Python class

KVCacheParamInterface

class max.nn.kv_cache.KVCacheParamInterface(*args, **kwargs)

Bases: Protocol

Interface for KV cache parameters.
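
Because the class is a `Protocol`, conformance is structural: any object that exposes the expected members satisfies it, with no inheritance required. The following is a minimal sketch of that idea only. `CacheParamsLike` and `ToyCacheParams` are hypothetical stand-ins that mirror a few of the members documented on this page; they are not part of `max.nn.kv_cache`, and the `bytes_per_block` arithmetic is a placeholder rather than the library's formula.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CacheParamsLike(Protocol):
    """Hypothetical stand-in mirroring a few documented members."""

    page_size: int
    n_devices: int
    data_parallel_degree: int

    @property
    def bytes_per_block(self) -> int: ...


class ToyCacheParams:
    """Hypothetical implementation; it does not inherit from the protocol."""

    def __init__(self, page_size: int, n_devices: int, data_parallel_degree: int) -> None:
        self.page_size = page_size
        self.n_devices = n_devices
        self.data_parallel_degree = data_parallel_degree

    @property
    def bytes_per_block(self) -> int:
        # Placeholder arithmetic for illustration only (assumption).
        return self.page_size * 1024


def describe(params: CacheParamsLike) -> str:
    """Accept any object that structurally matches the protocol."""
    return f"page_size={params.page_size}, bytes_per_block={params.bytes_per_block}"


toy = ToyCacheParams(page_size=128, n_devices=1, data_parallel_degree=1)
print(isinstance(toy, CacheParamsLike))
print(describe(toy))
```

Code that only needs these members can be annotated against the protocol and will accept any conforming parameter object.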

bytes_per_block

property bytes_per_block: int

Number of bytes per cache block.
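
One plausible use of this value is capacity planning: dividing a memory budget by the block size gives the number of whole blocks that fit. The sketch below is plain arithmetic under that assumption. `blocks_that_fit` is a hypothetical helper, not a library function, and the 2 MiB block size is an invented example value.

```python
def blocks_that_fit(memory_budget_bytes: int, bytes_per_block: int) -> int:
    """Return how many whole cache blocks fit in a byte budget (hypothetical helper)."""
    if bytes_per_block <= 0:
        raise ValueError("bytes_per_block must be positive")
    if memory_budget_bytes < 0:
        raise ValueError("memory_budget_bytes must be non-negative")
    # Floor division discards any partial block at the end of the budget.
    return memory_budget_bytes // bytes_per_block


# Example values are assumptions: a 1 GiB budget and 2 MiB blocks.
budget = 1 * 1024**3
block_size = 2 * 1024**2
print(blocks_that_fit(budget, block_size))
```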

data_parallel_degree

data_parallel_degree: int

get_symbolic_inputs()

get_symbolic_inputs(prefix='')

Returns the symbolic inputs for the KV cache.

Parameters:

prefix (str)

Return type:

KVCacheInputs[TensorType, BufferType]

host_kvcache_swap_space_gb

host_kvcache_swap_space_gb: float | None

kv_connector

kv_connector: KVConnectorType | None

n_devices

n_devices: int

num_draft_tokens

num_draft_tokens: int = 0

num_draft_tokens_per_step

property num_draft_tokens_per_step: int

Number of draft tokens written per draft forward pass.

One for autoregressive drafts (eagle, mtp); equal to num_draft_tokens for block drafts (dflash).
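
The rule above can be sketched as a small standalone function. This is an illustration of the documented behavior, not the library's implementation. `draft_tokens_per_step` is a hypothetical name, and raising an error when no speculative method is set is an assumption, since this page does not say what the property returns in that case.

```python
from typing import Literal, Optional

SpeculativeMethod = Literal["eagle", "mtp", "dflash"]


def draft_tokens_per_step(
    method: Optional[SpeculativeMethod], num_draft_tokens: int
) -> int:
    """Mirror the documented rule for draft tokens written per draft forward pass."""
    if method in ("eagle", "mtp"):
        # Autoregressive drafts write one token per draft forward pass.
        return 1
    if method == "dflash":
        # Block drafts write all draft tokens in a single forward pass.
        return num_draft_tokens
    # Assumption: behavior without a speculative method is not documented here.
    raise ValueError(f"unsupported speculative method: {method!r}")


print(draft_tokens_per_step("eagle", 4))
print(draft_tokens_per_step("dflash", 4))
```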

page_size

page_size: int

replicates_kv_across_tp

property replicates_kv_across_tp: bool

Whether every device holds identical KV state.

speculative_method

speculative_method: Literal['eagle', 'mtp', 'dflash'] | None = None

tensor_parallel_degree

property tensor_parallel_degree: int

Returns the tensor parallel degree.