Python class
KVCacheParamInterface
class max.nn.kv_cache.KVCacheParamInterface(*args, **kwargs)
Bases: Protocol
Interface for KV cache parameters.
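Because the interface is a `Protocol`, a class can satisfy it structurally, without inheriting from it. The sketch below illustrates the idea with a reduced, hypothetical protocol (`PagedParams`) that declares only two of the members documented on this page. `PagedParams`, `MyParams`, and `describe` are illustrative names and are not part of MAX.

```python
from dataclasses import dataclass
from typing import Protocol


class PagedParams(Protocol):
    """Hypothetical, reduced protocol with two of the documented members."""

    page_size: int
    n_devices: int


@dataclass
class MyParams:
    # No inheritance from PagedParams is needed; matching attributes suffice.
    page_size: int
    n_devices: int


def describe(params: PagedParams) -> str:
    # Accepts any object that structurally matches PagedParams.
    return f"page_size={params.page_size}, n_devices={params.n_devices}"


summary = describe(MyParams(page_size=128, n_devices=2))
```

A static type checker accepts `MyParams` wherever `PagedParams` is expected, which is the same mechanism that lets different parameter classes stand in for `KVCacheParamInterface`.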
bytes_per_block
property bytes_per_block: int
Number of bytes per cache block.
data_parallel_degree
data_parallel_degree: int
get_symbolic_inputs()
get_symbolic_inputs(prefix='')
Returns the symbolic inputs for the KV cache.
Parameters: prefix (str)
Return type:
host_kvcache_swap_space_gb
kv_connector
kv_connector: KVConnectorType | None
n_devices
n_devices: int
num_draft_tokens
num_draft_tokens: int = 0
num_draft_tokens_per_step
property num_draft_tokens_per_step: int
Number of draft tokens written per draft forward pass: one for autoregressive drafts (eagle, mtp), or num_draft_tokens for block drafts (dflash).
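The rule above can be sketched as a small property. This is an illustrative stand-in, not the MAX implementation: `SketchKVParams` is a hypothetical class, and the value returned when `speculative_method` is `None` is an assumption, since this page does not document that case.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class SketchKVParams:
    """Hypothetical stand-in that mirrors two documented fields."""

    num_draft_tokens: int = 0
    speculative_method: Optional[Literal["eagle", "mtp", "dflash"]] = None

    @property
    def num_draft_tokens_per_step(self) -> int:
        # Block drafts (dflash) write every draft token in one forward pass.
        if self.speculative_method == "dflash":
            return self.num_draft_tokens
        # Autoregressive drafts (eagle, mtp) write one token per forward pass.
        # Assumption: the same value is returned when no method is set.
        return 1


block = SketchKVParams(num_draft_tokens=4, speculative_method="dflash")
autoregressive = SketchKVParams(num_draft_tokens=4, speculative_method="eagle")
```

Here `block.num_draft_tokens_per_step` equals `num_draft_tokens`, while `autoregressive.num_draft_tokens_per_step` is one regardless of how many draft tokens are configured.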
page_size
page_size: int
replicates_kv_across_tp
property replicates_kv_across_tp: bool
Whether every device holds identical KV state.
speculative_method
speculative_method: Literal['eagle', 'mtp', 'dflash'] | None = None
tensor_parallel_degree
property tensor_parallel_degree: int
Returns the tensor parallel degree.