Python class

KVTransferEngineMetadata

class max.kv_cache.KVTransferEngineMetadata(*, name, total_num_pages, bytes_per_page, memory_type, hostname, agents_meta, replicate_kv_across_tp=False)

Bases: Struct

Metadata associated with a transfer engine.

This is safe to send between threads/processes.

Parameters:

  • name (str)
  • total_num_pages (int)
  • bytes_per_page (int)
  • memory_type (MemoryType)
  • hostname (str)
  • agents_meta (list[list[TensorAgentMetadata]])
  • replicate_kv_across_tp (bool)

agents_meta

agents_meta: list[list[TensorAgentMetadata]]

Metadata for each replica’s agents, indexed as [replica][tp_shard].
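The nested-list indexing can be illustrated with a minimal sketch. Plain strings stand in for TensorAgentMetadata objects here, and the replica and shard counts are made-up values chosen only for illustration.

```python
# Stand-in strings replace TensorAgentMetadata objects (illustration only).
num_replicas, tp_degree = 2, 4  # hypothetical sizes

agents_meta = [
    [f"agent_r{replica}_tp{shard}" for shard in range(tp_degree)]
    for replica in range(num_replicas)
]

# The outer index selects the replica; the inner index selects the TP shard.
assert len(agents_meta) == num_replicas
assert all(len(shards) == tp_degree for shards in agents_meta)

print(agents_meta[1][2])  # agent_r1_tp2
```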

bytes_per_page

bytes_per_page: int

Bytes per page for each tensor.

hostname

hostname: str

Hostname of the machine that the transfer engine is running on.

memory_type

memory_type: MemoryType

Memory type of the transfer engine.

name

name: str

Base name of the transfer engine.

replicate_kv_across_tp

replicate_kv_across_tp: bool

True if and only if the KV buffers are identical across TP ranks (e.g. MLA with num_kv_heads=1). When the two sides declare different (dp, tp) layouts but one of them replicates, the engine can reinterpret the replicating side as [dp*tp][1]. This lets a prefill worker at (DP=m, TP=n) connect to a decode worker at (DP=m*n, TP=1).
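The shape change from [dp][tp] to [dp*tp][1] can be sketched as follows. This is not the engine's own code: flatten_replicated is a hypothetical helper, the strings stand in for TensorAgentMetadata objects, and the replica-major ordering of the flattened list is an assumption.

```python
from typing import TypeVar

T = TypeVar("T")


def flatten_replicated(agents_meta: list[list[T]]) -> list[list[T]]:
    """Hypothetical helper: view a [dp][tp] layout as [dp*tp][1].

    Assumes replica-major ordering; the real engine may order differently.
    """
    return [[agent] for replica in agents_meta for agent in replica]


# Replicating side with DP=2, TP=2 (stand-in strings, illustration only).
prefill_agents = [["r0_tp0", "r0_tp1"], ["r1_tp0", "r1_tp1"]]

flat = flatten_replicated(prefill_agents)

# The flattened view now has DP=4, TP=1, matching a peer at (DP=4, TP=1).
assert len(flat) == 4
assert all(len(shards) == 1 for shards in flat)
```

Such a view only makes sense when every TP rank holds identical KV buffers, which is the condition this flag declares.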

total_num_pages

total_num_pages: int

Total number of pages in each tensor.
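Taken together with bytes_per_page, this field gives the size of each tensor in bytes. A minimal sketch of the arithmetic, using made-up values and assuming pages are stored with no additional padding:

```python
# Hypothetical values, chosen only to illustrate the arithmetic.
total_num_pages = 1024
bytes_per_page = 65536  # 64 KiB per page

# Size of one tensor, assuming pages are packed without padding.
bytes_per_tensor = total_num_pages * bytes_per_page

print(bytes_per_tensor)              # 67108864
print(bytes_per_tensor // 2**20)     # 64 (MiB)
```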