Python class

KVCacheMetrics

class max.nn.kv_cache.KVCacheMetrics(input_tokens=0, cache_tokens=0, h2d_blocks_copied=0, d2h_blocks_copied=0, disk_blocks_written=0, disk_blocks_read=0, inflight_disk_ops=0, nixl_read_blocks=0, nixl_write_blocks=0, nixl_read_latency_total_ms=0.0, nixl_read_latency_count=0, nixl_write_latency_total_ms=0.0, nixl_write_latency_count=0, rpc_acquire_latency_total_ms=0.0, rpc_acquire_latency_count=0, rpc_read_latency_total_ms=0.0, rpc_read_latency_count=0, nixl_read_bytes=0, nixl_write_bytes=0, nixl_read_blocks_local=0, nixl_read_blocks_remote=0)

source

Bases: object

Metrics for the KV cache.

Tracks token usage and block transfer statistics for KV cache operations.

Parameters:

  • input_tokens (int)
  • cache_tokens (int)
  • h2d_blocks_copied (int)
  • d2h_blocks_copied (int)
  • disk_blocks_written (int)
  • disk_blocks_read (int)
  • inflight_disk_ops (int)
  • nixl_read_blocks (int)
  • nixl_write_blocks (int)
  • nixl_read_latency_total_ms (float)
  • nixl_read_latency_count (int)
  • nixl_write_latency_total_ms (float)
  • nixl_write_latency_count (int)
  • rpc_acquire_latency_total_ms (float)
  • rpc_acquire_latency_count (int)
  • rpc_read_latency_total_ms (float)
  • rpc_read_latency_count (int)
  • nixl_read_bytes (int)
  • nixl_write_bytes (int)
  • nixl_read_blocks_local (int)
  • nixl_read_blocks_remote (int)
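
Every field defaults to zero, and latencies are stored as a cumulative total plus a completion count instead of a precomputed average. One plausible benefit (an assumption, not something this page states) is that records can be summed field by field and the averages derived afterwards. The sketch below uses a hypothetical stand-in dataclass, `MetricsSketch`, holding only four of the documented fields, and a hypothetical `merge` helper; neither is part of `max.nn.kv_cache`.

```python
from dataclasses import dataclass, fields


@dataclass
class MetricsSketch:
    """Hypothetical stand-in with a subset of the documented fields."""

    input_tokens: int = 0
    cache_tokens: int = 0
    nixl_read_latency_total_ms: float = 0.0
    nixl_read_latency_count: int = 0


def merge(a: MetricsSketch, b: MetricsSketch) -> MetricsSketch:
    """Sum two records field by field (hypothetical helper)."""
    return MetricsSketch(
        **{f.name: getattr(a, f.name) + getattr(b, f.name) for f in fields(a)}
    )


batch_1 = MetricsSketch(
    input_tokens=100,
    cache_tokens=300,
    nixl_read_latency_total_ms=12.0,
    nixl_read_latency_count=4,
)
batch_2 = MetricsSketch(
    input_tokens=50,
    cache_tokens=50,
    nixl_read_latency_total_ms=8.0,
    nixl_read_latency_count=1,
)
combined = merge(batch_1, batch_2)

# The average comes from the merged totals, not from averaging two averages.
avg_ms = combined.nixl_read_latency_total_ms / combined.nixl_read_latency_count
print(combined.input_tokens, combined.cache_tokens, avg_ms)  # 150 350 4.0
```

Averaging the two per-batch averages (3.0 ms and 8.0 ms) would give 5.5 ms, which overweights the single slow transfer in the second batch.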

cache_hit_rate​

property cache_hit_rate: float

source

Proportion of prompt tokens that were retrieved from cache.

Returns:

Ratio of cache_tokens to prompt_tokens (input_tokens + cache_tokens), or 0.0 if no tokens were processed.
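
The arithmetic described here can be sketched as a standalone function. The function below is hypothetical and written only from the description on this page (it is not the library's source); it takes the two counters as plain arguments.

```python
def cache_hit_rate(input_tokens: int, cache_tokens: int) -> float:
    """Fraction of prompt tokens that came from cache."""
    prompt_tokens = input_tokens + cache_tokens  # input + cached
    if prompt_tokens == 0:
        return 0.0  # no tokens were processed
    return cache_tokens / prompt_tokens


print(cache_hit_rate(input_tokens=25, cache_tokens=75))  # 0.75
print(cache_hit_rate(input_tokens=0, cache_tokens=0))    # 0.0
```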

cache_tokens​

cache_tokens: int = 0

source

Number of tokens retrieved from cache (cache hits).

d2h_blocks_copied​

d2h_blocks_copied: int = 0

source

Number of cache blocks copied from device to host.

disk_blocks_read​

disk_blocks_read: int = 0

source

Number of cache blocks read from disk.

disk_blocks_written​

disk_blocks_written: int = 0

source

Number of cache blocks written to disk.

h2d_blocks_copied​

h2d_blocks_copied: int = 0

source

Number of cache blocks copied from host to device.

inflight_disk_ops​

inflight_disk_ops: int = 0

source

Number of in-flight disk operations.

input_tokens​

input_tokens: int = 0

source

Number of tokens processed as new input (cache misses).

nixl_read_blocks​

nixl_read_blocks: int = 0

source

Number of cache blocks read via NIXL (dKV GET).

nixl_read_blocks_local​

nixl_read_blocks_local: int = 0

source

Number of cache blocks read via NIXL from the co-located (default) block store.

nixl_read_blocks_remote​

nixl_read_blocks_remote: int = 0

source

Number of cache blocks read via NIXL from non-default (remote) block stores.

nixl_read_bytes​

nixl_read_bytes: int = 0

source

Total bytes transferred via NIXL READ.

nixl_read_gib_per_s​

property nixl_read_gib_per_s: float

source

NIXL READ throughput in GiB/s.
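
This page does not say how the throughput is derived. A natural guess, and only a guess, is total bytes divided by cumulative transfer time, that is `nixl_read_bytes` over `nixl_read_latency_total_ms`. The sketch below shows that unit conversion with a hypothetical helper `gib_per_s`; the zero fallback is also an assumption.

```python
GIB = 2**30  # bytes per gibibyte


def gib_per_s(total_bytes: int, total_latency_ms: float) -> float:
    """Bytes over elapsed time, in GiB/s (assumed formula, not confirmed)."""
    if total_latency_ms <= 0:
        return 0.0  # assumed fallback when nothing has been timed
    seconds = total_latency_ms / 1000.0
    return (total_bytes / GIB) / seconds


# 4 GiB moved during 2000 ms of cumulative transfer time.
print(gib_per_s(total_bytes=4 * GIB, total_latency_ms=2000.0))  # 2.0
```

Note that GiB uses the binary prefix (2^30 bytes), so the figure is about 7% lower than the same rate quoted in GB/s.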

nixl_read_latency_avg_ms​

property nixl_read_latency_avg_ms: float

source

Average NIXL READ transfer latency in milliseconds.
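
Given the paired `nixl_read_latency_total_ms` and `nixl_read_latency_count` fields, the average is presumably the total divided by the count. A minimal sketch follows, with a hypothetical helper name and an assumed 0.0 result when no transfers have completed (the page does not document that case).

```python
def average_latency_ms(total_ms: float, count: int) -> float:
    """Mean latency from a cumulative total and a completion count."""
    if count == 0:
        return 0.0  # assumed behavior when nothing has completed
    return total_ms / count


print(average_latency_ms(total_ms=42.0, count=6))  # 7.0
```

The same arithmetic would apply to the other total/count pairs on this page (NIXL WRITE and the two RPC latencies), under the same assumption.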

nixl_read_latency_count​

nixl_read_latency_count: int = 0

source

Number of NIXL READ transfer completions.

nixl_read_latency_total_ms​

nixl_read_latency_total_ms: float = 0.0

source

Cumulative NIXL READ transfer latency in milliseconds.

nixl_write_blocks​

nixl_write_blocks: int = 0

source

Number of cache blocks written via NIXL (dKV PUT).

nixl_write_bytes​

nixl_write_bytes: int = 0

source

Total bytes transferred via NIXL WRITE.

nixl_write_gib_per_s​

property nixl_write_gib_per_s: float

source

NIXL WRITE throughput in GiB/s.

nixl_write_latency_avg_ms​

property nixl_write_latency_avg_ms: float

source

Average NIXL WRITE transfer latency in milliseconds.

nixl_write_latency_count​

nixl_write_latency_count: int = 0

source

Number of NIXL WRITE transfer completions.

nixl_write_latency_total_ms​

nixl_write_latency_total_ms: float = 0.0

source

Cumulative NIXL WRITE transfer latency in milliseconds.

prompt_tokens​

property prompt_tokens: int

source

Total number of prompt tokens (input + cached).

Returns:

Sum of input_tokens and cache_tokens.

remote_read_ratio​

property remote_read_ratio: float

source

Fraction of NIXL reads hitting non-default (remote) block stores.
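
The denominator is not spelled out here. The sketch below assumes it is the sum of `nixl_read_blocks_local` and `nixl_read_blocks_remote`; the library might instead divide by `nixl_read_blocks`. The function name mirrors the property but is a hypothetical standalone helper, and the 0.0 result for zero reads is an assumption.

```python
def remote_read_ratio(local_blocks: int, remote_blocks: int) -> float:
    """Share of NIXL block reads served by remote stores (assumed formula)."""
    total = local_blocks + remote_blocks
    if total == 0:
        return 0.0  # assumed behavior when no NIXL reads happened
    return remote_blocks / total


print(remote_read_ratio(local_blocks=30, remote_blocks=10))  # 0.25
```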

rpc_acquire_latency_avg_ms​

property rpc_acquire_latency_avg_ms: float

source

Average dKV acquire_blocks RPC latency in milliseconds.

rpc_acquire_latency_count​

rpc_acquire_latency_count: int = 0

source

Number of acquire_blocks RPC calls.

rpc_acquire_latency_total_ms​

rpc_acquire_latency_total_ms: float = 0.0

source

Cumulative dKV acquire_blocks RPC latency in milliseconds.

rpc_read_latency_avg_ms​

property rpc_read_latency_avg_ms: float

source

Average dKV read_blocks RPC latency in milliseconds.

rpc_read_latency_count​

rpc_read_latency_count: int = 0

source

Number of read_blocks RPC calls.

rpc_read_latency_total_ms​

rpc_read_latency_total_ms: float = 0.0

source

Cumulative dKV read_blocks RPC latency in milliseconds.