For the complete documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /max/get-started.md).

Python class

KVCacheMetrics

`KVCacheMetrics`

class max.nn.kv_cache.KVCacheMetrics(input_tokens=0, cache_tokens=0, h2d_blocks_copied=0, d2h_blocks_copied=0, disk_blocks_written=0, disk_blocks_read=0, inflight_disk_ops=0, nixl_read_blocks=0, nixl_write_blocks=0, nixl_read_latency_total_ms=0.0, nixl_read_latency_count=0, nixl_write_latency_total_ms=0.0, nixl_write_latency_count=0, rpc_acquire_latency_total_ms=0.0, rpc_acquire_latency_count=0, rpc_read_latency_total_ms=0.0, rpc_read_latency_count=0, nixl_read_bytes=0, nixl_write_bytes=0, nixl_read_blocks_local=0, nixl_read_blocks_remote=0)

source

Bases: object

Metrics for the KV cache.

Tracks token usage and block transfer statistics for KV cache operations.

Parameters:

input_tokens (int)
cache_tokens (int)
h2d_blocks_copied (int)
d2h_blocks_copied (int)
disk_blocks_written (int)
disk_blocks_read (int)
inflight_disk_ops (int)
nixl_read_blocks (int)
nixl_write_blocks (int)
nixl_read_latency_total_ms (float)
nixl_read_latency_count (int)
nixl_write_latency_total_ms (float)
nixl_write_latency_count (int)
rpc_acquire_latency_total_ms (float)
rpc_acquire_latency_count (int)
rpc_read_latency_total_ms (float)
rpc_read_latency_count (int)
nixl_read_bytes (int)
nixl_write_bytes (int)
nixl_read_blocks_local (int)
nixl_read_blocks_remote (int)

`cache_hit_rate`

property cache_hit_rate: float

source

Proportion of prompt tokens that were retrieved from cache.

Returns:: Ratio of cache_tokens to total prompt_tokens, or 0.0 if no tokens were processed.

`cache_tokens`

cache_tokens: int = 0

source

Number of tokens retrieved from cache (cache hits).

`d2h_blocks_copied`

d2h_blocks_copied: int = 0

source

Number of cache blocks copied from device to host.

`disk_blocks_read`

disk_blocks_read: int = 0

source

Number of cache blocks read from disk.

`disk_blocks_written`

disk_blocks_written: int = 0

source

Number of cache blocks written to disk.

`h2d_blocks_copied`

h2d_blocks_copied: int = 0

source

Number of cache blocks copied from host to device.

`inflight_disk_ops`

inflight_disk_ops: int = 0

source

Number of in-flight disk operations.

`input_tokens`

input_tokens: int = 0

source

Number of tokens processed as new input (cache misses).

`nixl_read_blocks`

nixl_read_blocks: int = 0

source

Number of cache blocks read via NIXL (dKV GET).

`nixl_read_blocks_local`

nixl_read_blocks_local: int = 0

source

NIXL reads from co-located (default) block store.

`nixl_read_blocks_remote`

nixl_read_blocks_remote: int = 0

source

NIXL reads from non-default (remote) block stores.

`nixl_read_bytes`

nixl_read_bytes: int = 0

source

Total bytes transferred via NIXL READ.

`nixl_read_gib_per_s`

property nixl_read_gib_per_s: float

source

NIXL READ throughput in GiB/s.

`nixl_read_latency_avg_ms`

property nixl_read_latency_avg_ms: float

source

Average NIXL READ transfer latency in milliseconds.

`nixl_read_latency_count`

nixl_read_latency_count: int = 0

source

Number of NIXL READ transfer completions.

`nixl_read_latency_total_ms`

nixl_read_latency_total_ms: float = 0.0

source

Cumulative NIXL READ transfer latency in milliseconds.

`nixl_write_blocks`

nixl_write_blocks: int = 0

source

Number of cache blocks written via NIXL (dKV PUT).

`nixl_write_bytes`

nixl_write_bytes: int = 0

source

Total bytes transferred via NIXL WRITE.

`nixl_write_gib_per_s`

property nixl_write_gib_per_s: float

source

NIXL WRITE throughput in GiB/s.

`nixl_write_latency_avg_ms`

property nixl_write_latency_avg_ms: float

source

Average NIXL WRITE transfer latency in milliseconds.

`nixl_write_latency_count`

nixl_write_latency_count: int = 0

source

Number of NIXL WRITE transfer completions.

`nixl_write_latency_total_ms`

nixl_write_latency_total_ms: float = 0.0

source

Cumulative NIXL WRITE transfer latency in milliseconds.

`prompt_tokens`

property prompt_tokens: int

source

Total number of prompt tokens (input + cached).

Returns:: Sum of input_tokens and cache_tokens.

`remote_read_ratio`

property remote_read_ratio: float

source

Fraction of NIXL reads hitting non-default (remote) block stores.

`rpc_acquire_latency_avg_ms`

property rpc_acquire_latency_avg_ms: float

source

Average dKV acquire_blocks RPC latency in milliseconds.

`rpc_acquire_latency_count`

rpc_acquire_latency_count: int = 0

source

Number of acquire_blocks RPC calls.

`rpc_acquire_latency_total_ms`

rpc_acquire_latency_total_ms: float = 0.0

source

Cumulative dKV acquire_blocks RPC latency in milliseconds.

`rpc_read_latency_avg_ms`

property rpc_read_latency_avg_ms: float

source

Average dKV read_blocks RPC latency in milliseconds.

`rpc_read_latency_count`

rpc_read_latency_count: int = 0

source

Number of read_blocks RPC calls.

`rpc_read_latency_total_ms`

rpc_read_latency_total_ms: float = 0.0

source

Cumulative dKV read_blocks RPC latency in milliseconds.

KVCacheMetrics​

cache_hit_rate​

cache_tokens​

d2h_blocks_copied​

disk_blocks_read​

disk_blocks_written​

h2d_blocks_copied​

inflight_disk_ops​

input_tokens​

nixl_read_blocks​

nixl_read_blocks_local​

nixl_read_blocks_remote​

nixl_read_bytes​

nixl_read_gib_per_s​

nixl_read_latency_avg_ms​

nixl_read_latency_count​

nixl_read_latency_total_ms​

nixl_write_blocks​

nixl_write_bytes​

nixl_write_gib_per_s​

nixl_write_latency_avg_ms​

nixl_write_latency_count​

nixl_write_latency_total_ms​

prompt_tokens​

remote_read_ratio​

rpc_acquire_latency_avg_ms​

rpc_acquire_latency_count​

rpc_acquire_latency_total_ms​

rpc_read_latency_avg_ms​

rpc_read_latency_count​

rpc_read_latency_total_ms​