KVCacheMetrics

class max.nn.kv_cache.KVCacheMetrics(input_tokens=0, cache_tokens=0, h2d_blocks_copied=0, d2h_blocks_copied=0, disk_blocks_written=0, disk_blocks_read=0)

Bases: object

Metrics for the KV cache.

Tracks token usage and block transfer statistics for KV cache operations.

Parameters:

  • input_tokens (int)
  • cache_tokens (int)
  • h2d_blocks_copied (int)
  • d2h_blocks_copied (int)
  • disk_blocks_written (int)
  • disk_blocks_read (int)
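The constructor signature above gives every counter a default of 0, so callers only pass the fields that apply. The sketch below is a hypothetical stand-in (`KVCacheMetricsSketch`) that mirrors the documented fields as a `dataclass`. That the real class is a dataclass is an assumption made only for illustration, not a statement about the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class KVCacheMetricsSketch:
    """Hypothetical stand-in mirroring the documented fields of
    max.nn.kv_cache.KVCacheMetrics (not the library's implementation)."""

    input_tokens: int = 0         # tokens processed as new input (cache misses)
    cache_tokens: int = 0         # tokens retrieved from cache (cache hits)
    h2d_blocks_copied: int = 0    # cache blocks copied host -> device
    d2h_blocks_copied: int = 0    # cache blocks copied device -> host
    disk_blocks_written: int = 0  # cache blocks written to disk
    disk_blocks_read: int = 0     # cache blocks read from disk


# Only the relevant counters are passed; the rest keep their default of 0.
m = KVCacheMetricsSketch(input_tokens=128, cache_tokens=384)
print(m)
```

With the real class, the documented signature accepts the same keyword arguments, for example `KVCacheMetrics(input_tokens=128, cache_tokens=384)`.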

cache_hit_rate

property cache_hit_rate: float

Proportion of prompt tokens that were retrieved from cache.

Returns:

Ratio of cache_tokens to total prompt_tokens, or 0.0 if no tokens were processed.
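The documented definition reduces to `cache_tokens / (input_tokens + cache_tokens)`, guarded against division by zero. The function below is a minimal sketch of that arithmetic. The function name and standalone form are hypothetical; in the library this is a read-only property on the metrics object.

```python
def cache_hit_rate(input_tokens: int, cache_tokens: int) -> float:
    """Fraction of prompt tokens served from cache, per the documented definition."""
    prompt_tokens = input_tokens + cache_tokens
    if prompt_tokens == 0:
        # No tokens processed: the documented result is 0.0, not a ZeroDivisionError.
        return 0.0
    return cache_tokens / prompt_tokens


print(cache_hit_rate(input_tokens=300, cache_tokens=700))  # 0.7
print(cache_hit_rate(input_tokens=0, cache_tokens=0))      # 0.0
```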

cache_tokens

cache_tokens: int = 0

Number of tokens retrieved from cache (cache hits).

d2h_blocks_copied

d2h_blocks_copied: int = 0

Number of cache blocks copied from device to host.

disk_blocks_read

disk_blocks_read: int = 0

Number of cache blocks read from disk.

disk_blocks_written

disk_blocks_written: int = 0

Number of cache blocks written to disk.

h2d_blocks_copied

h2d_blocks_copied: int = 0

Number of cache blocks copied from host to device.
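The token and block-transfer fields are plain counters. If you collect one metrics object per scheduling step, you can combine them by summing each field, then recompute the hit rate from the summed tokens. Averaging the per-step rates would weight a 100-token step the same as a 400-token step. The sketch below assumes a hypothetical `Metrics` stand-in and a hypothetical `accumulate` helper. Whether the real class provides its own aggregation method is not stated here.

```python
from dataclasses import dataclass, fields


@dataclass
class Metrics:
    """Hypothetical stand-in with the documented counter fields."""

    input_tokens: int = 0
    cache_tokens: int = 0
    h2d_blocks_copied: int = 0
    d2h_blocks_copied: int = 0
    disk_blocks_written: int = 0
    disk_blocks_read: int = 0

    @property
    def cache_hit_rate(self) -> float:
        # Same definition as the documented property: hits / (hits + misses).
        total = self.input_tokens + self.cache_tokens
        return self.cache_tokens / total if total else 0.0


def accumulate(steps: list[Metrics]) -> Metrics:
    """Hypothetical helper: sum every counter field across steps."""
    total = Metrics()
    for step in steps:
        for f in fields(Metrics):
            setattr(total, f.name, getattr(total, f.name) + getattr(step, f.name))
    return total


steps = [
    Metrics(input_tokens=300, cache_tokens=100, d2h_blocks_copied=4),  # rate 0.25
    Metrics(input_tokens=0, cache_tokens=100, h2d_blocks_copied=4),    # rate 1.0
]
total = accumulate(steps)
mean_of_rates = sum(s.cache_hit_rate for s in steps) / len(steps)
print(total.cache_hit_rate)  # 0.4 (200 of 500 tokens), token-weighted
print(mean_of_rates)         # 0.625, overstates the hit rate
```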

input_tokens

input_tokens: int = 0

Number of tokens processed as new input (cache misses).

prompt_tokens

property prompt_tokens: int

Total number of prompt tokens (input + cached).

Returns:

Sum of input_tokens and cache_tokens.
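Every prompt token is counted exactly once, either as a cache hit (`cache_tokens`) or as new input (`input_tokens`), so `prompt_tokens` is their sum. The sketch below illustrates this split for a prompt whose leading tokens were found in the cache. The `split_prompt` helper and the prefix-match framing are hypothetical assumptions used only to illustrate the invariant. Real cache lookups may match at block granularity.

```python
def split_prompt(prompt_len: int, cached_prefix_len: int) -> tuple[int, int]:
    """Hypothetical: split a prompt into (input_tokens, cache_tokens)
    given how many leading tokens were found in the cache."""
    if not 0 <= cached_prefix_len <= prompt_len:
        raise ValueError("cached prefix must lie within the prompt")
    cache_tokens = cached_prefix_len               # served from cache (hits)
    input_tokens = prompt_len - cached_prefix_len  # processed as new input (misses)
    return input_tokens, cache_tokens


input_tokens, cache_tokens = split_prompt(prompt_len=1000, cached_prefix_len=768)
prompt_tokens = input_tokens + cache_tokens  # documented definition of prompt_tokens
print(input_tokens, cache_tokens, prompt_tokens)  # 232 768 1000
```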