Python class
KVCacheMetrics
class max.nn.kv_cache.KVCacheMetrics(input_tokens=0, cache_tokens=0, h2d_blocks_copied=0, d2h_blocks_copied=0, disk_blocks_written=0, disk_blocks_read=0)
Bases: object
Metrics for the KV cache.
Tracks token usage and block transfer statistics for KV cache operations.
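Parameters:
input_tokens (int)
cache_tokens (int)
h2d_blocks_copied (int)
d2h_blocks_copied (int)
disk_blocks_written (int)
disk_blocks_read (int)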
cache_hit_rate
property cache_hit_rate: float
Proportion of prompt tokens that were retrieved from cache.
Returns:
Ratio of cache_tokens to prompt_tokens (input_tokens + cache_tokens), or 0.0 if no tokens were processed.
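The ratio described above can be sketched with a small stand-in dataclass. `MetricsSketch` is a hypothetical name that only mirrors two of the documented fields. It is not the `max.nn.kv_cache` implementation, and the zero-token guard is written from the documented return value, not from the library source.

```python
from dataclasses import dataclass


@dataclass
class MetricsSketch:
    """Hypothetical stand-in mirroring two documented KVCacheMetrics fields."""

    input_tokens: int = 0  # tokens processed as new input (cache misses)
    cache_tokens: int = 0  # tokens retrieved from cache (cache hits)

    @property
    def cache_hit_rate(self) -> float:
        # Total prompt tokens = new input + cached, per the documented definition.
        total = self.input_tokens + self.cache_tokens
        # Return 0.0 instead of dividing by zero when nothing was processed.
        return self.cache_tokens / total if total else 0.0


warm = MetricsSketch(input_tokens=25, cache_tokens=75)
empty = MetricsSketch()
print(warm.cache_hit_rate)   # 0.75
print(empty.cache_hit_rate)  # 0.0
```

A request whose prompt is mostly served from cache yields a rate close to 1.0, while a cold request yields 0.0.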
cache_tokens
cache_tokens: int = 0
Number of tokens retrieved from cache (cache hits).
d2h_blocks_copied
d2h_blocks_copied: int = 0
Number of cache blocks copied from device to host.
disk_blocks_read
disk_blocks_read: int = 0
Number of cache blocks read from disk.
disk_blocks_written
disk_blocks_written: int = 0
Number of cache blocks written to disk.
h2d_blocks_copied
h2d_blocks_copied: int = 0
Number of cache blocks copied from host to device.
input_tokens
input_tokens: int = 0
Number of tokens processed as new input (cache misses).
prompt_tokens
property prompt_tokens: int
Total number of prompt tokens (input + cached).
Returns:
Sum of input_tokens and cache_tokens.
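As an illustration of how these counters might be combined across several scheduling steps, the sketch below sums them field by field and then reads `prompt_tokens` from the total. Both `CountersSketch` and `merge` are hypothetical. The documentation above does not state that the real class supports merging or addition, so treat this as one possible approach under that assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class CountersSketch:
    """Hypothetical stand-in with the six documented counter fields."""

    input_tokens: int = 0
    cache_tokens: int = 0
    h2d_blocks_copied: int = 0
    d2h_blocks_copied: int = 0
    disk_blocks_written: int = 0
    disk_blocks_read: int = 0

    @property
    def prompt_tokens(self) -> int:
        # Documented as the sum of input_tokens and cache_tokens.
        return self.input_tokens + self.cache_tokens


def merge(a: CountersSketch, b: CountersSketch) -> CountersSketch:
    """Hypothetical helper: add two sets of counters field by field."""
    summed = {
        f.name: getattr(a, f.name) + getattr(b, f.name)
        for f in fields(CountersSketch)
    }
    return CountersSketch(**summed)


# First step misses the cache entirely; second step reuses 80 cached tokens.
step1 = CountersSketch(input_tokens=100, d2h_blocks_copied=4)
step2 = CountersSketch(input_tokens=20, cache_tokens=80, h2d_blocks_copied=4)
total = merge(step1, step2)
print(total.prompt_tokens)      # 200
print(total.h2d_blocks_copied)  # 4
```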