Python class
BatchCharacteristics
class max.nn.kv_cache.BatchCharacteristics(batch_size, max_prompt_length, max_cache_valid_length)
Bases: object
Upper-bound batch shape used to prepare decode attention metadata.
Captures the (batch_size, max_prompt_length, max_cache_valid_length) triple
that a decode forward pass should prepare its attention dispatch metadata for.
These values are upper bounds and may exceed the batch’s real per-request values.
PagedKVCacheManager.runtime_inputs() uses it to resolve the dispatch
key once. For example, under graph-capture replay, max_cache_valid_length is
aligned up to a cache length recorded during capture, and every data-parallel
replica must run the identical captured graph. The batch’s real values must
not exceed these upper bounds.
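The upper-bound relationship described above can be sketched in plain Python. This is an illustration only, not the MAX implementation: `BatchShape` is a hypothetical stand-in that mirrors the three documented fields, and `align_up` and `covers` are hypothetical helpers. The alignment policy (pick the smallest captured length that is at least the real one) and the sample captured lengths are assumptions.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchShape:
    """Hypothetical stand-in mirroring the three documented fields."""

    batch_size: int
    max_prompt_length: int
    max_cache_valid_length: int


def align_up(length: int, captured_lengths: list[int]) -> int:
    """Return the smallest captured length >= length (assumed policy)."""
    ordered = sorted(captured_lengths)
    index = bisect_left(ordered, length)
    if index == len(ordered):
        raise ValueError(f"no captured cache length covers {length}")
    return ordered[index]


def covers(bound: BatchShape, real: BatchShape) -> bool:
    """True when no real value exceeds its corresponding upper bound."""
    return (
        real.batch_size <= bound.batch_size
        and real.max_prompt_length <= bound.max_prompt_length
        and real.max_cache_valid_length <= bound.max_cache_valid_length
    )


# Real per-batch values for one decode step (made-up numbers).
real = BatchShape(batch_size=3, max_prompt_length=1, max_cache_valid_length=700)

# Upper bound: the cache length is rounded up to a length assumed to have
# been recorded during graph capture, and the batch size is padded to 4.
bound = BatchShape(
    batch_size=4,
    max_prompt_length=1,
    max_cache_valid_length=align_up(real.max_cache_valid_length, [512, 1024, 2048]),
)
```

Because every real value sits at or below its bound, the same bound can be shared across replicas even when their real batches differ, which is the property the graph-capture replay case relies on.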
batch_size
batch_size: int
max_cache_valid_length
max_cache_valid_length: int
max_prompt_length
max_prompt_length: int