Python class
IncrementCacheLengthsProcessor
class max.pipelines.kv_cache.IncrementCacheLengthsProcessor(session, params)
Bases: object
Processes KV cache length increments after each decoding step.
Parameters:
- session (InferenceSession)
- params (KVCacheParams)
execute()
execute(kv_cache_inputs, prev_model_inputs)
Runs the increment cache lengths graph and returns updated cache inputs.
Parameters:
- kv_cache_inputs (KVCacheInputs[Buffer, Buffer])
- prev_model_inputs (Any)
Return type:
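
Example (conceptual sketch only): the snippet below illustrates the idea of advancing per-sequence cache lengths after a step, in plain Python. It does not use the MAX API. ToyCacheInputs, increment_cache_lengths, and tokens_processed are hypothetical names invented for this illustration; the real processor runs a graph through an InferenceSession on KVCacheInputs buffers, and how it derives the increment from prev_model_inputs is not described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyCacheInputs:
    """Hypothetical stand-in for KV cache inputs (not the MAX KVCacheInputs type)."""

    # Number of tokens already stored in the cache, one entry per sequence.
    cache_lengths: List[int]


def increment_cache_lengths(
    cache: ToyCacheInputs, tokens_processed: List[int]
) -> ToyCacheInputs:
    """Return new cache inputs with each length advanced by the tokens just processed.

    Assumption: the increment for a sequence equals the number of tokens the
    previous step wrote into the cache for that sequence.
    """
    if len(cache.cache_lengths) != len(tokens_processed):
        raise ValueError("one token count is required per sequence")
    return ToyCacheInputs(
        cache_lengths=[n + t for n, t in zip(cache.cache_lengths, tokens_processed)]
    )


# Two sequences: the first just processed a 5-token prompt,
# the second completed a single-token decode step.
before = ToyCacheInputs(cache_lengths=[0, 12])
after = increment_cache_lengths(before, tokens_processed=[5, 1])
print(after.cache_lengths)  # [5, 13]
```

The sketch returns a new object rather than mutating its input, mirroring the documented behavior that execute() returns updated cache inputs.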