Python class
SpecDecodingState
class max.pipelines.core.SpecDecodingState(draft_tokens_to_verify=<factory>, maybe_accepted_draft_tokens=<factory>)
Bases: object
Per-request state for speculative decoding.
draft_tokens_to_verify
The draft tokens to verify in the next batch.
maybe_accepted_draft_tokens
The draft tokens that are being verified in the current batch.
It is not yet known whether these tokens will be accepted. To ensure that enough KV cache is allocated, they are conservatively assumed to all be accepted.
This should only be present when running with the overlap scheduler.
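The conservative accounting described above can be illustrated with a minimal sketch. It does not import the real class: `SpecDecodingStateSketch` is a hypothetical stand-in, the `list[int]` field types are an assumption (this page only shows `<factory>` defaults), and `kv_slots_to_reserve` and `committed_len` are illustrative names that are not part of `max.pipelines.core`.

```python
from dataclasses import dataclass, field


@dataclass
class SpecDecodingStateSketch:
    """Hypothetical stand-in for max.pipelines.core.SpecDecodingState."""

    # Assumption: both fields hold token ids in a list. The real field
    # types are not shown on this page, only that they use factories.
    draft_tokens_to_verify: list[int] = field(default_factory=list)
    maybe_accepted_draft_tokens: list[int] = field(default_factory=list)


def kv_slots_to_reserve(committed_len: int, state: SpecDecodingStateSketch) -> int:
    """Upper bound on KV slots for the next batch (illustrative only).

    Tokens still being verified are counted as if all of them will be
    accepted, so the reservation can only be too large, never too small.
    """
    in_flight = len(state.maybe_accepted_draft_tokens)
    to_verify = len(state.draft_tokens_to_verify)
    return committed_len + in_flight + to_verify


# A request with 10 committed tokens, 2 drafts in flight, and 3 drafts queued.
state = SpecDecodingStateSketch(
    draft_tokens_to_verify=[11, 12, 13],
    maybe_accepted_draft_tokens=[7, 8],
)
reserved = kv_slots_to_reserve(10, state)
```

If some of the in-flight drafts are later rejected, the sketch simply over-reserves for that step, which is the trade-off the conservative assumption accepts.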