Python class

SpecDecodingState

class max.pipelines.core.SpecDecodingState(draft_tokens_to_verify=<factory>, maybe_accepted_draft_tokens=<factory>)

Bases: object

Per-request state for speculative decoding.

Parameters:

  • draft_tokens_to_verify (list[int])
  • maybe_accepted_draft_tokens (list[int])
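The `<factory>` defaults in the signature suggest that both fields default to a fresh empty list per instance, as a dataclass `default_factory` would provide. The following is a minimal sketch under that assumption; `SpecDecodingStateSketch` is a hypothetical stand-in that mirrors the two documented fields, not the real class.

```python
from dataclasses import dataclass, field


@dataclass
class SpecDecodingStateSketch:
    """Hypothetical stand-in mirroring the two documented fields."""

    draft_tokens_to_verify: list[int] = field(default_factory=list)
    maybe_accepted_draft_tokens: list[int] = field(default_factory=list)


# Both fields are optional; omitted fields start as empty lists.
a = SpecDecodingStateSketch()
b = SpecDecodingStateSketch(draft_tokens_to_verify=[11, 12, 13])

# Each instance owns its own lists, so mutating one leaves the other untouched.
a.draft_tokens_to_verify.append(7)
```

With a factory default, per-request state can be created without arguments and filled in as drafting proceeds.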

draft_tokens_to_verify

draft_tokens_to_verify: list[int]

The draft tokens to verify in the next batch.

maybe_accepted_draft_tokens

maybe_accepted_draft_tokens: list[int]

The draft tokens that are being verified in the current batch.

It is not yet known whether these tokens will be accepted. To ensure that enough KV cache is allocated, we conservatively assume that all of them will be.

This field should only be populated when running with the overlap scheduler.
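The conservative assumption described above can be sketched as a simple upper bound. This is an illustration only: `kv_slots_to_reserve` and `committed_len` are hypothetical names, and the choice to also count the next batch's drafts is an assumption. The real scheduler's accounting may differ.

```python
from types import SimpleNamespace


def kv_slots_to_reserve(committed_len: int, state) -> int:
    """Upper bound on KV slots for one request (hypothetical helper).

    Assumes every in-flight draft token will be accepted, then adds
    room for the draft tokens queued for the next verification batch.
    """
    in_flight = len(state.maybe_accepted_draft_tokens)
    next_batch = len(state.draft_tokens_to_verify)
    return committed_len + in_flight + next_batch


# Stand-in object exposing the two documented fields.
state = SimpleNamespace(
    draft_tokens_to_verify=[21, 22],
    maybe_accepted_draft_tokens=[14, 15, 16],
)
slots = kv_slots_to_reserve(committed_len=100, state=state)
```

Over-reserving in this way trades some unused cache for the guarantee that a fully accepted batch never runs out of space.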