Python class

SpecDecodingState

class max.pipelines.context.SpecDecodingState(draft_tokens_to_verify=<factory>, maybe_accepted_draft_tokens=<factory>)

Bases: object

Per-request state for speculative decoding.

Parameters:

  • draft_tokens_to_verify (list[int])
  • maybe_accepted_draft_tokens (list[int])
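
The `<factory>` defaults in the signature suggest a dataclass whose fields are built with `default_factory`. The sketch below is a hypothetical stand-in written under that assumption, not the library's source; the name `SpecDecodingStateSketch` is invented for illustration. It shows that each instance starts with its own empty lists unless values are passed in.

```python
from dataclasses import dataclass, field


@dataclass
class SpecDecodingStateSketch:
    """Hypothetical stand-in mirroring the documented fields; not the MAX class."""

    draft_tokens_to_verify: list[int] = field(default_factory=list)
    maybe_accepted_draft_tokens: list[int] = field(default_factory=list)


# Both fields default to fresh empty lists.
a = SpecDecodingStateSketch()

# Fields can also be supplied by keyword, matching the documented signature.
b = SpecDecodingStateSketch(draft_tokens_to_verify=[11, 42, 7])

# Each instance owns its lists, so mutating one does not affect another.
a.draft_tokens_to_verify.append(5)
```

Using a factory rather than a shared `[]` default is what keeps per-request state isolated between requests.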

draft_tokens_to_verify

draft_tokens_to_verify: list[int]

The draft tokens to verify in the next batch.

maybe_accepted_draft_tokens

maybe_accepted_draft_tokens: list[int]

The draft tokens that are being verified in the current batch.

We do not yet know whether these tokens will be accepted. To ensure that we allocate enough KV cache, we conservatively assume that all of them will be.

This should only be present when running with the overlap scheduler.
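
The conservative assumption described above can be illustrated with a small calculation. This is a sketch under stated assumptions, not the scheduler's actual accounting: `SpecDecodingStateSketch`, `kv_entries_to_reserve`, and `committed_len` are hypothetical names, and the real allocator may count additional tokens.

```python
from dataclasses import dataclass, field


@dataclass
class SpecDecodingStateSketch:
    """Hypothetical stand-in mirroring the documented fields; not the MAX class."""

    draft_tokens_to_verify: list[int] = field(default_factory=list)
    maybe_accepted_draft_tokens: list[int] = field(default_factory=list)


def kv_entries_to_reserve(committed_len: int, state: SpecDecodingStateSketch) -> int:
    """Hypothetical helper: upper bound on KV entries for one request.

    Treats every in-flight draft token as if it will be accepted, so the
    reservation never falls short once verification finishes.
    """
    return committed_len + len(state.maybe_accepted_draft_tokens)


# A request with 128 committed tokens and 4 draft tokens still being verified.
state = SpecDecodingStateSketch(maybe_accepted_draft_tokens=[3, 14, 15, 92])
reserved = kv_entries_to_reserve(128, state)

# Without in-flight drafts, only the committed tokens are counted.
baseline = kv_entries_to_reserve(128, SpecDecodingStateSketch())
```

If some of the drafts are later rejected, the reservation is simply larger than needed; over-reserving is the safe direction for this trade-off.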