
Python class

SpecDecodingState

class max.pipelines.context.SpecDecodingState(draft_tokens_to_verify=<factory>, maybe_accepted_draft_tokens=<factory>)

Bases: object

Per-request state for speculative decoding.

Parameters:

draft_tokens_to_verify (list[int])
maybe_accepted_draft_tokens (list[int])
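The `<factory>` defaults in the signature indicate that each field is built by a default factory rather than sharing a single default object. Assuming the class is a standard dataclass whose factories produce empty lists, the following sketch uses a hypothetical stand-in, `SpecDecodingStateSketch` (it does not import MAX), to show that separate per-request instances never share token lists:

```python
from dataclasses import dataclass, field


@dataclass
class SpecDecodingStateSketch:
    """Hypothetical stand-in mirroring the documented fields of SpecDecodingState."""

    # Draft tokens queued for verification in the next batch.
    draft_tokens_to_verify: list[int] = field(default_factory=list)
    # Draft tokens under verification in the current batch (overlap scheduler only).
    maybe_accepted_draft_tokens: list[int] = field(default_factory=list)


# Each request gets its own empty lists, so mutating one never affects another.
a = SpecDecodingStateSketch()
b = SpecDecodingStateSketch()
a.draft_tokens_to_verify.append(42)
print(a.draft_tokens_to_verify)  # [42]
print(b.draft_tokens_to_verify)  # []
```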

`draft_tokens_to_verify`

draft_tokens_to_verify: list[int]

The draft tokens to verify in the next batch.
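Verification compares each draft token with the target model's prediction at the same position. This page does not describe MAX's acceptance rule (which may use rejection sampling rather than exact matching), so the sketch below uses a common greedy rule as an assumption: accept the longest matching prefix, then take the target model's token at the first mismatch. `count_accepted` is a hypothetical helper:

```python
def count_accepted(draft_tokens: list[int], target_tokens: list[int]) -> int:
    """Length of the longest prefix of draft_tokens that matches target_tokens.

    Hypothetical helper illustrating greedy (exact-match) verification.
    """
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break  # First mismatch: reject this draft token and everything after it.
        accepted += 1
    return accepted


draft = [11, 7, 93, 5]
# Target model's greedy choice at each draft position, plus one bonus token.
target = [11, 7, 40, 8, 2]
n = count_accepted(draft, target)
committed = draft[:n] + [target[n]]
print(n)          # 2
print(committed)  # [11, 7, 40]
```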

`maybe_accepted_draft_tokens`

maybe_accepted_draft_tokens: list[int]

The draft tokens that are being verified in the current batch.

We do not yet know whether these tokens will be accepted. To ensure that we allocate enough KV cache, we conservatively assume that they will all be accepted.

This field should only be populated when running with the overlap scheduler.
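The conservative allocation can be illustrated with a paged KV cache. In the sketch below, the paged layout is an assumption, and `kv_pages_to_reserve`, `committed_len`, and `page_size` are hypothetical names. It shows how counting every in-flight draft token as accepted can push a request onto an additional page:

```python
def kv_pages_to_reserve(
    committed_len: int,
    maybe_accepted_draft_tokens: list[int],
    page_size: int,
) -> int:
    """Pages needed if every in-flight draft token turns out to be accepted.

    Hypothetical helper; committed_len and page_size are assumed inputs.
    """
    # Worst case: all draft tokens in the current batch are kept.
    worst_case_len = committed_len + len(maybe_accepted_draft_tokens)
    # Integer ceiling division: round up to whole pages.
    return (worst_case_len + page_size - 1) // page_size


print(kv_pages_to_reserve(126, [], page_size=128))          # 1
print(kv_pages_to_reserve(126, [4, 9, 17], page_size=128))  # 2
```

If some of those draft tokens are later rejected, the extra reservation simply goes unused for that step; the trade-off is that the request never runs short of KV space mid-batch.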
