Python class

SpeculativeConfig

class max.pipelines.SpeculativeConfig(*, config_file=None, section_name=None, speculative_method=None, num_speculative_tokens=2, rejection_sampling_strategy=None, synthetic_acceptance_rate=None, use_relaxed_acceptance_for_thinking=False, relaxed_topk=10, relaxed_delta=0.6)

Bases: ConfigFileModel

Configures speculative decoding for a pipeline.

Speculative decoding accelerates token generation by having a small draft step propose several candidate tokens that the larger target verifies in one forward pass. This class selects the method (speculative_method), how many tokens to draft per step (num_speculative_tokens), and how the target verifies them (rejection_sampling_strategy).
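As a rough illustration of the draft-and-verify idea, the sketch below shows one common form of greedy verification in plain Python. It does not use or reproduce the MAX implementation. The function name `greedy_verify` and the list-of-token-IDs inputs are hypothetical, chosen only for this example, and the actual behavior of each rejection sampling strategy may differ.

```python
def greedy_verify(draft_tokens, target_tokens):
    """Return the tokens emitted by one draft-and-verify step (toy model).

    draft_tokens: the k token IDs proposed by the draft.
    target_tokens: the k + 1 token IDs the target would pick greedily at the
        same positions, plus one for the position after the last draft.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have one more entry than draft_tokens")

    accepted = []
    for drafted, expected in zip(draft_tokens, target_tokens):
        if drafted != expected:
            break  # first mismatch: discard this draft and everything after it
        accepted.append(drafted)

    # The target always contributes one token: a correction at the first
    # mismatch, or a bonus token when every draft was accepted.
    accepted.append(target_tokens[len(accepted)])
    return accepted


# Two of three drafts match, so the step emits them plus the target's correction.
print(greedy_verify([5, 7, 9], [5, 7, 2, 4]))
```

In this toy model every step emits at least one token, so a poor draft costs time but never changes the output.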

The CLI surfaces these fields as --speculative-method, --num-speculative-tokens, --rejection-sampling-strategy, and --synthetic-acceptance-rate. Construct the config directly when configuring a pipeline programmatically:

from max.pipelines import SpeculativeConfig

spec = SpeculativeConfig(
    speculative_method="eagle",
    num_speculative_tokens=3,
)

Parameters:

  • config_file (str | None)
  • section_name (str | None)
  • speculative_method (Literal['standalone', 'eagle', 'mtp'] | None)
  • num_speculative_tokens (int)
  • rejection_sampling_strategy (Literal['greedy', 'residual', 'typical-acceptance', 'logit-comparison'] | None)
  • synthetic_acceptance_rate (float | None)
  • use_relaxed_acceptance_for_thinking (bool)
  • relaxed_topk (int)
  • relaxed_delta (float)

is_eagle()

is_eagle()

Returns whether the configured method is EAGLE.

EAGLE drafts share the target’s embedding and lm_head layers and read the target’s hidden states.

Return type:

bool

is_mtp()

is_mtp()

Returns whether the configured method is multi-token prediction (MTP).

Return type:

bool

is_standalone()

is_standalone()

Returns whether the configured method is a standalone draft model.

Return type:

bool

model_config

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}

Configuration for the model. It should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

model_post_init()

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

  • self (BaseModel) – The BaseModel instance.
  • context (Any) – The context.

Return type:

None

num_speculative_tokens

num_speculative_tokens: int

The number of tokens the draft proposes per verification pass.

Defaults to 2. Larger values can raise the average draft acceptance length and peak speedup, but they may hurt acceptance rates at later positions and increase kernel latencies from the additional tokens.
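The diminishing return from drafting more tokens can be seen in a simplified model. The sketch below assumes every draft position is accepted independently with the same probability, which is an assumption made for illustration and not something this class states. The helper name `expected_accepted_drafts` is hypothetical, and the model ignores the extra kernel latency mentioned above.

```python
def expected_accepted_drafts(num_speculative_tokens, acceptance_prob):
    """Expected number of accepted draft tokens per verification pass.

    Assumes a constant, independent per-position acceptance probability.
    Position k only counts if all k drafts up to and including it were
    accepted, which happens with probability acceptance_prob ** k.
    """
    return sum(
        acceptance_prob ** k for k in range(1, num_speculative_tokens + 1)
    )


# Each additional draft position adds less than the one before it.
for n in (1, 2, 4, 8):
    print(n, round(expected_accepted_drafts(n, 0.7), 3))
```

Under this model the gain from each extra position shrinks geometrically, while the cost of verifying it does not, which is one way to read the trade-off described above.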

rejection_sampling_strategy

rejection_sampling_strategy: RejectionSamplingStrategy | None

The rejection sampling strategy used to verify drafted tokens.

When None, defaults to "typical-acceptance" for eagle and mtp and "residual" for standalone.
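The defaulting rule can be written out as a small stand-alone function. This is a sketch of the documented rule only, not the library's code. The name `resolve_rejection_strategy` is hypothetical, and returning None when no method is set is an assumption made here, since the rule above only covers the three named methods.

```python
from typing import Optional


def resolve_rejection_strategy(
    speculative_method: Optional[str],
    rejection_sampling_strategy: Optional[str],
) -> Optional[str]:
    """Mirror the documented default for rejection_sampling_strategy."""
    if rejection_sampling_strategy is not None:
        return rejection_sampling_strategy  # an explicit choice always wins
    if speculative_method in ("eagle", "mtp"):
        return "typical-acceptance"
    if speculative_method == "standalone":
        return "residual"
    # Assumption: with no method set, speculative decoding is disabled,
    # so there is no strategy to resolve.
    return None


print(resolve_rejection_strategy("eagle", None))
print(resolve_rejection_strategy("standalone", None))
print(resolve_rejection_strategy("mtp", "greedy"))
```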

relaxed_delta

relaxed_delta: float

relaxed_topk

relaxed_topk: int

speculative_method

speculative_method: SpeculativeMethod | None

The speculative decoding method to use.

One of "standalone", "eagle", or "mtp". When None, speculative decoding is disabled.

synthetic_acceptance_rate

synthetic_acceptance_rate: float | None

A benchmarking-only override that accepts drafts with a calibrated probability, ignoring real logits.

Must be between 0.0 and 1.0. When set, each draft position is accepted with a probability calibrated so that the mean joint acceptance across num_speculative_tokens positions matches this value. Use it to model hypothetical speedups without changing the draft model; leave unset for real serving.
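The exact calibration formula is not given here. One possible reading, shown purely as an assumption, is that position k is jointly accepted with probability p^k (all drafts up to k accepted), and p is chosen so the average of p^1 through p^n equals the configured rate. The sketch below solves for p by bisection. The function names are hypothetical and the real implementation may calibrate differently.

```python
def mean_joint_acceptance(p, n):
    """Average over positions 1..n of the chance that position k is reached
    and accepted, assuming a shared per-position probability p."""
    return sum(p ** k for k in range(1, n + 1)) / n


def calibrate_per_position_prob(target_rate, n, tol=1e-12):
    """Find p in [0, 1] with mean_joint_acceptance(p, n) close to target_rate.

    Assumed interpretation only; the library's calibration may differ.
    """
    if not 0.0 <= target_rate <= 1.0:
        raise ValueError("target_rate must be between 0.0 and 1.0")
    if n < 1:
        raise ValueError("n must be at least 1")

    lo, hi = 0.0, 1.0
    # mean_joint_acceptance is increasing in p, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_joint_acceptance(mid, n) < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p = calibrate_per_position_prob(0.6, 2)
print(round(p, 4), round(mean_joint_acceptance(p, 2), 4))
```

Under this reading the per-position probability is higher than the configured rate whenever more than one token is drafted, because later positions are reached less often.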

use_relaxed_acceptance_for_thinking

use_relaxed_acceptance_for_thinking: bool

uses_greedy_rejection()

uses_greedy_rejection()

Returns whether the "greedy" rejection sampling strategy is selected.

Return type:

bool

uses_logit_comparison()

uses_logit_comparison()

Returns whether the "logit-comparison" strategy is selected.

Return type:

bool

uses_typical_acceptance()

uses_typical_acceptance()

Returns whether the "typical-acceptance" strategy is selected.

Return type:

bool