Python class
SpeculativeConfig
class max.pipelines.SpeculativeConfig(*, config_file=None, section_name=None, speculative_method=None, num_speculative_tokens=2, rejection_sampling_strategy=None, synthetic_acceptance_rate=None, use_relaxed_acceptance_for_thinking=False, relaxed_topk=10, relaxed_delta=0.6)
Bases: ConfigFileModel
Configures speculative decoding for a pipeline.
Speculative decoding accelerates token generation by having a small
draft step propose several candidate tokens that the larger target
verifies in one forward pass. This class selects the method
(speculative_method), how many tokens to draft per step
(num_speculative_tokens), and how the target verifies them
(rejection_sampling_strategy).
The CLI surfaces these fields as --speculative-method,
--num-speculative-tokens, --rejection-sampling-strategy, and
--synthetic-acceptance-rate. Construct the config directly when
configuring a pipeline programmatically:
from max.pipelines import SpeculativeConfig

spec = SpeculativeConfig(
    speculative_method="eagle",
    num_speculative_tokens=3,
)
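The draft-then-verify loop described above can be illustrated with a toy greedy check. This is a minimal sketch, not MAX's implementation: the function name verify_greedy and its arguments are hypothetical, and it assumes the target's greedy choice at each drafted position (plus one extra position) is already available from a single forward pass.

```python
from typing import List, Sequence


def verify_greedy(
    draft_tokens: Sequence[int],
    target_argmax: Sequence[int],
) -> List[int]:
    """Toy greedy verification (hypothetical helper, not part of max.pipelines).

    target_argmax holds the target's greedy token at each drafted position,
    plus one more entry for the position right after the last draft.
    """
    if len(target_argmax) != len(draft_tokens) + 1:
        raise ValueError("target_argmax needs one more entry than draft_tokens")

    accepted: List[int] = []
    # Accept drafted tokens until the first disagreement with the target.
    for drafted, wanted in zip(draft_tokens, target_argmax):
        if drafted != wanted:
            break
        accepted.append(drafted)

    # The target always contributes one token: a correction at the first
    # mismatch, or a bonus token when every draft was accepted.
    accepted.append(target_argmax[len(accepted)])
    return accepted
```

With three drafted tokens, a step therefore emits between one and four tokens, which is where the speedup over one-token-per-pass decoding comes from.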
Parameters:
- config_file (str | None)
- section_name (str | None)
- speculative_method (Literal['standalone', 'eagle', 'mtp'] | None)
- num_speculative_tokens (int)
- rejection_sampling_strategy (Literal['greedy', 'residual', 'typical-acceptance', 'logit-comparison'] | None)
- synthetic_acceptance_rate (float | None)
- use_relaxed_acceptance_for_thinking (bool)
- relaxed_topk (int)
- relaxed_delta (float)
is_eagle()
is_eagle()
Returns whether the configured method is EAGLE.
EAGLE drafts share the target’s embedding and lm_head layers
and read the target’s hidden states.
Return type: bool
is_mtp()
is_mtp()
Returns whether the configured method is multi-token prediction (MTP).
Return type: bool
is_standalone()
is_standalone()
Returns whether the configured method is a standalone draft model.
Return type: bool
model_config
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}
Configuration for the model. It should be a dictionary conforming to pydantic.config.ConfigDict.
model_post_init()
model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
Parameters:
- self (BaseModel) – The BaseModel instance.
- context (Any) – The context.
Return type: None
num_speculative_tokens
num_speculative_tokens: int
The number of tokens the draft proposes per verification pass.
Defaults to 2. Larger values can raise the average draft
acceptance length and peak speedup, but they may hurt acceptance
rates at later positions and increase kernel latencies from the
additional tokens.
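The diminishing return from larger values can be sketched with a simple model. The helper below is hypothetical and assumes every draft position is accepted independently with the same probability, and that a draft only counts if all earlier drafts in the step were accepted. Because acceptance rates may fall at later positions, as noted above, this model is optimistic.

```python
def expected_accepted_tokens(
    acceptance_rate: float, num_speculative_tokens: int
) -> float:
    """Expected accepted drafts per step under an i.i.d. acceptance model.

    Hypothetical helper for illustration; not part of max.pipelines.
    Position k is only reached if positions 1..k-1 were all accepted,
    so it contributes acceptance_rate ** k to the expectation.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0.0 and 1.0")
    if num_speculative_tokens < 0:
        raise ValueError("num_speculative_tokens must be non-negative")
    return sum(
        acceptance_rate**k for k in range(1, num_speculative_tokens + 1)
    )
```

Under this model each extra drafted token adds less than the previous one, while the cost of drafting and verifying it does not shrink, so the best value depends on the measured acceptance rate.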
rejection_sampling_strategy
rejection_sampling_strategy: RejectionSamplingStrategy | None
The rejection sampling strategy used to verify drafted tokens.
When None, defaults to "typical-acceptance" for eagle and
mtp and "residual" for standalone.
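The defaulting rule above can be written out as a small lookup. This is a sketch of the documented behavior only: the function resolve_rejection_strategy is hypothetical, and returning None when no method is set is an assumption, since the source does not say what the strategy resolves to when speculative decoding is disabled.

```python
from typing import Optional

# Defaults as documented: typical-acceptance for eagle and mtp,
# residual for standalone.
_DEFAULT_STRATEGY = {
    "eagle": "typical-acceptance",
    "mtp": "typical-acceptance",
    "standalone": "residual",
}


def resolve_rejection_strategy(
    method: Optional[str], strategy: Optional[str]
) -> Optional[str]:
    """Hypothetical helper mirroring the documented defaults."""
    if strategy is not None:
        # An explicit choice always wins over the per-method default.
        return strategy
    if method is None:
        # Assumption: with speculative decoding disabled there is
        # nothing to resolve.
        return None
    return _DEFAULT_STRATEGY[method]
```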
relaxed_delta
relaxed_delta: float
Defaults to 0.6.
relaxed_topk
relaxed_topk: int
Defaults to 10.
speculative_method
speculative_method: SpeculativeMethod | None
The speculative decoding method to use.
One of "standalone", "eagle", or "mtp". When None,
speculative decoding is disabled.
synthetic_acceptance_rate
synthetic_acceptance_rate: float | None
A benchmarking-only override that accepts drafts with a calibrated probability, ignoring real logits.
Must be between 0.0 and 1.0. When set, each draft position is
accepted with a probability calibrated so that the mean joint
acceptance across num_speculative_tokens positions matches this
value. Use it to model hypothetical speedups without changing the draft
model; leave unset for real serving.
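One way to read the calibration described above is sketched below. It is an interpretation, not the library's confirmed algorithm: it assumes a single per-position probability p, takes the joint acceptance of the first k positions to be p ** k, and solves for the p whose mean over k = 1..num_speculative_tokens equals the requested rate. The function name and the bisection approach are hypothetical.

```python
def calibrate_per_position_rate(
    target: float, num_speculative_tokens: int, iterations: int = 60
) -> float:
    """Find p so that mean(p**k for k in 1..n) equals target.

    Hypothetical sketch of one possible calibration; not taken from
    the MAX source.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0.0 and 1.0")
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be at least 1")

    n = num_speculative_tokens

    def mean_joint(p: float) -> float:
        # Average probability that the first k drafts are all accepted.
        return sum(p**k for k in range(1, n + 1)) / n

    # mean_joint rises monotonically from 0 to 1 on [0, 1], so bisection
    # converges to the unique solution.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mean_joint(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Under this reading, a single drafted token gives a per-position probability equal to the requested rate, while more drafted tokens require a higher per-position probability to reach the same mean.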
use_relaxed_acceptance_for_thinking
use_relaxed_acceptance_for_thinking: bool
Defaults to False.
uses_greedy_rejection()
uses_greedy_rejection()
Returns whether the "greedy" rejection sampling strategy is selected.
Return type: bool
uses_logit_comparison()
uses_logit_comparison()
Returns whether the "logit-comparison" strategy is selected.
Return type: bool
uses_typical_acceptance()
uses_typical_acceptance()
Returns whether the "typical-acceptance" strategy is selected.
Return type: bool