Python class

`SpeculativeConfig`

class max.pipelines.SpeculativeConfig(*, config_file=None, section_name=None, speculative_method=None, num_speculative_tokens=2, rejection_sampling_strategy=None, synthetic_acceptance_rate=None, use_relaxed_acceptance_for_thinking=False, relaxed_topk=10, relaxed_delta=0.6)

Bases: ConfigFileModel

Configures speculative decoding for a pipeline.

Speculative decoding accelerates token generation by having a small draft step propose several candidate tokens that the larger target verifies in one forward pass. This class selects the method (speculative_method), how many tokens to draft per step (num_speculative_tokens), and how the target verifies them (rejection_sampling_strategy).
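The draft-then-verify loop described above can be sketched in plain Python. This is a toy illustration, not MAX's implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for real models, and verification here is greedy (a drafted token survives only if it equals the target's own choice). Because every kept token is one the target would have picked itself, the output matches plain greedy decoding of the target; in a real engine the saving comes from scoring all drafted positions in a single target forward pass.

```python
from typing import Callable, List

# A toy "model" maps a token prefix to the token it would pick greedily.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_greedy_decode(
    target: Model, draft: Model, prompt: List[int], max_new_tokens: int, k: int = 2
) -> List[int]:
    """Draft k tokens, then keep the longest prefix the target agrees with."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft proposes k candidate tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each drafted position. A real engine scores all
        #    k positions in one batched forward pass; this toy calls it per prefix.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)
            if tok != expected:
                # First mismatch: the target's own token replaces the draft; stop.
                break
        else:
            # Every draft matched, so the same pass also yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    # The last step may overshoot; trim to the requested length.
    return tokens[: len(prompt) + max_new_tokens]


def toy_target(prefix: List[int]) -> int:
    # Hypothetical deterministic "target model".
    return (sum(prefix) * 7 + 3) % 11


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical draft: agrees with the target except on every third prefix length.
    guess = toy_target(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 11


print(speculative_greedy_decode(toy_target, toy_draft, [1, 2], 10, k=3))
```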

The CLI surfaces these fields as --speculative-method, --num-speculative-tokens, --rejection-sampling-strategy, and --synthetic-acceptance-rate. Construct the config directly when configuring a pipeline programmatically:

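```python
from max.pipelines import SpeculativeConfig

# Use EAGLE and draft three tokens per verification pass. With
# rejection_sampling_strategy left as None, EAGLE falls back to "typical-acceptance".
spec = SpeculativeConfig(
    speculative_method="eagle",
    num_speculative_tokens=3,
)
```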

Parameters:

config_file (str | None)
section_name (str | None)
speculative_method (Literal['eagle', 'mtp', 'dflash'] | None)
num_speculative_tokens (int)
rejection_sampling_strategy (Literal['greedy', 'residual', 'typical-acceptance', 'logit-comparison'] | None)
synthetic_acceptance_rate (float | None)
use_relaxed_acceptance_for_thinking (bool)
relaxed_topk (int)
relaxed_delta (float)

`is_dflash()`

is_dflash()

Returns whether the configured method is DFlash.

Return type: bool

`is_eagle()`

is_eagle()

Returns whether the configured method is EAGLE.

EAGLE drafts share the target’s embedding and lm_head layers and read the target’s hidden states.

Return type: bool

`is_mtp()`

is_mtp()

Returns whether the configured method is multi-token prediction (MTP).

Return type: bool
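The three method predicates are mutually exclusive checks on speculative_method. The sketch below mirrors that behavior with a hypothetical `ToySpeculativeConfig` dataclass (it is not the real class, which is a pydantic model), plus a hypothetical `is_enabled()` helper and `describe()` dispatcher, to show how calling code might branch on the predicates rather than comparing method strings directly.

```python
from dataclasses import dataclass
from typing import Literal, Optional

SpeculativeMethod = Literal["eagle", "mtp", "dflash"]


@dataclass(frozen=True)
class ToySpeculativeConfig:
    """Hypothetical stand-in; not max.pipelines.SpeculativeConfig."""

    speculative_method: Optional[SpeculativeMethod] = None
    num_speculative_tokens: int = 2

    def is_eagle(self) -> bool:
        return self.speculative_method == "eagle"

    def is_mtp(self) -> bool:
        return self.speculative_method == "mtp"

    def is_dflash(self) -> bool:
        return self.speculative_method == "dflash"

    def is_enabled(self) -> bool:
        # Hypothetical helper: a None method means speculative decoding is off.
        return self.speculative_method is not None


def describe(config: ToySpeculativeConfig) -> str:
    # Hypothetical dispatcher that branches on the predicates.
    if not config.is_enabled():
        return "speculative decoding disabled"
    if config.is_eagle():
        return "EAGLE draft reusing target embeddings, lm_head, and hidden states"
    if config.is_mtp():
        return "multi-token prediction"
    return "DFlash"


print(describe(ToySpeculativeConfig(speculative_method="eagle", num_speculative_tokens=3)))
```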

`model_config`

model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'strict': False}

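Configuration for the model, which must be a dictionary conforming to pydantic.config.ConfigDict. Here 'extra': 'forbid' makes unknown fields raise a validation error, and 'strict': False lets pydantic coerce compatible input types.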

`model_post_init()`

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self (BaseModel) – The BaseModel instance.
context (Any) – The context.

Return type:

None

`num_speculative_tokens`

num_speculative_tokens: int

The number of tokens the draft proposes per verification pass.

Defaults to 2. Larger values can raise the average accepted draft length and the peak speedup, but they may lower acceptance rates at later positions and add kernel latency for the extra tokens.
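The diminishing return described above can be quantified with a common simplified model, which is an assumption here rather than how MAX measures acceptance: each drafted token is accepted independently with probability alpha, and verification stops at the first rejection. A pass then emits one target token (a correction or a bonus token) plus each accepted draft, so the expected count is the sum of alpha**i for i = 0..k, equivalently (1 - alpha**(k + 1)) / (1 - alpha) for alpha < 1.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumes i.i.d. per-position acceptance probability alpha and that
    verification stops at the first rejected draft.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # P(the first i drafts are all accepted) = alpha**i. The i = 0 term counts
    # the token the target always contributes; terms i >= 1 count accepted drafts.
    return sum(alpha**i for i in range(k + 1))


for k in (1, 2, 4, 8):
    print(f"k={k}: {expected_tokens_per_pass(0.7, k):.3f} tokens per pass")
```

Under this model the k-th draft position adds only alpha**k expected tokens, while its drafting and verification cost does not shrink, which is why larger num_speculative_tokens values eventually stop paying off.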

`rejection_sampling_strategy`

rejection_sampling_strategy: RejectionSamplingStrategy | None

The rejection sampling strategy used to verify drafted tokens.

When None, the strategy defaults to "typical-acceptance" for the eagle and mtp methods.
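For sampled (non-greedy) verification, the standard speculative sampling rule accepts a drafted token x with probability min(1, p[x] / q[x]), where p is the target distribution and q the draft distribution, and on rejection resamples from the normalized residual max(0, p - q). That the "residual" strategy follows exactly this rule is an assumption; the sketch below only shows the textbook technique, with illustrative distributions. Its key property is that emitted tokens are distributed exactly as the target's p.

```python
import random
from typing import Sequence, Tuple


def sample(dist: Sequence[float], rng: random.Random) -> int:
    """Draw a token index from a (possibly unnormalized) weight vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def residual_verify(
    p: Sequence[float], q: Sequence[float], x: int, rng: random.Random
) -> Tuple[bool, int]:
    """Verify one drafted token x, sampled from draft dist q, against target dist p.

    Returns (accepted, emitted_token).
    """
    # Accept with probability min(1, p[x] / q[x]); q[x] > 0 because x came from q.
    if rng.random() < min(1.0, p[x] / q[x]):
        return True, x
    # Rejected: resample from max(0, p - q). rng.choices normalizes the weights.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return False, sample(residual, rng)


rng = random.Random(0)
p = [0.5, 0.3, 0.2]  # illustrative target distribution
q = [0.2, 0.5, 0.3]  # illustrative draft distribution
n = 100_000
counts = [0, 0, 0]
accepted = 0
for _ in range(n):
    ok, tok = residual_verify(p, q, sample(q, rng), rng)
    accepted += ok
    counts[tok] += 1

# Empirical token frequencies should approach p; acceptance should approach sum(min(p, q)).
print([c / n for c in counts], accepted / n)
```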

`relaxed_delta`

relaxed_delta: float

`relaxed_topk`

relaxed_topk: int

`speculative_method`

speculative_method: SpeculativeMethod | None

The speculative decoding method to use.

One of "eagle", "mtp", or "dflash". When None, speculative decoding is disabled.

`synthetic_acceptance_rate`

synthetic_acceptance_rate: float | None

A benchmarking-only override that accepts drafts with a calibrated probability, ignoring real logits.

Must be between 0.0 and 1.0. When set, each draft position is accepted with a probability calibrated so that the mean joint acceptance across num_speculative_tokens positions matches this value. Use it to model hypothetical speedups without changing the draft model; leave unset for real serving.
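One way to perform the calibration described above is a bisection search for the per-position probability a. This sketch assumes that "mean joint acceptance" means the average, over the num_speculative_tokens positions, of the probability that every position up to and including position i is accepted, that is mean(a**i for i = 1..k); the actual formula used by the pipeline may differ. That mean rises monotonically from 0 to 1 as a goes from 0 to 1, so bisection always converges.

```python
def calibrate_per_position_rate(target: float, k: int, tol: float = 1e-12) -> float:
    """Find per-position acceptance a such that mean(a**i for i in 1..k) == target.

    Assumed interpretation of "mean joint acceptance"; see the lead-in.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0.0 and 1.0")
    if k < 1:
        raise ValueError("k must be at least 1")

    def mean_joint(a: float) -> float:
        # Average over positions of P(all positions 1..i accepted) = a**i.
        return sum(a**i for i in range(1, k + 1)) / k

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_joint(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for k in (1, 2, 4):
    print(f"k={k}: per-position acceptance {calibrate_per_position_rate(0.6, k):.4f}")
```

With k = 1 the per-position rate equals the target; for longer drafts it must be higher, because joint acceptance at later positions compounds.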

`use_relaxed_acceptance_for_thinking`

use_relaxed_acceptance_for_thinking: bool

`uses_greedy_rejection()`

uses_greedy_rejection()

Returns whether the "greedy" rejection sampling strategy is selected.

Return type: bool

`uses_logit_comparison()`

uses_logit_comparison()

Returns whether the "logit-comparison" strategy is selected.

Return type: bool

`uses_typical_acceptance()`

uses_typical_acceptance()

Returns whether the "typical-acceptance" strategy is selected.

Return type: bool
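Typical acceptance, as described for Medusa-style decoding, relaxes exact matching: a drafted token is kept when the target assigns it enough probability, and the threshold loosens as the target distribution becomes more uncertain. The sketch below uses the formulation p[x] > min(epsilon, delta * exp(-H(p))), where H is the entropy of the target distribution. Whether the "typical-acceptance" strategy uses this exact rule is an assumption, and `epsilon`, `delta`, and the helper `accepted_prefix_length` are hypothetical names with illustrative values.

```python
import math
from typing import Sequence


def typical_accept(
    p: Sequence[float], x: int, epsilon: float = 0.09, delta: float = 0.3
) -> bool:
    """Accept drafted token x if target probability p[x] clears an entropy-scaled threshold."""
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    # Peaked target (low entropy): the threshold is capped at epsilon.
    # Flat target (high entropy): delta * exp(-H) shrinks, so more tokens pass.
    threshold = min(epsilon, delta * math.exp(-entropy))
    return p[x] > threshold


def accepted_prefix_length(
    target_dists: Sequence[Sequence[float]], drafts: Sequence[int], **kwargs: float
) -> int:
    """Number of leading drafted tokens accepted before the first rejection."""
    n = 0
    for p, x in zip(target_dists, drafts):
        if not typical_accept(p, x, **kwargs):
            break
        n += 1
    return n


peaked = [0.97, 0.01, 0.01, 0.01]
flat = [0.25, 0.25, 0.25, 0.25]
print(typical_accept(peaked, 0), typical_accept(peaked, 1), typical_accept(flat, 3))
print(accepted_prefix_length([peaked, flat, peaked], [0, 2, 1]))
```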
