Python class

SyntheticRunner

`SyntheticRunner`

class max.pipelines.sampling.SyntheticRunner(session, device_ref, synthetic_acceptance_rate, num_speculative_tokens)

Bases: RejectionRunner

Synthetic acceptance sampler for benchmarking.

Replaces model-driven acceptance with independent per-position Bernoulli draws, calibrated so that the mean joint acceptance across the `num_speculative_tokens` positions matches `synthetic_acceptance_rate`. The actual draft and target logits are ignored when deciding acceptance, so real model quality is not measured.
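
The calibration can be illustrated with a small standalone sketch. It assumes one particular reading of "mean joint acceptance": with a per-position acceptance probability `p`, the probability that positions 1 through `i` are all accepted is `p**i`, and the average of these values over the `k` positions is set equal to the target rate. That reading, and the helper names `calibrate_per_position_rate` and `simulate_accepted_prefix`, are assumptions made for illustration and are not taken from the MAX source.

```python
import random


def calibrate_per_position_rate(target_rate: float, num_positions: int) -> float:
    """Solve mean(p**i for i in 1..k) == target_rate for p by bisection.

    Assumed interpretation of the calibration; not the MAX implementation.
    """
    if not 0.0 <= target_rate <= 1.0:
        raise ValueError("target_rate must be in [0, 1]")
    if num_positions < 1:
        raise ValueError("num_positions must be at least 1")

    lo, hi = 0.0, 1.0
    # The mean joint acceptance is increasing in p, so bisection converges.
    for _ in range(60):
        mid = (lo + hi) / 2
        mean_joint = sum(mid**i for i in range(1, num_positions + 1)) / num_positions
        if mean_joint < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_accepted_prefix(p: float, num_positions: int, rng: random.Random) -> int:
    """Draw independent Bernoulli(p) trials and count accepts before the first reject."""
    accepted = 0
    for _ in range(num_positions):
        if rng.random() >= p:
            break
        accepted += 1
    return accepted


# Example: 4 speculative tokens with a target rate of 0.6.
p = calibrate_per_position_rate(0.6, 4)
rng = random.Random(0)
trials = 20_000
mean_fraction = sum(simulate_accepted_prefix(p, 4, rng) for _ in range(trials)) / (trials * 4)
print(f"per-position p = {p:.4f}, simulated accepted fraction = {mean_fraction:.4f}")
```

Under this reading, the simulated fraction of accepted draft tokens should land close to the target rate. A different calibration (for example, matching only the probability that all positions are accepted) would give a different `p`.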

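A fresh seed is bound on every call so that the random draws vary across executions. With a fixed seed, every call would replay the same accept/reject pattern, and that single deterministic realization would dominate the benchmark.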
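
The effect of per-call seeding can be seen in a short standalone sketch. It uses Python's `random` module rather than the MAX graph, and the helper `draw_accept_mask` is a hypothetical name introduced only for this illustration.

```python
import random


def draw_accept_mask(p: float, num_positions: int, seed: int) -> list[bool]:
    """Return one accept/reject pattern drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(num_positions)]


# Fixed seed: every call replays the same pattern.
fixed = [draw_accept_mask(0.7, 4, seed=0) for _ in range(3)]
print("fixed seed:", fixed)

# Fresh seed per call: patterns vary from call to call.
seed_source = random.Random(1234)
fresh = [draw_accept_mask(0.7, 4, seed=seed_source.getrandbits(32)) for _ in range(3)]
print("fresh seeds:", fresh)
```

With the fixed seed, a benchmark would measure one specific pattern repeated on every step instead of an average over many patterns.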

Parameters:

session (InferenceSession)
device_ref (DeviceRef)
synthetic_acceptance_rate (float)
num_speculative_tokens (int)

`run()`

run(draft_tokens, draft_logits, target_logits, target_logit_offsets, all_draft_logits, context_batch)

Runs the synthetic acceptance graph with a fresh per-call seed.

`draft_logits`, `target_logit_offsets`, `all_draft_logits`, and `context_batch` are ignored. Synthetic acceptance uses only `draft_tokens` and `target_logits`, the latter for the recovered/bonus-token argmax.

Parameters:

draft_tokens (Buffer)
draft_logits (Buffer | None)
target_logits (Buffer)
target_logit_offsets (Buffer)
all_draft_logits (Buffer | None)
context_batch (list[TextContext])

Return type:

tuple[Buffer, Buffer, Buffer]
