Python function
rejection_sampler_with_residuals
max.pipelines.sampling.rejection_sampler_with_residuals(device, *, debug=False)
Builds a rejection sampler with residual sampling for speculative decoding.
The sampler computes an acceptance ratio for each draft token, finds the first rejected position, samples a replacement token from the residual distribution (target - draft), and generates bonus tokens.
The sampling RNG seed is bound as a graph input. Callers refresh it on each execution so that the random draws vary across calls.
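The steps described above can be illustrated with a minimal pure-Python sketch. This is not the MAX implementation and does not use the MAX graph API: the function names (`rejection_sample`, `residual_distribution`, `sample_from`) and the list-based inputs are hypothetical, chosen only for illustration. It also assumes the common convention that the residual is clamped at zero and renormalized, which the description above does not spell out. The `seed` argument stands in for the per-execution seed input.

```python
import random
from typing import List, Sequence, Tuple


def sample_from(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one token id from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def residual_distribution(
    target: Sequence[float], draft: Sequence[float]
) -> List[float]:
    """Assumed convention: normalize max(0, target - draft)."""
    raw = [max(0.0, p - q) for p, q in zip(target, draft)]
    total = sum(raw)
    if total == 0.0:
        # Distributions coincide; fall back to the target distribution.
        return list(target)
    return [r / total for r in raw]


def rejection_sample(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],
    target_probs: Sequence[Sequence[float]],
    seed: int,
) -> Tuple[List[int], int]:
    """Hypothetical helper, not part of MAX.

    draft_probs holds one distribution per draft token; target_probs holds
    one more than that, the last being used for the bonus token.
    Returns (output tokens, index of the first rejection), where the index
    equals len(draft_tokens) when every draft token is accepted.
    """
    rng = random.Random(seed)  # a fresh seed per call varies the draws
    accepted: List[int] = []
    for i, token in enumerate(draft_tokens):
        p = target_probs[i][token]
        q = draft_probs[i][token]
        # Acceptance ratio min(1, p / q); reject outright if q is zero.
        ratio = min(1.0, p / q) if q > 0.0 else 0.0
        if rng.random() < ratio:
            accepted.append(token)
            continue
        # First rejection: resample this position from the residual.
        residual = residual_distribution(target_probs[i], draft_probs[i])
        return accepted + [sample_from(residual, rng)], i
    # Every draft token was accepted: sample one bonus token.
    bonus = sample_from(target_probs[len(draft_tokens)], rng)
    return accepted + [bonus], len(draft_tokens)
```

Calling `rejection_sample` twice with the same seed reproduces the same tokens, which is why a caller that wants varied output supplies a new seed on each execution.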