SamplingParams

class max.interfaces.SamplingParams(top_k=-1, top_p=1, min_p=0.0, temperature=1, frequency_penalty=0.0, presence_penalty=0.0, repetition_penalty=1.0, max_new_tokens=None, min_new_tokens=0, ignore_eos=False, stop=None, stop_token_ids=None, detokenize=True, seed=<factory>, logits_processors=None)

Bases: object

Request-specific sampling parameters that are only known at run time.
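
A minimal construction sketch (values are illustrative; the import path follows the class's documented location):

from max.interfaces import SamplingParams

# Deterministic decoding, capped at 128 new tokens.
greedy = SamplingParams(temperature=0, max_new_tokens=128)

# Nucleus sampling with a mild frequency penalty.
creative = SamplingParams(temperature=0.8, top_p=0.95, frequency_penalty=0.5)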

Parameters:

detokenize

detokenize: bool = True

Whether to detokenize the output tokens into text.

frequency_penalty

frequency_penalty: float = 0.0

The frequency penalty to apply to the model’s output. A positive value will penalize new tokens based on their frequency in the generated text: tokens will receive a penalty proportional to the count of appearances.
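
A hedged sketch of the rule this describes (helper and variable names are illustrative, not part of the API):

def apply_frequency_penalty(logits, counts, frequency_penalty):
    # Each token's logit is reduced in proportion to how many times
    # that token has already appeared in the generated text.
    return [
        logit - frequency_penalty * count
        for logit, count in zip(logits, counts)
    ]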

from_input_and_generation_config()

classmethod from_input_and_generation_config(input_params, sampling_params_defaults)

Creates a SamplingParams instance with defaults from a HuggingFace GenerationConfig.

Combines three sources of values in priority order, highest to lowest (see the sketch after the list):

  1. User-provided values in input_params (non-None)
  2. Model’s GenerationConfig values (only if explicitly set in the model’s config)
  3. SamplingParams class defaults
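
A sketch of that resolution order (an illustrative helper, not the actual implementation):

def resolve(user_value, generation_config_value, class_default):
    if user_value is not None:
        return user_value               # 1. user-provided input_params
    if generation_config_value is not None:
        return generation_config_value  # 2. model's GenerationConfig
    return class_default                # 3. SamplingParams class default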

Parameters:

  * input_params – User-provided sampling values; any non-None fields take highest priority.
  * sampling_params_defaults – Model-level sampling defaults, derived from the model’s GenerationConfig.

Returns:

A new SamplingParams instance with model-aware defaults.

Return type:

SamplingParams

params = SamplingParams.from_input_and_generation_config(
    SamplingParamsInput(temperature=0.7),
    sampling_params_defaults=model_config.sampling_params_defaults,
)

ignore_eos

ignore_eos: bool = False

If True, the response will ignore the EOS token and continue to generate until the maximum number of tokens is reached or a stop string is hit.

log_sampling_info()

log_sampling_info()

Logs comprehensive information about the sampling parameters.

Displays all sampling parameters in a consistent visual format similar to pipeline configuration logging.

Return type:

None

logits_processors

logits_processors: Sequence[Callable[[ProcessorInputs], None]] | None = None

Callables to post-process the model logits. See LogitsProcessor for examples.
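
Given the Callable[[ProcessorInputs], None] signature, a processor presumably mutates its inputs in place. A hedged sketch; the .logits attribute is an assumption for illustration, not a documented field:

def ban_token_zero(inputs):
    # Suppress token id 0 by forcing its logit to negative infinity.
    # `inputs.logits` is an assumed attribute of ProcessorInputs.
    inputs.logits[0] = float("-inf")

params = SamplingParams(logits_processors=[ban_token_zero])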

max_new_tokens

max_new_tokens: int | None = None

The maximum number of new tokens to generate in the response.

When set to an integer value, generation will stop after this many tokens. When None (default), the model may generate tokens until it reaches its internal limits or other stopping criteria are met.

min_new_tokens

min_new_tokens: int = 0

The minimum number of new tokens to generate in the response.

min_p

min_p: float = 0.0

Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
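
A hedged sketch of the filtering rule this describes (helper names are illustrative):

def min_p_mask(probs, min_p):
    # A token is considered only if its probability is at least
    # min_p times the probability of the most likely token.
    threshold = min_p * max(probs)
    return [p >= threshold for p in probs]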

needs_penalties

property needs_penalties: bool

Whether the sampler needs to apply penalties for this set of sampling parameters.

presence_penalty

presence_penalty: float = 0.0

The presence penalty to apply to the model’s output. A positive value will penalize new tokens that have already appeared in the generated text at least once by applying a constant penalty.
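
A hedged sketch of the rule this describes (illustrative helper):

def apply_presence_penalty(logits, counts, presence_penalty):
    # A constant penalty is subtracted from every token that has
    # appeared at least once, regardless of how often.
    return [
        logit - (presence_penalty if count > 0 else 0.0)
        for logit, count in zip(logits, counts)
    ]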

repetition_penalty

repetition_penalty: float = 1.0

The repetition penalty to apply to the model’s output. Values > 1 will penalize new tokens that have already appeared in the generated text at least once by dividing the logits by the repetition penalty.
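
A hedged sketch that follows the description literally (common implementations divide positive logits and multiply negative ones; whether that refinement applies here is not stated):

def apply_repetition_penalty(logits, counts, repetition_penalty):
    # Divide the logit of any token that has already appeared.
    return [
        logit / repetition_penalty if count > 0 else logit
        for logit, count in zip(logits, counts)
    ]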

seed

seed: int

The seed to use for the random number generator. Defaults to a cryptographically secure random value.

stop

stop: list[str] | None = None

A list of detokenized sequences that can be used as stop criteria when generating a new sequence.

stop_token_ids

stop_token_ids: list[int] | None = None

A list of token ids that are used as stopping criteria when generating a new sequence.
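
Illustrative usage combining both stop criteria (the token id is hypothetical; EOS-like ids are model-specific):

params = SamplingParams(stop=["\n\n", "###"], stop_token_ids=[2])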

temperature

temperature: float = 1

Controls the randomness of the model’s output; higher values produce more diverse responses. For greedy sampling, set temperature to 0.
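
A hedged sketch of the standard temperature rule (logits are divided by the temperature before softmax; temperature 0 degenerates to greedy argmax):

import math

def probs_with_temperature(logits, temperature):
    if temperature == 0:
        # Greedy: all probability mass on the highest-logit token.
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [logit / temperature for logit in logits]
    z = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / z for s in scaled]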

top_k

top_k: int = -1

Limits the sampling to the K most probable tokens. Defaults to -1, which samples from all tokens; for greedy sampling, set to 1.
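
A hedged sketch of the filtering rule (illustrative helper):

def top_k_indices(logits, top_k):
    # top_k == -1 keeps every token; otherwise keep the K largest.
    order = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)
    return order if top_k == -1 else order[:top_k]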

top_p

top_p: float = 1

Only sample from tokens whose cumulative probability is within the top_p threshold. This is applied to the tokens that remain after top_k filtering.
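
A hedged sketch of nucleus filtering over the top_k survivors (illustrative helper):

def top_p_indices(probs, top_p):
    # Walk tokens in descending probability, keeping them until the
    # cumulative probability reaches the top_p threshold.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept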