SamplingParams

class max.interfaces.SamplingParams(top_k=-1, top_p=1, min_p=0.0, temperature=1, frequency_penalty=0.0, presence_penalty=0.0, repetition_penalty=1.0, max_new_tokens=None, min_new_tokens=0, ignore_eos=False, stop=None, stop_token_ids=None, detokenize=True, seed=<factory>, logits_processors=None)

Bases: object

Request-specific sampling parameters that are only known at run time.
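
A minimal construction sketch (values are illustrative; the import path follows the class's documented location):

from max.interfaces import SamplingParams

# Deterministic decoding, capped at 128 new tokens.
greedy = SamplingParams(temperature=0, max_new_tokens=128)

# Nucleus sampling with a mild frequency penalty.
creative = SamplingParams(temperature=0.8, top_p=0.95, frequency_penalty=0.5)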

Parameters:

detokenize

detokenize: bool = True

Whether to detokenize the output tokens into text.

frequency_penalty

frequency_penalty: float = 0.0

The frequency penalty to apply to the model’s output. A positive value will penalize new tokens based on their frequency in the generated text: tokens will receive a penalty proportional to the count of appearances.
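
A hedged sketch of the rule this describes (helper and variable names are illustrative, not part of the API):

def apply_frequency_penalty(logits, counts, frequency_penalty):
    # Each token's logit is reduced in proportion to how many times
    # that token has already appeared in the generated text.
    return [
        logit - frequency_penalty * count
        for logit, count in zip(logits, counts)
    ]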

from_input_and_generation_config()

classmethod from_input_and_generation_config(input_params, sampling_params_defaults)

Creates a SamplingParams instance with defaults from a HuggingFace GenerationConfig.

Combines three sources of values in priority order, highest to lowest (see the sketch after the list):

  1. User-provided values in input_params (non-None)
  2. Model’s GenerationConfig values (only if explicitly set in the model’s config)
  3. SamplingParams class defaults
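
A sketch of that resolution order (an illustrative helper, not the actual implementation):

def resolve(user_value, generation_config_value, class_default):
    if user_value is not None:
        return user_value               # 1. user-provided input_params
    if generation_config_value is not None:
        return generation_config_value  # 2. model's GenerationConfig
    return class_default                # 3. SamplingParams class default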

Parameters:

  * input_params – User-provided sampling values; any non-None fields take highest priority.
  * sampling_params_defaults – Model-level sampling defaults, derived from the model’s GenerationConfig.

Returns:

A new SamplingParams instance with model-aware defaults.

Return type:

SamplingParams

params = SamplingParams.from_input_and_generation_config(
    SamplingParamsInput(temperature=0.7),
    sampling_params_defaults=model_config.sampling_params_defaults,
)

ignore_eos

ignore_eos: bool = False

If True, the response will ignore the EOS token and continue to generate until the maximum number of tokens is reached or a stop string is hit.

log_sampling_info()

log_sampling_info()

Logs comprehensive information about the sampling parameters.

Displays all sampling parameters in a consistent visual format similar to pipeline configuration logging.

Return type:

None

logits_processors

logits_processors: Sequence[Callable[[ProcessorInputs], None]] | None = None

Callables to post-process the model logits. See LogitsProcessor for examples.
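
Given the Callable[[ProcessorInputs], None] signature, a processor presumably mutates its inputs in place. A hedged sketch; the .logits attribute is an assumption for illustration, not a documented field:

def ban_token_zero(inputs):
    # Suppress token id 0 by forcing its logit to negative infinity.
    # `inputs.logits` is an assumed attribute of ProcessorInputs.
    inputs.logits[0] = float("-inf")

params = SamplingParams(logits_processors=[ban_token_zero])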

max_new_tokens

max_new_tokens: int | None = None

The maximum number of new tokens to generate in the response.

When set to an integer value, generation will stop after this many tokens. When None (default), the model may generate tokens until it reaches its internal limits or other stopping criteria are met.

min_new_tokens

min_new_tokens: int = 0

The minimum number of new tokens to generate in the response.

min_p

min_p: float = 0.0

Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
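
A hedged sketch of the filtering rule this describes (helper names are illustrative):

def min_p_mask(probs, min_p):
    # A token is considered only if its probability is at least
    # min_p times the probability of the most likely token.
    threshold = min_p * max(probs)
    return [p >= threshold for p in probs]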

needs_penalties

property needs_penalties: bool

Whether the sampler needs to apply penalties for this set of sampling parameters.

presence_penalty

presence_penalty: float = 0.0

The presence penalty to apply to the model’s output. A positive value will penalize new tokens that have already appeared in the generated text at least once by applying a constant penalty.
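
A hedged sketch of the rule this describes (illustrative helper):

def apply_presence_penalty(logits, counts, presence_penalty):
    # A constant penalty is subtracted from every token that has
    # appeared at least once, regardless of how often.
    return [
        logit - (presence_penalty if count > 0 else 0.0)
        for logit, count in zip(logits, counts)
    ]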

repetition_penalty

repetition_penalty: float = 1.0

The repetition penalty to apply to the model’s output. Values > 1 will penalize new tokens that have already appeared in the generated text at least once by dividing the logits by the repetition penalty.
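
A hedged sketch that follows the description literally (common implementations divide positive logits and multiply negative ones; whether that refinement applies here is not stated):

def apply_repetition_penalty(logits, counts, repetition_penalty):
    # Divide the logit of any token that has already appeared.
    return [
        logit / repetition_penalty if count > 0 else logit
        for logit, count in zip(logits, counts)
    ]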

seed

seed: int

The seed to use for the random number generator. Defaults to a cryptographically secure random value.

stop

stop: list[str] | None = None

A list of detokenized sequences that can be used as stop criteria when generating a new sequence.

stop_token_ids

stop_token_ids: list[int] | None = None

A list of token ids that are used as stopping criteria when generating a new sequence.
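
Illustrative usage combining both stop criteria (the token id is hypothetical; EOS-like ids are model-specific):

params = SamplingParams(stop=["\n\n", "###"], stop_token_ids=[2])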

temperature

temperature: float = 1

Controls the randomness of the model’s output; higher values produce more diverse responses. For greedy sampling, set temperature to 0.
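
A hedged sketch of the standard temperature rule (logits are divided by the temperature before softmax; temperature 0 degenerates to greedy argmax):

import math

def probs_with_temperature(logits, temperature):
    if temperature == 0:
        # Greedy: all probability mass on the highest-logit token.
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [logit / temperature for logit in logits]
    z = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / z for s in scaled]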

top_k

top_k: int = -1

Limits the sampling to the K most probable tokens. Defaults to -1, which samples from all tokens; for greedy sampling, set to 1.
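
A hedged sketch of the filtering rule (illustrative helper):

def top_k_indices(logits, top_k):
    # top_k == -1 keeps every token; otherwise keep the K largest.
    order = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)
    return order if top_k == -1 else order[:top_k]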

top_p

top_p: float = 1

Only sample from tokens whose cumulative probability is within the top_p threshold. This is applied to the tokens that remain after top_k filtering.
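
A hedged sketch of nucleus filtering over the top_k survivors (illustrative helper):

def top_p_indices(probs, top_p):
    # Walk tokens in descending probability, keeping them until the
    # cumulative probability reaches the top_p threshold.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept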