Python class

ReasoningParser

class max.pipelines.modeling.types.ReasoningParser

Bases: ABC

Parser for identifying reasoning spans in model output.

from_tokenizer()

abstract async classmethod from_tokenizer(tokenizer)

Constructs a reasoning parser from a tokenizer.

Parameters:

tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer to use for resolving reasoning delimiter token IDs.

Returns:

A new ReasoningParser instance.

Return type:

ReasoningParser
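
The factory pattern described above can be sketched as follows. This is a minimal illustration, not the MAX implementation: BaseParser, ThinkParser, ToyTokenizer, and the token_to_id() method are all hypothetical stand-ins, and the delimiter strings and IDs are assumed.

```python
import asyncio
from abc import ABC, abstractmethod


class ToyTokenizer:
    """Hypothetical stand-in for a PipelineTokenizer; not the MAX API."""

    def __init__(self, vocab: dict[str, int]) -> None:
        self._vocab = vocab

    async def token_to_id(self, token: str) -> int:
        # Assumed async lookup of a single special token.
        return self._vocab[token]


class BaseParser(ABC):
    """Hypothetical base class mirroring the abstract async classmethod shape."""

    @classmethod
    @abstractmethod
    async def from_tokenizer(cls, tokenizer: ToyTokenizer) -> "BaseParser":
        ...


class ThinkParser(BaseParser):
    def __init__(self, start_id: int, end_id: int) -> None:
        self.start_id = start_id
        self.end_id = end_id

    @classmethod
    async def from_tokenizer(cls, tokenizer: ToyTokenizer) -> "ThinkParser":
        # Resolve delimiter token IDs once, at construction time.
        start_id = await tokenizer.token_to_id("<think>")
        end_id = await tokenizer.token_to_id("</think>")
        return cls(start_id, end_id)


toy = ToyTokenizer({"<think>": 100, "</think>": 101})
parser = asyncio.run(ThinkParser.from_tokenizer(toy))
print(parser.start_id, parser.end_id)
```

Because the factory is async, callers need to await it (or drive it with asyncio.run() outside an event loop) rather than calling the constructor directly.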

reasoning_end_token_id()

abstract async classmethod reasoning_end_token_id(tokenizer)

Returns the single-token ID that closes a reasoning span.

Used by callers that need to detect end-of-reasoning without instantiating the full parser (e.g., grammar-region setup in the tokenizer). Implementations should resolve their architecture’s end-marker string (</think>, <channel|>, etc.) via max.pipelines.lib.tokenizer.convert_token_to_id().

Parameters:

tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer used for token ID resolution.

Returns:

The token ID that marks end-of-reasoning, or None if the architecture’s end marker doesn’t tokenize to a single ID.

Return type:

int | None
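
The "single ID or None" contract can be illustrated with a small sketch. It does not call max.pipelines.lib.tokenizer.convert_token_to_id(), whose exact signature is not shown here; instead, ToyTokenizer, its encode() method, and resolve_end_token_id() are hypothetical helpers that assume a greedy longest-match vocabulary.

```python
import asyncio
from typing import Optional


class ToyTokenizer:
    """Hypothetical stand-in for a PipelineTokenizer; not the MAX API."""

    def __init__(self, vocab: dict[str, int]) -> None:
        self._vocab = vocab

    async def encode(self, text: str) -> list[int]:
        # Greedy longest-match over the toy vocabulary.
        ids: list[int] = []
        i = 0
        while i < len(text):
            for j in range(len(text), i, -1):
                piece = text[i:j]
                if piece in self._vocab:
                    ids.append(self._vocab[piece])
                    i = j
                    break
            else:
                raise ValueError(f"cannot tokenize {text[i:]!r}")
        return ids


async def resolve_end_token_id(tokenizer: ToyTokenizer, marker: str) -> Optional[int]:
    """Return the marker's token ID only if it encodes to exactly one token."""
    ids = await tokenizer.encode(marker)
    return ids[0] if len(ids) == 1 else None


single = ToyTokenizer({"</think>": 7})
split = ToyTokenizer({"</": 1, "think": 2, ">": 3})
print(asyncio.run(resolve_end_token_id(single, "</think>")))  # 7
print(asyncio.run(resolve_end_token_id(split, "</think>")))   # None
```

Callers that receive None presumably cannot detect end-of-reasoning from one token ID alone and would need the full parser instead.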

reset()

reset()

Resets per-request state.

Called at the start of each request to clear any internal state accumulated during a prior request.

Return type:

None
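
One reason a parser might carry per-request state is a delimiter that spans several tokens and can straddle chunk boundaries. The sketch below is an assumption-laden illustration: BufferingParser and its feed() method are hypothetical and not part of the ReasoningParser interface. It shows how stale state from one request could produce a false match in the next unless reset() is called.

```python
from typing import Sequence


class BufferingParser:
    """Hypothetical parser that remembers trailing tokens between chunks."""

    def __init__(self, end_marker_ids: Sequence[int]) -> None:
        self.end_marker_ids = list(end_marker_ids)
        self._tail: list[int] = []  # last tokens of the previous chunk

    def feed(self, token_ids: Sequence[int]) -> bool:
        """Return True if the end marker appears, possibly across chunks."""
        n = len(self.end_marker_ids)
        window = self._tail + list(token_ids)
        found = any(
            window[i : i + n] == self.end_marker_ids
            for i in range(len(window) - n + 1)
        )
        # Keep just enough tokens to complete a marker in the next chunk.
        self._tail = window[-(n - 1):] if n > 1 else []
        return found

    def reset(self) -> None:
        """Clear state accumulated during a prior request."""
        self._tail = []


parser = BufferingParser(end_marker_ids=[5, 6])
parser.feed([1, 5])            # request A ends with half a marker buffered
parser.reset()                 # start of request B
print(parser.feed([6, 2]))     # False: no leak from request A
```

Without the reset() call, the second feed() would see the buffered 5 followed by 6 and report a marker that never occurred in request B.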

stream()

abstract stream(delta_token_ids, is_currently_reasoning=True)

Identifies a reasoning span within a streaming delta chunk.

Parameters:

  • delta_token_ids (Sequence[int]) – The token IDs of the incremental streaming chunk.
  • is_currently_reasoning (bool) – Whether the stream was already inside a reasoning span at the start of this chunk. When True (the default, for backward compatibility), the parser treats the chunk as continuing reasoning until it finds an end delimiter. When False, the parser enters reasoning only if it finds a start delimiter in this chunk. This lets callers feed every chunk through the parser and catch mid-stream reasoning sections (e.g. Gemma 4 emitting <|channel>thought\n...<channel|> even when reasoning wasn’t pre-seeded).

Returns:

A ParsedReasoningDelta containing the reasoning span, whether reasoning is still active, and an optional formatter for decoded reasoning text.

Return type:

ParsedReasoningDelta
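
The effect of is_currently_reasoning can be sketched with a toy parser. Everything here is illustrative: ToyDelta is a hypothetical stand-in for ParsedReasoningDelta (its field names are assumed), the delimiters are assumed to be single tokens, and only the first reasoning span in a chunk is reported.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyDelta:
    """Hypothetical stand-in for ParsedReasoningDelta; field names are assumed."""

    span: Optional[tuple[int, int]]  # [start, end) indices of reasoning tokens
    is_still_reasoning: bool


class ToyReasoningParser:
    """Illustrative parser with single-token start and end delimiters."""

    def __init__(self, start_id: int, end_id: int) -> None:
        self.start_id = start_id
        self.end_id = end_id

    def stream(
        self, delta_token_ids: Sequence[int], is_currently_reasoning: bool = True
    ) -> ToyDelta:
        ids = list(delta_token_ids)
        if is_currently_reasoning:
            begin = 0  # the chunk continues an open reasoning span
        elif self.start_id in ids:
            begin = ids.index(self.start_id) + 1  # reasoning starts mid-chunk
        else:
            return ToyDelta(span=None, is_still_reasoning=False)
        if self.end_id in ids[begin:]:
            end = ids.index(self.end_id, begin)
            return ToyDelta(span=(begin, end), is_still_reasoning=False)
        return ToyDelta(span=(begin, len(ids)), is_still_reasoning=True)


parser = ToyReasoningParser(start_id=100, end_id=101)
state = False  # reasoning was not pre-seeded
for chunk in ([1, 2], [100, 5, 6], [7, 101, 8]):
    delta = parser.stream(chunk, is_currently_reasoning=state)
    state = delta.is_still_reasoning
    print(chunk, delta.span, state)
```

The caller threads is_still_reasoning from each result into the next call, which is how a mid-stream start delimiter in the second chunk carries over to the third.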

will_reason_after_prompt()

will_reason_after_prompt(prompt_token_ids)

Predicts whether the model will emit reasoning after this prompt.

Called once at turn initiation to seed the streaming reasoning state machine and decide whether grammar enforcement should be suspended for the first generated tokens.

The default implementation delegates to stream(), which scans left-to-right and returns is_still_reasoning. Architectures should override this when they have a more reliable signal (e.g., a dedicated think-enable token).

Parameters:

prompt_token_ids (Sequence[int]) – The full prompt token ID sequence.

Returns:

True if the model will start with reasoning tokens; False otherwise.

Return type:

bool
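
The delegation described above, and an architecture-specific override, might look roughly like this. This is a sketch under stated assumptions: the delimiter and think-enable token IDs are invented, is_still_reasoning() is a hypothetical helper standing in for the state returned by stream(), and starting the scan outside reasoning is an assumption rather than documented behavior.

```python
from typing import Sequence

START_ID, END_ID = 100, 101  # assumed single-token reasoning delimiters
THINK_ENABLE_ID = 200        # assumed dedicated think-enable token


def is_still_reasoning(token_ids: Sequence[int], is_currently_reasoning: bool) -> bool:
    """Left-to-right scan: the last delimiter seen decides the final state."""
    state = is_currently_reasoning
    for tok in token_ids:
        if tok == START_ID:
            state = True
        elif tok == END_ID:
            state = False
    return state


def will_reason_after_prompt(prompt_token_ids: Sequence[int]) -> bool:
    # Assumption: the scan starts outside reasoning, so only an unclosed
    # start delimiter in the prompt predicts reasoning.
    return is_still_reasoning(prompt_token_ids, is_currently_reasoning=False)


def will_reason_with_enable_token(prompt_token_ids: Sequence[int]) -> bool:
    # Hypothetical override for an architecture with a think-enable token.
    return THINK_ENABLE_ID in prompt_token_ids


print(will_reason_after_prompt([1, 2, 100]))          # True
print(will_reason_after_prompt([1, 100, 5, 101, 2]))  # False
print(will_reason_with_enable_token([200, 1, 2]))     # True
```

The override avoids scanning for delimiters entirely, which matters when the prompt template signals reasoning without emitting a start delimiter.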