Python class

ReasoningParser

class max.pipelines.modeling.types.ReasoningParser

Bases: ABC

Parser for identifying reasoning spans in model output.

from_tokenizer()

abstract async classmethod from_tokenizer(tokenizer)

Constructs a reasoning parser from a tokenizer.

Parameters:

tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer to use for resolving reasoning delimiter token IDs.

Returns:

A new ReasoningParser instance.

Return type:

ReasoningParser

is_prompt_in_reasoning()

is_prompt_in_reasoning(prompt_token_ids)

Decides whether the next generated token continues a reasoning span.

Called once at the start of a turn with the full prompt token IDs (including any chat-template prefill). The result seeds the streaming reasoning state machine before the model emits its first token.

Multi-turn prompts can legitimately contain </think> tokens from prior assistant turns. The default implementation delegates to stream(), which scans left to right and would treat any such stale </think> as “reasoning has ended”, which is incorrect for the new assistant turn. Architectures whose chat templates emit reasoning delimiters per turn should override this method to consider only the most recent delimiter (for example, with a right-to-left scan).

Parameters:

prompt_token_ids (Sequence[int]) – The full prompt token ID sequence.

Returns:

True if the next generated token should be treated as part of a reasoning span; False otherwise.

Return type:

bool
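
The right-to-left scan suggested above could look roughly like the following. This is a minimal standalone sketch, not the library's implementation: the function name prompt_ends_in_reasoning and the token IDs 1001 and 1002 are hypothetical placeholders, and returning False when the prompt contains no delimiter is an assumption that a real parser may handle differently.

```python
from collections.abc import Sequence

# Hypothetical token IDs, chosen only for illustration; real values come
# from the tokenizer's vocabulary.
THINK_START_ID = 1001  # stands in for "<think>"
THINK_END_ID = 1002    # stands in for "</think>"


def prompt_ends_in_reasoning(prompt_token_ids: Sequence[int]) -> bool:
    """Return True if the most recent delimiter in the prompt opens reasoning."""
    # Walk from the end so the first delimiter found is the most recent one;
    # delimiters left over from earlier turns are never reached.
    for token_id in reversed(prompt_token_ids):
        if token_id == THINK_START_ID:
            return True
        if token_id == THINK_END_ID:
            return False
    # Assumption: a prompt with no delimiter is not inside a reasoning span.
    return False


# A prior turn closed its reasoning, and the new turn is prefilled with the
# start delimiter.
prefilled_prompt = [5, THINK_START_ID, 6, THINK_END_ID, 7, 8, THINK_START_ID]
# The same history without a prefill for the new turn.
closed_prompt = [5, THINK_START_ID, 6, THINK_END_ID, 7, 8]

prefilled_result = prompt_ends_in_reasoning(prefilled_prompt)
closed_result = prompt_ends_in_reasoning(closed_prompt)
```

Because the scan stops at the most recent delimiter, the stale end delimiter from the earlier turn does not affect the result for the prefilled prompt.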

reasoning_end_token_id()

abstract async classmethod reasoning_end_token_id(tokenizer)

Returns the single-token ID that closes a reasoning span.

Used by callers that need to detect end-of-reasoning without instantiating the full parser (e.g., grammar-region setup in the tokenizer). Implementations should resolve their architecture’s end-marker string (</think>, <channel|>, etc.) via max.pipelines.lib.tokenizer.convert_token_to_id().

Parameters:

tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer used for token-id resolution.

Returns:

The token ID that marks end-of-reasoning, or None if the architecture’s end marker doesn’t tokenize to a single ID.

Return type:

int | None
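
The "single ID or None" contract can be illustrated without the real tokenizer. The sketch below does not call convert_token_to_id(); it uses a hypothetical encode callable and a toy vocabulary (TOY_VOCAB, toy_encode, and single_token_id are all made-up names) purely to show the shape of the check.

```python
from collections.abc import Callable, Sequence
from typing import Optional


def single_token_id(
    encode: Callable[[str], Sequence[int]], marker: str
) -> Optional[int]:
    """Return the marker's token ID, or None if it spans several tokens."""
    ids = encode(marker)
    # Only a marker that maps to exactly one ID can act as a single
    # end-of-reasoning token.
    return ids[0] if len(ids) == 1 else None


# Hypothetical toy vocabulary: "</think>" is one token, anything else falls
# back to one ID per character.
TOY_VOCAB = {"</think>": 1002}


def toy_encode(text: str) -> list[int]:
    if text in TOY_VOCAB:
        return [TOY_VOCAB[text]]
    return [ord(ch) for ch in text]


known_marker_id = single_token_id(toy_encode, "</think>")
unknown_marker_id = single_token_id(toy_encode, "<channel|>")
```

With this toy vocabulary, the first lookup resolves to a single ID, while the second marker splits into several IDs and yields None.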

reset()

reset()

Resets per-request state.

Called at the start of each request to clear any internal state accumulated during a prior request.

Return type:

None

stream()

abstract stream(delta_token_ids, is_currently_reasoning=True)

Identifies a reasoning span within a streaming delta chunk.

Parameters:

  • delta_token_ids (Sequence[int]) – The token IDs of the incremental streaming chunk.
  • is_currently_reasoning (bool) – Whether the stream was already inside a reasoning span at the start of this chunk. When True (the default, for backward compatibility), the parser treats the chunk as continuing reasoning until it finds an end delimiter. When False, the parser enters reasoning only if it finds a start delimiter in this chunk. This lets callers feed every chunk through and catch mid-stream reasoning sections (for example, Gemma 4 emitting <|channel>thought\n...<channel|> even when reasoning wasn’t pre-seeded).

Returns:

A ParsedReasoningDelta containing the reasoning span, whether reasoning is still active, and an optional formatter for decoded reasoning text.

Return type:

ParsedReasoningDelta
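
The effect of is_currently_reasoning can be sketched with a toy state machine. Everything here is hypothetical: ToyDelta is a simplified stand-in rather than the real ParsedReasoningDelta (whose fields are not shown on this page), toy_stream is a free function rather than a ReasoningParser method, and the delimiter IDs are placeholders.

```python
from collections.abc import Sequence
from dataclasses import dataclass

# Hypothetical single-token delimiters, for illustration only.
START_ID = 1001
END_ID = 1002


@dataclass
class ToyDelta:
    """Simplified stand-in for the parser's result type."""

    reasoning_token_ids: list[int]
    is_still_reasoning: bool


def toy_stream(
    delta_token_ids: Sequence[int], is_currently_reasoning: bool = True
) -> ToyDelta:
    """Collect the reasoning tokens in one chunk and report the final state."""
    reasoning: list[int] = []
    active = is_currently_reasoning
    for token_id in delta_token_ids:
        if token_id == START_ID:
            active = True  # a start delimiter opens a reasoning span
        elif token_id == END_ID:
            active = False  # an end delimiter closes it
        elif active:
            reasoning.append(token_id)
    return ToyDelta(reasoning, active)


chunk = [7, 8, END_ID, 9]
# Seeded as reasoning: tokens before the end delimiter count as reasoning.
seeded = toy_stream(chunk)
# Not seeded: with no start delimiter in the chunk, nothing is reasoning.
unseeded = toy_stream(chunk, is_currently_reasoning=False)
# Not seeded, but a start delimiter appears mid-chunk.
mid_stream = toy_stream([7, START_ID, 8], is_currently_reasoning=False)
```

A caller following this pattern would pass each result's final state as is_currently_reasoning for the next chunk, so that a span opened in one chunk carries over into the following one.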