Python class
ReasoningParser
class max.pipelines.modeling.types.ReasoningParser
Bases: ABC
Parser for identifying reasoning spans in model output.
from_tokenizer()
abstract async classmethod from_tokenizer(tokenizer)
Constructs a reasoning parser from a tokenizer.
Parameters:
- tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer to use for resolving reasoning delimiter token IDs.
Returns:
A new ReasoningParser instance.
Return type:
ReasoningParser
is_prompt_in_reasoning()
is_prompt_in_reasoning(prompt_token_ids)
Decides whether the next generated token continues a reasoning span.
Called once at turn initiation, given the full prompt token ids (including any chat-template prefill). The result is used to seed the streaming reasoning state machine before the model emits its first token.
Multi-turn prompts can legitimately contain </think> tokens from prior assistant turns. The default implementation delegates to stream(), which scans left-to-right and would treat any such stale </think> as “reasoning has ended”, which is incorrect for the new assistant turn. Architectures whose chat templates emit reasoning delimiters per turn should override this to consider only the most recent delimiter (e.g., a right-to-left scan).
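As an illustration of the right-to-left scan suggested above, here is a minimal standalone sketch. It is not the library's implementation: the token IDs are hypothetical placeholders (real IDs come from the tokenizer), the function is a free function rather than an override on a ReasoningParser subclass, and the fallback when no delimiter is present is an assumption of this sketch.

```python
from collections.abc import Sequence

# Hypothetical token IDs for illustration only; real IDs are resolved
# from the tokenizer for the architecture in question.
THINK_START_ID = 1001  # stands in for "<think>"
THINK_END_ID = 1002    # stands in for "</think>"


def is_prompt_in_reasoning(prompt_token_ids: Sequence[int]) -> bool:
    """Return True if the most recent reasoning delimiter opens a span."""
    # Scan from the end so that stale delimiters from earlier turns
    # cannot override the state of the newest assistant turn.
    for token_id in reversed(prompt_token_ids):
        if token_id == THINK_START_ID:
            return True   # latest delimiter opens a span (e.g. template prefill)
        if token_id == THINK_END_ID:
            return False  # latest delimiter closes a span
    # No delimiter at all: this sketch assumes "not reasoning".
    return False


# A prior turn closed its reasoning, then the chat template prefilled
# a fresh start delimiter for the new turn.
prompt = [5, THINK_START_ID, 7, THINK_END_ID, 8, 9, THINK_START_ID]
print(is_prompt_in_reasoning(prompt))  # True
```

A left-to-right scan over the same prompt would stop caring after the first </think> stand-in and report that reasoning has ended, which is the failure mode described above.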
reasoning_end_token_id()
abstract async classmethod reasoning_end_token_id(tokenizer)
Returns the single-token ID that closes a reasoning span.
Used by callers that need to detect end-of-reasoning without
instantiating the full parser (e.g., grammar-region setup in the
tokenizer). Implementations should resolve their architecture’s
end-marker string (</think>, <channel|>, etc.) via
max.pipelines.lib.tokenizer.convert_token_to_id().
Parameters:
- tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer used for token-id resolution.
Returns:
The token ID that marks end-of-reasoning, or None if the architecture’s end marker doesn’t tokenize to a single ID.
Return type:
int | None
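The "single ID or None" contract can be sketched without the real tokenizer. The example below deliberately does not call max.pipelines.lib.tokenizer.convert_token_to_id(), since its exact signature is not shown here. Instead it assumes a hypothetical encode callable and a toy vocabulary, both invented for illustration.

```python
from collections.abc import Callable, Sequence
from typing import Optional


def resolve_single_token_id(
    marker: str, encode: Callable[[str], Sequence[int]]
) -> Optional[int]:
    """Return the marker's token ID, or None if it is not exactly one token."""
    ids = encode(marker)
    return ids[0] if len(ids) == 1 else None


# Toy vocabulary standing in for a real tokenizer (hypothetical IDs).
TOY_VOCAB = {"</think>": 1002, "<": 1, "/": 2, "channel": 3, "|": 4, ">": 5}


def toy_encode(text: str) -> list[int]:
    """Greedy longest-match tokenization over TOY_VOCAB."""
    ids: list[int] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                ids.append(TOY_VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return ids


print(resolve_single_token_id("</think>", toy_encode))    # 1002
print(resolve_single_token_id("<channel|>", toy_encode))  # None
```

In the toy vocabulary, </think> is a single entry and resolves to one ID, while <channel|> splits into four pieces and therefore yields None. A real tokenizer for an architecture that uses <channel|> would be expected to carry it as a single special token.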
reset()
reset()
Resets per-request state.
Called at the start of each request to clear any internal state accumulated during a prior request.
Return type:
None
stream()
abstract stream(delta_token_ids, is_currently_reasoning=True)
Identifies a reasoning span within a streaming delta chunk.
Parameters:
- delta_token_ids (Sequence[int]) – The token IDs of the incremental streaming chunk.
- is_currently_reasoning (bool) – Whether the stream was already inside a reasoning span at the start of this chunk. When True (the default, for backward compatibility), the parser treats the chunk as continuing reasoning unless/until it finds an end delimiter. When False, the parser only enters reasoning if it actually finds a start delimiter in this chunk. This lets callers feed every chunk through and catch mid-stream reasoning sections (e.g. Gemma 4 emitting <|channel>thought\n...<channel|> even when reasoning wasn’t pre-seeded).
Returns:
A ParsedReasoningDelta containing the reasoning span, whether reasoning is still active, and an optional formatter for decoded reasoning text.
Return type:
ParsedReasoningDelta
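To make the role of is_currently_reasoning concrete, the following is a simplified sketch of a streaming state machine. It makes several assumptions. The Delta dataclass is a hypothetical stand-in for ParsedReasoningDelta, whose real fields may differ. The delimiter IDs are placeholders. The sketch also collects reasoning token IDs, whereas the real parser returns a span.

```python
from collections.abc import Sequence
from dataclasses import dataclass

# Hypothetical delimiter IDs for illustration only.
START_ID = 1001
END_ID = 1002


@dataclass
class Delta:
    """Hypothetical stand-in for ParsedReasoningDelta (real fields may differ)."""
    reasoning_token_ids: list[int]
    is_still_reasoning: bool


def stream(
    delta_token_ids: Sequence[int], is_currently_reasoning: bool = True
) -> Delta:
    """Collect the tokens of this chunk that fall inside a reasoning span."""
    reasoning: list[int] = []
    in_reasoning = is_currently_reasoning
    for token_id in delta_token_ids:
        if token_id == START_ID:
            in_reasoning = True       # enter reasoning mid-chunk
        elif token_id == END_ID:
            in_reasoning = False      # leave reasoning mid-chunk
        elif in_reasoning:
            reasoning.append(token_id)
    return Delta(reasoning, in_reasoning)


# Feed every chunk through, carrying the state from one chunk to the next.
state = False  # reasoning was not pre-seeded
for chunk in ([7, START_ID, 8], [9, END_ID, 10]):
    delta = stream(chunk, is_currently_reasoning=state)
    state = delta.is_still_reasoning
    print(delta)
```

Starting with state set to False, the first chunk only enters reasoning once the start delimiter appears, and the second chunk continues reasoning until the end delimiter. With the default of True, a chunk containing no delimiters is treated entirely as reasoning.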