Python class
ReasoningParser
class max.pipelines.modeling.types.ReasoningParser
Bases: ABC
Parser for identifying reasoning spans in model output.
from_tokenizer()
abstract async classmethod from_tokenizer(tokenizer)
Constructs a reasoning parser from a tokenizer.
Parameters:
- tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer to use for resolving reasoning delimiter token IDs.
Returns:
- A new ReasoningParser instance.
Return type:
reasoning_end_token_id()
abstract async classmethod reasoning_end_token_id(tokenizer)
Returns the single-token ID that closes a reasoning span.
Used by callers that need to detect end-of-reasoning without
instantiating the full parser (e.g., grammar-region setup in the
tokenizer). Implementations should resolve their architecture’s
end-marker string (</think>, <channel|>, etc.) via
max.pipelines.lib.tokenizer.convert_token_to_id().
Parameters:
- tokenizer (PipelineTokenizer[Any, Any, Any]) – The PipelineTokenizer used for token-id resolution.
Returns:
- The token ID that marks end-of-reasoning, or None if the architecture’s end marker doesn’t tokenize to a single ID.
Return type:
- int | None
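The "single ID or None" contract above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `TOY_VOCAB`, `toy_encode`, and `single_token_id` are stand-ins invented for illustration, the sketch is synchronous, and it does not call `convert_token_to_id()` because that function's signature is not shown on this page.

```python
from typing import List, Optional

# Hypothetical toy vocabulary. "</think>" is a single token here, while
# "<channel|>" can only be encoded as three separate pieces.
TOY_VOCAB = {"<think>": 100, "</think>": 101, "<": 1, "channel": 2, "|>": 3}


def toy_encode(text: str) -> List[int]:
    """Greedy longest-match encoding against TOY_VOCAB (illustration only)."""
    ids: List[int] = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"cannot encode {text[i:]!r}")
    return ids


def single_token_id(marker: str) -> Optional[int]:
    """Return the marker's token ID, or None if it spans several tokens."""
    ids = toy_encode(marker)
    return ids[0] if len(ids) == 1 else None
```

With this toy vocabulary, `single_token_id("</think>")` resolves to one ID, while `single_token_id("<channel|>")` returns `None` because the marker splits into several tokens.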
reset()
reset()
Resets per-request state.
Called at the start of each request to clear any internal state accumulated during a prior request.
Return type:
- None
stream()
abstract stream(delta_token_ids, is_currently_reasoning=True)
Identifies a reasoning span within a streaming delta chunk.
Parameters:
- delta_token_ids (Sequence[int]) – The token IDs of the incremental streaming chunk.
- is_currently_reasoning (bool) – Whether the stream was already inside a reasoning span at the start of this chunk. When True (the default, for backward compatibility), the parser treats the chunk as continuing reasoning until it finds an end delimiter. When False, the parser enters reasoning only if it finds a start delimiter in this chunk. This lets callers feed every chunk through and catch mid-stream reasoning sections (e.g. Gemma 4 emitting <|channel>thought\n...<channel|> even when reasoning wasn’t pre-seeded).
Returns:
- A ParsedReasoningDelta containing the reasoning span, whether reasoning is still active, and an optional formatter for decoded reasoning text.
Return type:
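The effect of `is_currently_reasoning` can be sketched as a small state machine. This is a toy model under stated assumptions, not the library's implementation: `START_ID`, `END_ID`, `ToyDelta`, `toy_stream`, and `collect_reasoning` are hypothetical names, `ToyDelta` only approximates `ParsedReasoningDelta` (its real fields are not listed on this page, and the optional formatter is omitted), and the sketch handles at most one reasoning span per chunk.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple

START_ID = 100  # hypothetical ID of the reasoning start delimiter
END_ID = 101    # hypothetical ID of the reasoning end delimiter


@dataclass
class ToyDelta:
    # Half-open [start, end) index range of reasoning tokens in the chunk.
    span: Optional[Tuple[int, int]]
    is_still_reasoning: bool


def toy_stream(
    delta_token_ids: Sequence[int], is_currently_reasoning: bool = True
) -> ToyDelta:
    reasoning = is_currently_reasoning
    # If already reasoning, the span starts at the first token of the chunk.
    start: Optional[int] = 0 if reasoning else None
    end: Optional[int] = None
    for i, tok in enumerate(delta_token_ids):
        if not reasoning and start is None and tok == START_ID:
            reasoning = True
            start = i + 1  # reasoning begins after the start delimiter
        elif reasoning and tok == END_ID:
            reasoning = False
            end = i  # the end delimiter itself is excluded
            break
    if start is None:
        return ToyDelta(None, False)
    if end is None:
        end = len(delta_token_ids)
    return ToyDelta((start, end), reasoning)


def collect_reasoning(chunks: Iterable[Sequence[int]]) -> List[int]:
    """Feed every chunk through, carrying the reasoning state forward."""
    state = False  # reasoning was not pre-seeded
    collected: List[int] = []
    for chunk in chunks:
        delta = toy_stream(chunk, state)
        if delta.span is not None:
            collected.extend(chunk[delta.span[0]:delta.span[1]])
        state = delta.is_still_reasoning
    return collected
```

Starting with `state = False` mirrors the case described above, where a caller passes every chunk through the parser so that a reasoning section opening mid-stream is still detected.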
will_reason_after_prompt()
will_reason_after_prompt(prompt_token_ids)
Predicts whether the model will emit reasoning after this prompt.
Called once at turn initiation to seed the streaming reasoning state machine and decide whether grammar enforcement should be suspended for the first generated tokens.
The default implementation delegates to stream(), which
scans left-to-right and returns is_still_reasoning.
Architectures should override this when they have a more
reliable signal (e.g., a dedicated think-enable token).
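A left-to-right scan of this kind can be sketched as follows. The function name, the delimiter IDs, and the `initially_reasoning` parameter are all hypothetical; in particular, this page does not say what initial state the default implementation passes to stream(), so the sketch makes it an explicit argument.

```python
from typing import Sequence

START_ID = 100  # hypothetical ID of the reasoning start delimiter
END_ID = 101    # hypothetical ID of the reasoning end delimiter


def toy_will_reason_after_prompt(
    prompt_token_ids: Sequence[int], initially_reasoning: bool = False
) -> bool:
    """Scan the prompt left to right and report the final reasoning state."""
    reasoning = initially_reasoning
    for tok in prompt_token_ids:
        if tok == START_ID:
            reasoning = True
        elif tok == END_ID:
            reasoning = False
    return reasoning
```

Under these assumptions, a prompt that ends inside an open reasoning span (for example, a chat template that appends the start delimiter) yields `True`, and a prompt whose last delimiter is the end marker yields `False`.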