Python class
GrammarEnforcementState
class max.pipelines.context.GrammarEnforcementState(grammar_enforced=False, tools_forced=False, requires_structured_output_flag=False, has_json_schema=False, tool_region=None, thinking_region_delimiters=None, _in_thinking_region=False, _tool_calling_match_buffer=<factory>, _thinking_match_buffer=<factory>)
Bases: object
Manages grammar enforcement state for constrained decoding.
Encapsulates the logic for tracking whether grammar is currently being enforced, detecting tool call and thinking region boundary token sequences, and managing the token buffer for multi-token sequence matching.
The key transitions are: detecting </think> exits the thinking region,
and detecting tool-call start/end tokens toggles enforcement on/off for
tool_choice=auto. For tool_choice=required or a JSON schema,
enforcement is active from the first generated token (after any thinking
region is exited).
Parameters:
- grammar_enforced (bool)
- tools_forced (bool)
- requires_structured_output_flag (bool)
- has_json_schema (bool)
- tool_region (StructuredOutputRegionDelimiters | None)
- thinking_region_delimiters (StructuredOutputRegionDelimiters | None)
- _in_thinking_region (bool)
- _tool_calling_match_buffer (list[int])
- _thinking_match_buffer (list[int])
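The multi-token sequence matching mentioned in the class description can be pictured with a small buffer that remembers only the most recent tokens. The following is a minimal sketch, not the library's implementation: `SuffixMatcher`, its `feed` method, and the token IDs are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SuffixMatcher:
    """Hypothetical helper: reports when the token stream ends with `pattern`."""

    pattern: list[int]
    buffer: list[int] = field(default_factory=list)

    def feed(self, token: int) -> bool:
        # Keep only the most recent len(pattern) tokens.
        self.buffer.append(token)
        if len(self.buffer) > len(self.pattern):
            self.buffer.pop(0)
        matched = self.buffer == self.pattern
        if matched:
            self.buffer.clear()  # start fresh after a complete match
        return matched


# Token IDs are made up; a real delimiter would come from the tokenizer.
matcher = SuffixMatcher(pattern=[7, 8, 9])
results = [matcher.feed(t) for t in [1, 7, 8, 7, 8, 9, 2]]
print(results)
```

Only the sixth token completes the pattern here. The partial match `7, 8` followed by another `7` does not trigger, because the buffer always compares the latest window against the whole delimiter.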
from_response_format()
classmethod from_response_format(response_format)
Creates a state from the given response format, or a default state.
Parameters:
response_format (TextGenerationResponseFormat | None)
Return type:
GrammarEnforcementState
grammar_enforced
grammar_enforced: bool = False
Whether grammar is currently being enforced via bitmask.
For tool_choice=required or when a response_format is set, this is True from the first generated token. For tool_choice=auto without a response_format, it starts as False and flips to True when the tool-call start token sequence is detected.
has_json_schema
has_json_schema: bool = False
Whether this request includes a JSON schema response format.
requires_structured_output_flag
requires_structured_output_flag: bool = False
Whether this request requires --enable-structured-output to be set.
True when the constraint includes a user-supplied JSON schema. False for pure tool-call grammars derived from the model’s tool parser.
restore()
restore(snapshot)
Restore state captured by snapshot().
Parameters:
snapshot (GrammarEnforcementSnapshot)
Return type:
None
snapshot()
snapshot()
Capture state needed to roll back a speculative advance.
The speculative bitmask path walks the enforcement state through draft tokens to compute downstream slot constraints, then unwinds so that committed-token processing on the next batch replays the same transitions from a clean state. The returned snapshot is opaque to callers; pass it to restore.
Return type:
GrammarEnforcementSnapshot
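The snapshot-then-restore pattern described for speculative decoding can be sketched with a toy state object. This is an illustration under stated assumptions, not the real class: `ToyState`, its `advance` method, and the token IDs 60 and 62 are hypothetical stand-ins, and the real snapshot type is opaque.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyState:
    grammar_enforced: bool = False
    in_thinking_region: bool = False
    match_buffer: list[int] = field(default_factory=list)

    def snapshot(self) -> "ToyState":
        # Deep copy so later mutations of the live buffer do not leak in.
        return copy.deepcopy(self)

    def restore(self, snap: "ToyState") -> None:
        self.grammar_enforced = snap.grammar_enforced
        self.in_thinking_region = snap.in_thinking_region
        self.match_buffer = list(snap.match_buffer)

    def advance(self, token: int) -> None:
        # Stand-in transition: token 60 turns enforcement on, 62 turns it off.
        self.match_buffer.append(token)
        if token == 60:
            self.grammar_enforced = True
        elif token == 62:
            self.grammar_enforced = False


state = ToyState()
snap = state.snapshot()
for draft_token in [1, 60, 5]:  # walk through draft tokens speculatively
    state.advance(draft_token)
speculative_view = (state.grammar_enforced, list(state.match_buffer))
state.restore(snap)  # unwind so committed tokens replay from a clean state
print(speculative_view, state)
```

After `restore`, the toy state equals a freshly constructed one, so replaying the committed tokens later produces the same transitions as if the speculative walk had never happened.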
thinking_region_delimiters
thinking_region_delimiters: StructuredOutputRegionDelimiters | None = None
Token sequences defining thinking boundaries (e.g., </think>).
When set, grammar enforcement is suspended inside thinking regions.
When thinking is enabled, the chat template already emits <think> in
the prompt, so generation starts inside the thinking region and only
</think> needs to be detected to exit it.
tool_region
tool_region: StructuredOutputRegionDelimiters | None = None
Token sequences defining tool call boundaries. Set only when enforcement is conditional, as with tool_choice=auto.
tools_forced
tools_forced: bool = False
Whether tool calling was forced (tool_choice=required or named).
Controls whether grammar_enforced is True from the first generated token.
Independent of the --enable-structured-output server flag (which only gates
user-supplied schemas; see requires_structured_output_flag).
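How the boolean fields relate at the start of a request can be summarized in a few lines. The sketch below is an assumption drawn from the field descriptions on this page, not code from the library: `initial_flags` is a hypothetical helper, and `has_json_schema` stands in for "a response_format is set".

```python
def initial_flags(tools_forced: bool, has_json_schema: bool) -> dict[str, bool]:
    """Hypothetical helper mirroring the field descriptions on this page."""
    return {
        # Enforced from the first generated token for forced tools or a schema.
        "grammar_enforced": tools_forced or has_json_schema,
        # Only a user-supplied JSON schema is gated by the server flag.
        "requires_structured_output_flag": has_json_schema,
    }


for forced, schema in [(False, False), (True, False), (False, True)]:
    print(forced, schema, initial_flags(forced, schema))
```

The sketch ignores thinking regions. Per the class description, enforcement only takes effect after any thinking region has been exited.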
update_enforcement_state()
update_enforcement_state(token)
Update enforcement state based on sampled token.
Checks if the token completes a start/end sequence and toggles grammar_enforced accordingly. Thinking region transitions take priority over tool region transitions.
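The toggling behavior for tool_choice=auto, including the priority of thinking transitions, can be sketched as a small state machine. Everything here is hypothetical and for illustration only: `ToyEnforcementState`, `push`, `ends_with`, and the token IDs are not part of the library, and the real method may differ in detail.

```python
from dataclasses import dataclass, field


def push(buffer: list[int], token: int, limit: int) -> None:
    # Append the token and keep only the last `limit` entries.
    buffer.append(token)
    if len(buffer) > limit:
        del buffer[: len(buffer) - limit]


def ends_with(buffer: list[int], pattern: list[int]) -> bool:
    return bool(pattern) and buffer[-len(pattern):] == pattern


@dataclass
class ToyEnforcementState:
    tool_start: list[int]
    tool_end: list[int]
    think_end: list[int]
    grammar_enforced: bool = False
    in_thinking_region: bool = False
    tool_buffer: list[int] = field(default_factory=list)
    think_buffer: list[int] = field(default_factory=list)

    def update(self, token: int) -> None:
        # Thinking transitions take priority: while thinking, only look for
        # the end-of-thinking sequence and leave grammar_enforced untouched.
        if self.in_thinking_region:
            push(self.think_buffer, token, len(self.think_end))
            if ends_with(self.think_buffer, self.think_end):
                self.in_thinking_region = False
                self.think_buffer.clear()
            return

        limit = max(len(self.tool_start), len(self.tool_end))
        push(self.tool_buffer, token, limit)
        if not self.grammar_enforced and ends_with(self.tool_buffer, self.tool_start):
            self.grammar_enforced = True
            self.tool_buffer.clear()
        elif self.grammar_enforced and ends_with(self.tool_buffer, self.tool_end):
            self.grammar_enforced = False
            self.tool_buffer.clear()


# Made-up token IDs: 50 ends thinking, [60, 61] opens a tool call, 62 closes it.
state = ToyEnforcementState([60, 61], [62], [50], in_thinking_region=True)
trace = []
for tok in [60, 61, 50, 1, 60, 61, 5, 62]:
    state.update(tok)
    trace.append(state.grammar_enforced)
print(trace)
```

The first `60, 61` pair is ignored because it arrives inside the thinking region. The second pair, seen after the thinking region ends, switches enforcement on, and the closing token switches it off again.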