Python class

GrammarEnforcementState

`GrammarEnforcementState`

class max.pipelines.context.GrammarEnforcementState(grammar_enforced=False, tools_forced=False, requires_structured_output_flag=False, has_json_schema=False, tool_region=None, thinking_region_delimiters=None, _in_thinking_region=False, _tool_calling_match_buffer=<factory>, _thinking_match_buffer=<factory>)

Bases: object

Manages grammar enforcement state for constrained decoding.

Encapsulates the logic for tracking whether grammar is currently being enforced, detecting tool call and thinking region boundary token sequences, and managing the token buffer for multi-token sequence matching.

The key transitions are: detecting </think> exits the thinking region, and detecting tool-call start/end tokens toggles enforcement on/off for tool_choice=auto. For tool_choice=required or a JSON schema, enforcement is active from the first generated token (after any thinking region is exited).
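The tool_choice=auto toggle described above can be sketched with a small stand-in class. This is a minimal illustration, not the MAX implementation: `ToyEnforcementState`, its `update()` method, and the token ids are all hypothetical, and the real class also handles thinking regions and forced modes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnforcementState:
    """Illustrative stand-in only; names and behavior are assumptions."""

    start_seq: tuple[int, ...]  # hypothetical token ids opening a tool call
    end_seq: tuple[int, ...]    # hypothetical token ids closing a tool call
    grammar_enforced: bool = False
    _buffer: list[int] = field(default_factory=list)

    def update(self, token: int) -> None:
        # Outside a tool call we look for the start sequence, inside for the end.
        target = self.end_seq if self.grammar_enforced else self.start_seq
        self._buffer.append(token)
        # Keep only as many trailing tokens as the target sequence needs.
        self._buffer = self._buffer[-len(target):]
        if tuple(self._buffer) == target:
            self.grammar_enforced = not self.grammar_enforced
            self._buffer.clear()


state = ToyEnforcementState(start_seq=(10, 11), end_seq=(20,))
history = []
for tok in [1, 10, 11, 5, 20, 2]:
    state.update(tok)
    history.append(state.grammar_enforced)
print(history)  # [False, False, True, True, False, False]
```

The buffer is what allows a delimiter that spans several tokens to be recognized even though tokens arrive one at a time.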

Parameters:

grammar_enforced (bool)
tools_forced (bool)
requires_structured_output_flag (bool)
has_json_schema (bool)
tool_region (StructuredOutputRegionDelimiters | None)
thinking_region_delimiters (StructuredOutputRegionDelimiters | None)
_in_thinking_region (bool)
_tool_calling_match_buffer (list[int])
_thinking_match_buffer (list[int])

`from_response_format()`

classmethod from_response_format(response_format)

Creates a state from the given response format, or a default state.

Parameters: response_format (TextGenerationResponseFormat | None)
Return type: GrammarEnforcementState

`grammar_enforced`

grammar_enforced: bool = False

Whether grammar is currently being enforced via bitmask.

True from the start for tool_choice=required or when a response_format is set. For tool_choice=auto without a response_format, it is False initially and flips to True when the tool call start token sequence is detected.

`has_json_schema`

has_json_schema: bool = False

Whether this request includes a JSON schema response format.

`requires_structured_output_flag`

requires_structured_output_flag: bool = False

Whether this request requires --enable-structured-output to be set.

True when the constraint includes a user-supplied JSON schema. False for pure tool-call grammars derived from the model’s tool parser.

`restore()`

restore(snapshot)

Restore state captured by snapshot().

Parameters: snapshot (GrammarEnforcementSnapshot)
Return type: None

`snapshot()`

snapshot()

Capture state needed to roll back a speculative advance.

The speculative bitmask path walks the enforcement state through draft tokens to compute downstream slot constraints, then unwinds so that committed-token processing on the next batch replays the same transitions from a clean state. The returned snapshot is opaque to callers; pass it to restore() unchanged.

Return type: GrammarEnforcementSnapshot
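The snapshot-and-unwind pattern can be illustrated with a self-contained toy. Everything here is an assumption made for the sketch: `ToyState`, `ToySnapshot`, the single-token delimiters, and the helper `enforcement_per_draft_slot` are hypothetical and do not reflect the fields that GrammarEnforcementSnapshot actually stores.

```python
from dataclasses import dataclass, field

START, END = 100, 101  # hypothetical single-token tool-call delimiters


@dataclass(frozen=True)
class ToySnapshot:
    grammar_enforced: bool
    match_buffer: tuple[int, ...]  # immutable copy of the mutable buffer


@dataclass
class ToyState:
    grammar_enforced: bool = False
    match_buffer: list[int] = field(default_factory=list)

    def update(self, token: int) -> None:
        self.match_buffer.append(token)
        if token == START:
            self.grammar_enforced = True
        elif token == END:
            self.grammar_enforced = False

    def snapshot(self) -> ToySnapshot:
        return ToySnapshot(self.grammar_enforced, tuple(self.match_buffer))

    def restore(self, snap: ToySnapshot) -> None:
        self.grammar_enforced = snap.grammar_enforced
        self.match_buffer = list(snap.match_buffer)


def enforcement_per_draft_slot(state: ToyState, draft: list[int]) -> list[bool]:
    """Walk the state through draft tokens, then unwind to where it started."""
    snap = state.snapshot()
    try:
        flags = []
        for token in draft:
            state.update(token)
            flags.append(state.grammar_enforced)
        return flags
    finally:
        state.restore(snap)  # committed tokens later replay from a clean state


state = ToyState()
print(enforcement_per_draft_slot(state, [1, START, 7, END]))
print(state.grammar_enforced, state.match_buffer)
```

Copying the buffer into a tuple matters in this sketch: a snapshot that merely aliased the list would be mutated by the speculative walk and could not undo it.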

`thinking_region_delimiters`

thinking_region_delimiters: StructuredOutputRegionDelimiters | None = None

Token sequences defining thinking boundaries (e.g., </think>).

When set, grammar enforcement is suspended inside thinking regions. When thinking is enabled, the chat template already emits <think> in the prompt, so the state starts inside the thinking region and only needs to detect </think> to exit.
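As a rough illustration of why only the closing delimiter needs to be detected, the following sketch finds where enforcement could begin in a generated token stream. The function `first_enforced_index` and the token ids are hypothetical; the actual class tracks this incrementally rather than scanning a full list.

```python
from typing import Optional, Sequence


def first_enforced_index(
    generated: Sequence[int], think_end: Sequence[int]
) -> Optional[int]:
    """Return the index of the first token after the thinking region.

    Generation is assumed to start inside the thinking region because the
    prompt already contains the opening delimiter. Returns None if the
    closing sequence never appears.
    """
    n = len(think_end)
    for i in range(n, len(generated) + 1):
        if tuple(generated[i - n:i]) == tuple(think_end):
            return i
    return None


THINK_END = (900, 901)  # hypothetical token ids for "</think>"
tokens = [5, 6, 900, 901, 42, 43]
idx = first_enforced_index(tokens, THINK_END)
print(idx, tokens[idx:])  # 4 [42, 43]
```

Under tool_choice=required or a JSON schema, the tokens from that index onward are the ones that would be grammar constrained; everything before it is free-form reasoning.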

`tool_region`

tool_region: StructuredOutputRegionDelimiters | None = None

Token sequences defining tool call boundaries, set when enforcement is conditional (tool_choice=auto).

`tools_forced`

tools_forced: bool = False

Whether tool calling was forced (tool_choice=required or named).

Controls whether grammar_enforced is True from the first generated token. This is independent of the --enable-structured-output server flag (which only gates user-supplied schemas; see requires_structured_output_flag).
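The relationship between these flags can be summarized in a short sketch. The helper `derive_flags` is hypothetical, as is the use of the plain strings "required" and "named" to stand in for the tool_choice modes; it only restates the rules documented on this page and ignores thinking regions.

```python
from typing import Optional


def derive_flags(tool_choice: Optional[str], has_user_schema: bool) -> dict:
    """Restate the documented rules for one request (illustrative only)."""
    tools_forced = tool_choice in ("required", "named")
    return {
        "tools_forced": tools_forced,
        "has_json_schema": has_user_schema,
        # Only user-supplied schemas are gated by the server flag.
        "requires_structured_output_flag": has_user_schema,
        # Forced tools or a schema mean enforcement from the first token.
        "grammar_enforced_at_start": tools_forced or has_user_schema,
    }


for choice, schema in [("auto", False), ("required", False), (None, True)]:
    print(choice, schema, derive_flags(choice, schema))
```

In this reading, a request with tool_choice=required is enforced from the start yet does not need the server flag, while a request with a JSON schema needs both.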

`update_enforcement_state()`

update_enforcement_state(token)

Updates the enforcement state based on the sampled token.

Checks if the token completes a start/end sequence and toggles grammar_enforced accordingly. Thinking region transitions take priority over tool region transitions.

Parameters: token (int) – The newly sampled token.
Returns: True if the matcher should consume the token.
Return type: bool
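A caller would typically use the boolean result to decide whether to advance the grammar matcher. The sketch below shows that calling pattern with a toy state. It assumes, purely for illustration, single-token delimiters and that delimiter tokens themselves are not fed to the matcher; the page does not specify either detail, and `ToyState` and `tokens_for_matcher` are hypothetical names.

```python
START, END = 100, 101  # hypothetical single-token tool-call delimiters


class ToyState:
    """Toy stand-in mirroring only the method name and return type."""

    def __init__(self) -> None:
        self.grammar_enforced = False

    def update_enforcement_state(self, token: int) -> bool:
        if not self.grammar_enforced:
            if token == START:
                self.grammar_enforced = True
            return False  # free-form text and the opening delimiter
        if token == END:
            self.grammar_enforced = False
            return False  # closing delimiter ends the constrained region
        return True  # token lies inside the constrained region


def tokens_for_matcher(sampled: list[int]) -> list[int]:
    """Collect the tokens a grammar matcher would be asked to consume."""
    state = ToyState()
    consumed = []
    for token in sampled:
        if state.update_enforcement_state(token):
            consumed.append(token)
    return consumed


print(tokens_for_matcher([1, START, 7, 8, END, 2]))  # [7, 8]
```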
