Structured output
MAX supports structured output generation using XGrammar as a backend. Structured output, also referred to as constrained decoding, lets you enforce a specific output format so a model returns structured, predictable responses.
API compatibility
The /chat/completions
and /completions
API endpoints are compatible with
structured output. To use it, pass the
--enable-structured-output
flag when serving your model and include the
response_format
parameter in your inference request.
max-pipelines serve \
--model-path="modularai/Llama-3.1-8B-Instruct-GGUF" \
--enable-structured-output
JSON schema
To specify structured output with a JSON schema, use the following request format:
curl -N http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
    "messages": [
      {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step. Provide your guidance in JSON format."},
      {"role": "user", "content": "How can I solve 8x + 7 = -23"}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "math_response",
        "schema": {
          "type": "object",
          "properties": {
            "steps": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "explanation": {"type": "string"},
                  "output": {"type": "string"}
                },
                "required": ["explanation", "output"],
                "additionalProperties": false
              }
            },
            "final_answer": {"type": "string"}
          },
          "required": ["steps", "final_answer"],
          "additionalProperties": false
        }
      }
    }
  }'
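With this schema applied, the model's message content is constrained to valid JSON that matches the schema. The exact wording and steps vary by model and prompt; an illustrative conforming response might look like this:
{
  "steps": [
    {"explanation": "Subtract 7 from both sides.", "output": "8x = -30"},
    {"explanation": "Divide both sides by 8.", "output": "x = -3.75"}
  ],
  "final_answer": "x = -3.75"
}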
Schema validation
You can also define your structured output as a Pydantic
BaseModel
and let the OpenAI Python client generate and validate the
JSON schema for you.
Here's an example:
from pydantic import BaseModel
from openai import OpenAI

# Point the client at the local MAX endpoint used above; a local deployment
# typically doesn't validate the API key, so a placeholder value works.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a movie on Friday."},
    ],
    # Pass the Pydantic model; the client converts it to a JSON schema.
    response_format=CalendarEvent,
)

# The response is parsed back into a CalendarEvent instance.
event = completion.choices[0].message.parsed
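Because parsed returns a CalendarEvent instance, you can access its fields directly. The exact values depend on the model's response; the comments below are only examples:
print(event.name)          # e.g. "movie"
print(event.date)          # e.g. "Friday"
print(event.participants)  # e.g. ["Alice", "Bob"]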
Supported models
All text generation models served with MAX support structured output, and newly added models are compatible as well, because this functionality is implemented at the pipeline level to ensure consistency across models.
However, structured output currently doesn't support PyTorch models or CPU deployments; it's available only for MAX models deployed on GPUs.
Function calling
If you want to structure a model's output when it responds to a user, use a structured output response_format.
If you are connecting a model to tools, functions, data, or other systems, then you should use function calling instead of structured outputs.
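For contrast, here is a minimal sketch of an OpenAI-style function-calling request against the same endpoint. The tool name and parameters are hypothetical, and this assumes your model and server configuration support tool use:
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

# Hypothetical tool definition: the model can request a get_weather call
# instead of answering directly.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chooses to call the tool, the arguments arrive as JSON here.
print(completion.choices[0].message.tool_calls)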