
Structured output

MAX supports the generation of structured output using XGrammar as a backend. Structured output, also sometimes referred to as constrained decoding, allows users to enforce specific output formats, ensuring structured and predictable responses from a model.

API compatibility

The /chat/completions and /completions API endpoints are compatible with structured output. To use structured output, use the --enable-structured-output flag when serving your model and include the response_format parameter in your inference request.

max-pipelines serve \
  --model-path="modularai/Llama-3.1-8B-Instruct-GGUF" \
  --enable-structured-output

JSON schema

To specify a structured output, use the following request format:

curl -N http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
    "messages": [
      {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step. Provide your guidance in JSON format."},
      {"role": "user", "content": "How can I solve 8x + 7 = -23"}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "math_response",
        "schema": {
          "type": "object",
          "properties": {
            "steps": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "explanation": {"type": "string"},
                  "output": {"type": "string"}
                },
                "required": ["explanation", "output"],
                "additionalProperties": false
              }
            },
            "final_answer": {"type": "string"}
          },
          "required": ["steps", "final_answer"],
          "additionalProperties": false
        }
      }
    }
  }'
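The same request can be built and sent from Python. The sketch below uses only the standard library and assumes the server started with the serve command above is listening at http://0.0.0.0:8000; `build_request` is a helper name introduced here for illustration.

```python
import json
import urllib.request

# JSON schema constraining the model's reply: a list of solution steps
# plus a final answer, with no extra properties allowed.
math_schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "explanation": {"type": "string"},
                    "output": {"type": "string"},
                },
                "required": ["explanation", "output"],
                "additionalProperties": False,
            },
        },
        "final_answer": {"type": "string"},
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

payload = {
    "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
    "messages": [
        {"role": "system", "content": "You are a helpful math tutor. Provide your guidance in JSON format."},
        {"role": "user", "content": "How can I solve 8x + 7 = -23"},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "math_response", "schema": math_schema},
    },
}

def build_request(url="http://0.0.0.0:8000/v1/chat/completions"):
    """Build the HTTP request; send it with urllib.request.urlopen(req)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

Note that Python's `False` serializes to JSON `false`, so the payload on the wire matches the curl example above.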

Schema validation

You can also define your structured output using a Pydantic BaseModel, which validates your JSON schema in Python.

Here's an example:

from pydantic import BaseModel
from openai import OpenAI

# Point the client at the locally served MAX endpoint; the api_key
# value is a placeholder since the local server does not check it.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a movie on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed
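Under the hood, the client turns the Pydantic model into a JSON schema like the one in the previous section. The stdlib-only sketch below illustrates that mapping for a few simple annotation types; it is a rough approximation for intuition, not the actual conversion logic used by Pydantic or the OpenAI SDK.

```python
from dataclasses import dataclass, fields

# Minimal mapping from Python annotations to JSON-schema types;
# the real libraries handle far more cases (optionals, nesting, etc.).
_TYPE_MAP = {str: {"type": "string"}, int: {"type": "integer"}}

def to_json_schema(cls):
    """Build a strict object schema from a dataclass's annotations."""
    props = {}
    for f in fields(cls):
        if f.type == list[str]:
            props[f.name] = {"type": "array", "items": {"type": "string"}}
        else:
            props[f.name] = _TYPE_MAP[f.type]
    return {
        "type": "object",
        "properties": props,
        "required": [f.name for f in fields(cls)],
        "additionalProperties": False,
    }

@dataclass
class CalendarEvent:
    name: str
    date: str
    participants: list[str]

schema = to_json_schema(CalendarEvent)
```

The resulting schema marks every field as required and forbids extra properties, which is what makes the decoded output predictable enough to parse directly into the model class.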

Supported models

All text generation models support structured output with MAX. As new models are added, they will also be compatible with structured output. This functionality is implemented at the pipeline level, ensuring consistency across different models.

However, structured output currently supports only MAX models deployed on GPUs; PyTorch models and CPU deployments are not supported.

Function calling

If you want to structure a model's output when it responds to a user, then you should use a structured output response_format.

If you are connecting a model to tools, functions, data, or other systems, then you should use function calling instead of structured outputs.
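For contrast, a function-calling request supplies a `tools` list instead of a `response_format`. The sketch below shows the shape of such a request body; the `get_weather` function is hypothetical and introduced here only for illustration.

```python
# A tool-calling request: the model is told it may call
# get_weather(city) rather than being constrained to a JSON schema.
tool_request = {
    "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```

Structured output constrains what the model says; function calling lets the model decide when to invoke an external system and with what arguments.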