Python class
TextGenerationRequest
class max.interfaces.TextGenerationRequest(request_id, model_name, prompt=None, messages=<factory>, images=<factory>, videos=<factory>, tools=None, response_format=None, timestamp_ns=0, request_path='/', logprobs=0, echo=False, stop=None, chat_template_options=None, sampling_params=<factory>, target_endpoint=None, dkv_cache_hint=None)
Bases: object
An immutable request for text token generation from a pipeline.
Parameters:
- request_id (RequestID)
- model_name (str)
- prompt (str | Sequence[int] | None)
- messages (list[TextGenerationRequestMessage])
- images (list[bytes])
- videos (list[bytes])
- tools (list[TextGenerationRequestTool] | None)
- response_format (TextGenerationResponseFormat | None)
- timestamp_ns (int)
- request_path (str)
- logprobs (int)
- echo (bool)
- stop (str | list[str] | None)
- chat_template_options (dict[str, Any] | None)
- sampling_params (SamplingParams)
- target_endpoint (str | None)
- dkv_cache_hint (dict[str, Any] | None)
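As a sketch of how such a request might be constructed, the stand-in dataclass below mirrors a handful of the fields listed above (the real class lives in `max.interfaces` and carries many more fields; the name `TextGenerationRequestSketch` and the dict-based messages are illustrative assumptions):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Sequence


# Stand-in mirroring a few TextGenerationRequest fields, for illustration only.
@dataclass(frozen=True)  # the request is documented as immutable
class TextGenerationRequestSketch:
    request_id: str
    model_name: str
    prompt: str | Sequence[int] | None = None
    messages: list[dict[str, Any]] = field(default_factory=list)
    echo: bool = False
    logprobs: int = 0


req = TextGenerationRequestSketch(
    request_id="req-001",
    model_name="example-model",
    prompt="Hello, world",
)
print(req.model_name, req.echo, req.logprobs)  # example-model False 0
```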
chat_template_options
chat_template_options: dict[str, Any] | None = None
Optional dictionary of options to pass when applying the chat template.
dkv_cache_hint
dkv_cache_hint: dict[str, Any] | None = None
Cache hint from the Orchestrator for distributed KV cache.
When present, the serving layer converts this into
TextContext.external_block_metadata so the DKVConnector can
fetch cached blocks before the forward pass.
echo
echo: bool = False
If set to True, the response will include the original prompt along with
the generated output. This can be useful for debugging or when you want to
see how the input relates to the output.
images
images: list[bytes]
A list of image byte arrays that can be included as part of the request. This field is optional and may be used for multimodal inputs where images are relevant to the prompt or task.
logprobs
logprobs: int = 0
The number of top log probabilities to return for each generated token. A value of 0 means that log probabilities will not be returned. Useful for analyzing model confidence in its predictions.
messages
messages: list[TextGenerationRequestMessage]
A list of messages for chat-based interactions. This is used in chat completion APIs, where each message represents a turn in the conversation. If provided, the model will generate responses based on these messages.
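For illustration, a chat-style message list might look like the following (the concrete schema is defined by `TextGenerationRequestMessage`; this plain-dict form is an assumption):

```python
# Hypothetical chat-style messages; each entry represents one conversation turn.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the request lifecycle."},
]
print(len(messages))  # 2
```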
model_name
model_name: str
The name of the model to be used for generating tokens. This should match the available models on the server and determines the behavior and capabilities of the response generation.
number_of_images
property number_of_images: int
Returns the total number of image-type contents across all provided messages.
Returns:
Total count of image-type contents found in messages.
number_of_videos
property number_of_videos: int
Returns the total number of video-type contents across all provided messages.
Returns:
Total count of video-type contents found in messages.
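A plausible way such a count could be computed over message contents is sketched below (the actual property implementation is not shown in this reference, and the part layout with a `"type"` key is an assumption):

```python
def count_contents(messages, content_type):
    """Count content parts of a given type across all messages.

    Assumes each message's content may be a list of typed parts,
    e.g. {"type": "image", ...}; plain-string content is skipped.
    """
    total = 0
    for msg in messages:
        content = msg.get("content", [])
        if isinstance(content, list):
            total += sum(1 for part in content if part.get("type") == content_type)
    return total


msgs = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe these"},
            {"type": "image", "data": b"..."},
            {"type": "image", "data": b"..."},
        ],
    },
]
print(count_contents(msgs, "image"))  # 2
```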
prompt
prompt: str | Sequence[int] | None = None
The prompt to be processed by the model. This field supports legacy completion APIs and can accept either a string or a sequence of integers representing token IDs. If not provided, the model may generate output based on the messages field.
request_id
request_id: RequestID
A unique identifier for the request.
request_path
request_path: str = '/'
The endpoint path for the request. This is typically used for routing and logging requests within the server infrastructure.
response_format
response_format: TextGenerationResponseFormat | None = None
Specifies the desired format for the model’s output. When set, it enables structured generation, which adheres to the json_schema provided.
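As one possible illustration of a structured-output request, a JSON-schema payload of the following shape could be supplied (the concrete `TextGenerationResponseFormat` type defines the real structure; this dict layout is an assumption modeled on common JSON-schema response formats):

```python
# Hypothetical json_schema response format: constrains output to an object
# with a single required string field named "answer".
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "answer",
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
}
print(response_format["type"])  # json_schema
```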
sampling_params
sampling_params: SamplingParams
Token sampling configuration parameters for the request.
stop
stop: str | list[str] | None = None
Optional list of stop expressions (see https://platform.openai.com/docs/api-reference/chat/create#chat-create-stop).
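The effect of stop expressions can be sketched as truncating generated text at the earliest match (an illustrative post-hoc version; the server applies stop handling during decoding, token by token):

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop expression.

    Accepts a single string, a list of strings, or None, mirroring
    the stop field's str | list[str] | None type.
    """
    if stop is None:
        return text
    if isinstance(stop, str):
        stop = [stop]
    cut = len(text)
    for expr in stop:
        idx = text.find(expr)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


print(truncate_at_stop("Answer: 42\nUser:", ["\nUser:"]))  # Answer: 42
```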
target_endpoint
target_endpoint: str | None = None
Optional target endpoint identifier for routing the request to a specific service or model instance. This should be used in disaggregated serving scenarios, when you want to dynamically route to a specific instance. If not specified, the request will be routed to the default endpoint.
timestamp_ns
timestamp_ns: int = 0
The time (in nanoseconds) when the request was received by the server. This can be useful for performance monitoring and logging purposes.
tools
tools: list[TextGenerationRequestTool] | None = None
A list of tools that can be invoked during the generation process. This allows the model to utilize external functionalities or APIs to enhance its responses.
videos
videos: list[bytes]
A list of video byte arrays that can be included as part of the request. Each video is decoded into frames during preprocessing.