Python class

TextGenerationRequest

class max.interfaces.TextGenerationRequest(request_id, model_name, prompt=None, messages=<factory>, images=<factory>, videos=<factory>, tools=None, response_format=None, timestamp_ns=0, request_path='/', logprobs=0, echo=False, stop=None, chat_template_options=None, sampling_params=<factory>, target_endpoint=None, dkv_cache_hint=None)

Bases: object

An immutable request for text token generation from a pipeline.

Parameters:

chat_template_options

chat_template_options: dict[str, Any] | None = None

Optional dictionary of options to pass when applying the chat template.

dkv_cache_hint

dkv_cache_hint: dict[str, Any] | None = None

Cache hint from the Orchestrator for distributed KV cache.

When present, the serving layer converts this into TextContext.external_block_metadata so the DKVConnector can fetch cached blocks before the forward pass.

echo

echo: bool = False

If set to True, the response will include the original prompt along with the generated output. This can be useful for debugging or when you want to see how the input relates to the output.

images

images: list[bytes]

A list of image byte arrays that can be included as part of the request. This field is optional and may be used for multimodal inputs where images are relevant to the prompt or task.

logprobs

logprobs: int = 0

The number of top log probabilities to return for each generated token. A value of 0 means that log probabilities will not be returned. Useful for analyzing model confidence in its predictions.

messages

messages: list[TextGenerationRequestMessage]

A list of messages for chat-based interactions. This is used in chat completion APIs, where each message represents a turn in the conversation. If provided, the model will generate responses based on these messages.

model_name

model_name: str

The name of the model to be used for generating tokens. This should match the available models on the server and determines the behavior and capabilities of the response generation.

number_of_images

property number_of_images: int

Returns the total number of image-type contents across all provided messages.

Returns:

Total count of image-type contents found in messages.

number_of_videos

property number_of_videos: int

Returns the total number of video-type contents across all provided messages.

Returns:

Total count of video-type contents found in messages.
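
The two counting properties above can be sketched with a small helper, assuming each message carries OpenAI-style typed content parts with a `"type"` key (the actual `TextGenerationRequestMessage` shape may differ):

```python
# Hypothetical message shape: OpenAI-style typed content parts.
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Compare these:"},
        {"type": "image", "data": b"..."},
        {"type": "image", "data": b"..."},
    ]},
    {"role": "user", "content": [{"type": "video", "data": b"..."}]},
]

def count_contents(messages, kind):
    """Count content parts of the given type across all messages."""
    total = 0
    for msg in messages:
        content = msg.get("content", [])
        # String content (plain text) has no typed parts to count.
        if isinstance(content, list):
            total += sum(1 for part in content if part.get("type") == kind)
    return total

print(count_contents(messages, "image"))  # 2
print(count_contents(messages, "video"))  # 1
```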

prompt

prompt: str | Sequence[int] | None = None

The prompt to be processed by the model. This field supports legacy completion APIs and can accept either a string or a sequence of integers representing token IDs. If not provided, the model may generate output based on the messages field.

request_id

request_id: RequestID

A unique identifier for the request.

request_path

request_path: str = '/'

The endpoint path for the request. This is typically used for routing and logging requests within the server infrastructure.

response_format

response_format: TextGenerationResponseFormat | None = None

Specifies the desired format for the model’s output. When set, it enables structured generation, which adheres to the json_schema provided.
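
As a sketch of structured generation, here is an OpenAI-style `response_format` payload carrying a JSON schema (the exact `TextGenerationResponseFormat` fields are an assumption here), plus what a conforming output parses to:

```python
import json

# OpenAI-style response_format payload (illustrative; the concrete
# TextGenerationResponseFormat structure may differ).
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment",
        "schema": {
            "type": "object",
            "properties": {"label": {"type": "string"}},
            "required": ["label"],
        },
    },
}

# With structured generation enabled, the model's output is constrained to
# parse as JSON matching the schema, e.g.:
output = '{"label": "positive"}'
parsed = json.loads(output)
print(parsed["label"])  # positive
```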

sampling_params

sampling_params: SamplingParams

Token sampling configuration parameters for the request.

stop

stop: str | list[str] | None = None

Optional list of stop expressions (see https://platform.openai.com/docs/api-reference/chat/create#chat-create-stop). Generation halts when any of these expressions is produced.
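
For illustration, applying stop expressions to a decoded completion amounts to truncating at the earliest match. The pipeline enforces this during generation; this post-hoc helper is only a sketch:

```python
def truncate_at_stop(text: str, stop: list[str]) -> str:
    """Trim generated text at the earliest occurrence of any stop expression."""
    cut = len(text)
    for expr in stop:
        idx = text.find(expr)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Answer: 42\nUser:", ["\nUser:", "<|end|>"]))  # Answer: 42
```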

target_endpoint

target_endpoint: str | None = None

Optional target endpoint identifier for routing the request to a specific service or model instance. This should be used in disaggregated serving scenarios, when you want to dynamically route to a specific instance. If not specified, the request is routed to the default endpoint.

timestamp_ns

timestamp_ns: int = 0

The time (in nanoseconds) when the request was received by the server. This can be useful for performance monitoring and logging purposes.

tools

tools: list[TextGenerationRequestTool] | None = None

A list of tools that can be invoked during the generation process. This allows the model to utilize external functionalities or APIs to enhance its responses.

videos

videos: list[bytes]

A list of video byte arrays that can be included as part of the request. Each video is decoded into frames during preprocessing.