Python class
TextGenerationRequest
class max.interfaces.TextGenerationRequest(request_id, model_name, prompt=None, messages=<factory>, images=<factory>, videos=<factory>, tools=None, response_format=None, timestamp_ns=0, request_path='/', logprobs=0, echo=False, stop=None, chat_template_options=None, sampling_params=<factory>, target_endpoint=None, dkv_cache_hint=None)
Bases: object
An immutable request for text token generation from a pipeline.
Parameters:
- request_id (RequestID)
- model_name (str)
- prompt (str | Sequence[int] | None)
- messages (list[TextGenerationRequestMessage])
- images (list[bytes])
- videos (list[bytes])
- tools (list[TextGenerationRequestTool] | None)
- response_format (TextGenerationResponseFormat | None)
- timestamp_ns (int)
- request_path (str)
- logprobs (int)
- echo (bool)
- stop (str | list[str] | None)
- chat_template_options (dict[str, Any] | None)
- sampling_params (SamplingParams)
- target_endpoint (str | None)
- dkv_cache_hint (dict[str, Any] | None)
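As a sketch of how such a request might be constructed, the stand-in dataclass below mirrors a handful of the fields listed above (the real class lives in `max.interfaces` and carries many more fields; the name `TextGenerationRequestSketch` and the dict-based messages are illustrative assumptions):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Sequence


# Stand-in mirroring a few TextGenerationRequest fields, for illustration only.
@dataclass(frozen=True)  # the request is documented as immutable
class TextGenerationRequestSketch:
    request_id: str
    model_name: str
    prompt: str | Sequence[int] | None = None
    messages: list[dict[str, Any]] = field(default_factory=list)
    echo: bool = False
    logprobs: int = 0


req = TextGenerationRequestSketch(
    request_id="req-001",
    model_name="example-model",
    prompt="Hello, world",
)
print(req.model_name, req.echo, req.logprobs)  # example-model False 0
```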
chat_template_options
chat_template_options: dict[str, Any] | None = None
Optional dictionary of options to pass when applying the chat template.
dkv_cache_hint
dkv_cache_hint: dict[str, Any] | None = None
Cache hint from the Orchestrator for distributed KV cache.
When present, the serving layer converts this into
TextContext.external_block_metadata so the DKVConnector can
fetch cached blocks before the forward pass.
echo
echo: bool = False
If set to True, the response will include the original prompt along with
the generated output. This can be useful for debugging or when you want to
see how the input relates to the output.
images
images: list[bytes]
A list of image byte arrays that can be included as part of the request. This field is optional and may be used for multimodal inputs where images are relevant to the prompt or task.
logprobs
logprobs: int = 0
The number of top log probabilities to return for each generated token. A value of 0 means that log probabilities will not be returned. Useful for analyzing model confidence in its predictions.
messages
messages: list[TextGenerationRequestMessage]
A list of messages for chat-based interactions. This is used in chat completion APIs, where each message represents a turn in the conversation. If provided, the model will generate responses based on these messages.
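For illustration, a chat-style message list might look like the following (the concrete schema is defined by `TextGenerationRequestMessage`; this plain-dict form is an assumption):

```python
# Hypothetical chat-style messages; each entry represents one conversation turn.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the request lifecycle."},
]
print(len(messages))  # 2
```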
model_name
model_name: str
The name of the model to be used for generating tokens. This should match the available models on the server and determines the behavior and capabilities of the response generation.
number_of_images
property number_of_images: int
Returns the total number of image-type contents across all provided messages.
Returns:
Total count of image-type contents found in messages.
number_of_videos
property number_of_videos: int
Returns the total number of video-type contents across all provided messages.
Returns:
Total count of video-type contents found in messages.
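A plausible way such a count could be computed over message contents is sketched below (the actual property implementation is not shown in this reference, and the part layout with a `"type"` key is an assumption):

```python
def count_contents(messages, content_type):
    """Count content parts of a given type across all messages.

    Assumes each message's content may be a list of typed parts,
    e.g. {"type": "image", ...}; plain-string content is skipped.
    """
    total = 0
    for msg in messages:
        content = msg.get("content", [])
        if isinstance(content, list):
            total += sum(1 for part in content if part.get("type") == content_type)
    return total


msgs = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe these"},
            {"type": "image", "data": b"..."},
            {"type": "image", "data": b"..."},
        ],
    },
]
print(count_contents(msgs, "image"))  # 2
```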
prompt
prompt: str | Sequence[int] | None = None
The prompt to be processed by the model. This field supports legacy completion APIs and can accept either a string or a sequence of integers representing token IDs. If not provided, the model may generate output based on the messages field.
request_id
request_id: RequestID
A unique identifier for the request.
request_path
request_path: str = '/'
The endpoint path for the request. This is typically used for routing and logging requests within the server infrastructure.
response_format
response_format: TextGenerationResponseFormat | None = None
Specifies the desired format for the model’s output. When set, it enables structured generation, which adheres to the json_schema provided.
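As one possible illustration of a structured-output request, a JSON-schema payload of the following shape could be supplied (the concrete `TextGenerationResponseFormat` type defines the real structure; this dict layout is an assumption modeled on common JSON-schema response formats):

```python
# Hypothetical json_schema response format: constrains output to an object
# with a single required string field named "answer".
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "answer",
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
}
print(response_format["type"])  # json_schema
```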
sampling_params
sampling_params: SamplingParams
Token sampling configuration parameters for the request.
stop
stop: str | list[str] | None = None
Optional list of stop expressions (see https://platform.openai.com/docs/api-reference/chat/create#chat-create-stop).
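The effect of stop expressions can be sketched as truncating generated text at the earliest match (an illustrative post-hoc version; the server applies stop handling during decoding, token by token):

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop expression.

    Accepts a single string, a list of strings, or None, mirroring
    the stop field's str | list[str] | None type.
    """
    if stop is None:
        return text
    if isinstance(stop, str):
        stop = [stop]
    cut = len(text)
    for expr in stop:
        idx = text.find(expr)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


print(truncate_at_stop("Answer: 42\nUser:", ["\nUser:"]))  # Answer: 42
```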
target_endpoint
target_endpoint: str | None = None
Optional target endpoint identifier for routing the request to a specific service or model instance. This should be used in disaggregated serving scenarios, when you want to dynamically route to a specific instance. If not specified, the request will be routed to the default endpoint.
timestamp_ns
timestamp_ns: int = 0
The time (in nanoseconds) when the request was received by the server. This can be useful for performance monitoring and logging purposes.
tools
tools: list[TextGenerationRequestTool] | None = None
A list of tools that can be invoked during the generation process. This allows the model to utilize external functionalities or APIs to enhance its responses.
videos
videos: list[bytes]
A list of video byte arrays that can be included as part of the request. Each video is decoded into frames during preprocessing.