Python module
interfaces
Top level imports for pipeline interfaces.
PipelineTask
class max.pipelines.interfaces.PipelineTask(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
EMBEDDINGS_GENERATION
EMBEDDINGS_GENERATION = 'embeddings_generation'
TEXT_GENERATION
TEXT_GENERATION = 'text_generation'
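A short usage sketch (illustrative only), showing standard enum-member behavior:

from max.pipelines.interfaces import PipelineTask

# Members carry their string value and can be looked up by it.
task = PipelineTask.TEXT_GENERATION
assert task.value == "text_generation"
assert PipelineTask("embeddings_generation") is PipelineTask.EMBEDDINGS_GENERATION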
PipelineTokenizer
class max.pipelines.interfaces.PipelineTokenizer(*args, **kwargs)
Interface for LLM tokenizers.
decode()
async decode(context: TokenGeneratorContext, encoded: TokenizerEncoded, **kwargs) → str
Decodes response tokens to text.
Parameters:
- context (TokenGeneratorContext) – Current generation context.
- encoded (TokenizerEncoded) – Encoded response tokens.
Returns:
Un-encoded response text.
Return type:
str
encode()
async encode(prompt: str) → TokenizerEncoded
Encodes text prompts as tokens.
Parameters:
prompt (str) – Un-encoded prompt text.
Raises:
ValueError – If the prompt exceeds the configured maximum length.
Returns:
Encoded prompt tokens.
Return type:
TokenizerEncoded
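Taken together, encode() and decode() form a round trip. A minimal sketch, assuming `tokenizer` is a concrete implementation of this interface and `context` was created with new_context() (documented below):

import asyncio

async def round_trip(tokenizer, context, text: str) -> str:
    # encode() may raise ValueError if the prompt exceeds the maximum length.
    encoded = await tokenizer.encode(text)
    # decode() converts response tokens back into text.
    return await tokenizer.decode(context, encoded)

# asyncio.run(round_trip(tokenizer, context, "Hello"))  # illustrative call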
eos
property eos: int
The end of sequence token for this tokenizer.
expects_content_wrapping
property expects_content_wrapping: bool
If True, this tokenizer expects messages to have a 'content' property. Text messages are formatted as { "type": "text", "content": "text content" } instead of the OpenAI spec { "type": "text", "text": "text content" }. Note that multimodal messages omit the content property: both "image_urls" and "image" content parts are converted to simply { "type": "image" }, and their content is provided as byte arrays via the top-level property on the request object, i.e. TokenGeneratorRequest.images.
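For illustration, a hypothetical helper (not part of this module) that shapes a text message part according to this flag:

def format_text_part(text: str, expects_content_wrapping: bool) -> dict:
    if expects_content_wrapping:
        return {"type": "text", "content": text}  # wrapped form described above
    return {"type": "text", "text": text}  # OpenAI-spec form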
new_context()
async new_context(request: TokenGeneratorRequest) → TokenGeneratorContext
Creates a new context from a request object. This is sent to the worker process once and then cached locally.
Parameters:
request (TokenGeneratorRequest) – Incoming request.
Returns:
Initialized context.
Return type:
TokenGeneratorContext
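A minimal sketch of context creation, assuming `tokenizer` implements this interface; the request values are hypothetical (TokenGeneratorRequest is documented below):

async def make_context(tokenizer):
    request = TokenGeneratorRequest(
        id="req-0",             # hypothetical request ID
        index=0,
        model_name="my-model",  # hypothetical model name
        prompt="Hello, world!",
    )
    return await tokenizer.new_context(request)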
TokenGenerator
class max.pipelines.interfaces.TokenGenerator(*args, **kwargs)
Interface for LLM token-generator models.
next_token()
next_token(batch: dict[str, TokenGeneratorContext], num_steps: int = 1) → list[dict[str, Any]]
Computes the next token response for a single batch.
release()
release(context: TokenGeneratorContext) → None
Releases resources associated with this context.
Parameters:
context (TokenGeneratorContext) – Finished context.
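A sketch of a typical drive loop, assuming `generator` is a concrete TokenGenerator and `context` came from the pipeline tokenizer's new_context(); the stop condition is an assumption for illustration, not part of the interface:

batch = {"req-0": context}
for _ in range(64):  # hypothetical step budget
    # Each call returns one response dict per step, keyed by request ID.
    step_responses = generator.next_token(batch, num_steps=1)
    if not step_responses or "req-0" not in step_responses[0]:
        break  # assumed: a finished request no longer appears in the responses
generator.release(context)  # free per-request resources when done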
TokenGeneratorRequest
class max.pipelines.interfaces.TokenGeneratorRequest(id: str, index: int, model_name: str, prompt: str | Sequence[int] | None = None, messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage] | None = None, images: list[bytes] | None = None, tools: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestTool] | None = None, response_format: max.pipelines.interfaces.text_generation.TokenGeneratorResponseFormat | None = None, max_new_tokens: int | None = None, timestamp_ns: int = 0, request_path: str = '/', logprobs: int = 0, echo: bool = False)
echo
echo: bool = False
If set to True, the response will include the original prompt along with the generated output. This can be useful for debugging or when you want to see how the input relates to the output.
id
id: str
A unique identifier for the request. This ID can be used to trace and log the request throughout its lifecycle, facilitating debugging and tracking.
images
images: list[bytes] | None = None
A list of image byte arrays that can be included as part of the request. This field is optional and may be used for multimodal inputs where images are relevant to the prompt or task.
index
index: int
The sequence order of this request within a batch. This is useful for maintaining the order of requests when processing multiple requests simultaneously, ensuring that responses can be matched back to their corresponding requests accurately.
logprobs
logprobs: int = 0
The number of top log probabilities to return for each generated token. A value of 0 means that log probabilities will not be returned. Useful for analyzing model confidence in its predictions.
max_new_tokens
max_new_tokens: int | None = None
The maximum number of new tokens to generate in the response. If not set, the model may generate tokens until it reaches its internal limits or based on other stopping criteria.
messages
messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage] | None = None
A list of messages for chat-based interactions. This is used in chat completion APIs, where each message represents a turn in the conversation. If provided, the model will generate responses based on these messages.
model_name
model_name: str
The name of the model to be used for generating tokens. This should match the available models on the server and determines the behavior and capabilities of the response generation.
prompt
prompt: str | Sequence[int] | None = None
The prompt to be processed by the model. This field supports legacy completion APIs and can accept either a string or a sequence of integers representing token IDs. If not provided, the model may generate output based on the messages field.
request_path
request_path: str = '/'
The endpoint path for the request. This is typically used for routing and logging requests within the server infrastructure.
response_format
response_format: TokenGeneratorResponseFormat | None = None
Specifies the desired format for the model’s output. When set, it enables structured generation, which adheres to the json_schema provided.
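For illustration only: the exact shape of TokenGeneratorResponseFormat is not shown on this page, so this sketch assumes a json_schema-style payload as described above:

response_format = {
    "type": "json_schema",  # assumed discriminator
    "json_schema": {
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}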
timestamp_ns
timestamp_ns: int = 0
The time (in nanoseconds) when the request was received by the server. This can be useful for performance monitoring and logging purposes.
tools
tools: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestTool] | None = None
A list of tools that can be invoked during the generation process. This allows the model to utilize external functionalities or APIs to enhance its responses.
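Putting the fields together, an illustrative chat-style request (all values hypothetical; message entries are shown as plain dicts on the assumption that TokenGeneratorRequestMessage is dict-shaped):

request = TokenGeneratorRequest(
    id="req-42",
    index=0,
    model_name="my-model",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_new_tokens=64,
)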
TokenGeneratorRequestFunction
class max.pipelines.interfaces.TokenGeneratorRequestFunction
description
description: str
name
name: str
parameters
parameters: dict
TokenGeneratorRequestMessage
class max.pipelines.interfaces.TokenGeneratorRequestMessage
content
Content can be a simple string or a list of message parts of different modalities.
For example:
{
    "role": "user",
    "content": "What's the weather like in Boston today?"
}
Or:
{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What's in this image?"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
        }
    ]
}
role
role: Literal['system', 'user', 'assistant']
TokenGeneratorRequestTool
class max.pipelines.interfaces.TokenGeneratorRequestTool
function
function: TokenGeneratorRequestFunction
type
type: str
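An illustrative tool definition matching these fields; the function-calling shape shown is assumed, and all values are hypothetical:

tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}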