Python module
interfaces
Interfaces for different pipeline behaviors.
PipelineTokenizer
class max.pipelines.interfaces.PipelineTokenizer(*args, **kwargs)
Interface for LLM tokenizers.
decode()
async decode(context: TokenGeneratorContext, encoded: TokenizerEncoded) → str
Decodes response tokens to text.
Parameters:
- context (TokenGeneratorContext) – Current generation context.
- encoded (TokenizerEncoded) – Encoded response tokens.
Returns:
Un-encoded response text.
Return type:
str
encode()
async encode(prompt: str) → TokenizerEncoded
Encodes text prompts as tokens.
Parameters:
prompt (str) – Un-encoded prompt text.
Raises:
ValueError – If the prompt exceeds the configured maximum length.
Returns:
Encoded prompt tokens.
Return type:
TokenizerEncoded
eos
property eos: int
new_context()
async new_context(request: TokenGeneratorRequest) → TokenGeneratorContext
Creates a new context from a request object. The context is sent to the worker process once and then cached locally.
Parameters:
request (TokenGeneratorRequest) – Incoming request.
Returns:
Initialized context.
Return type:
TokenGeneratorContext
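The shape of this interface can be illustrated with a toy implementation. The following is a minimal sketch, not the library's tokenizer: SketchRequest, SketchContext, WhitespaceTokenizer, and round_trip are hypothetical names, TokenizerEncoded is assumed to be a plain list of integer IDs, and the length limit is counted in whitespace-separated words.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SketchRequest:
    """Hypothetical stand-in for TokenGeneratorRequest (two fields only)."""
    id: str
    prompt: str


@dataclass
class SketchContext:
    """Hypothetical stand-in for TokenGeneratorContext."""
    request_id: str
    tokens: list[int] = field(default_factory=list)


class WhitespaceTokenizer:
    """Toy tokenizer exposing the same four members as PipelineTokenizer."""

    def __init__(self, max_length: int = 8) -> None:
        self.max_length = max_length
        # ID 0 is reserved for the end-of-sequence token.
        self._vocab: dict[str, int] = {"<eos>": 0}
        self._words: list[str] = ["<eos>"]

    @property
    def eos(self) -> int:
        return 0

    async def encode(self, prompt: str) -> list[int]:
        words = prompt.split()
        if len(words) > self.max_length:
            raise ValueError(
                f"prompt has {len(words)} tokens, limit is {self.max_length}"
            )
        ids = []
        for word in words:
            # Unseen words get the next free ID.
            if word not in self._vocab:
                self._vocab[word] = len(self._words)
                self._words.append(word)
            ids.append(self._vocab[word])
        return ids

    async def decode(self, context: SketchContext, encoded: list[int]) -> str:
        # The context is unused in this sketch; EOS tokens are dropped.
        return " ".join(self._words[i] for i in encoded if i != self.eos)

    async def new_context(self, request: SketchRequest) -> SketchContext:
        tokens = await self.encode(request.prompt)
        return SketchContext(request_id=request.id, tokens=tokens)


async def round_trip(text: str) -> str:
    """Builds a context from a request, then decodes its tokens again."""
    tokenizer = WhitespaceTokenizer()
    context = await tokenizer.new_context(SketchRequest(id="req-0", prompt=text))
    return await tokenizer.decode(context, context.tokens)
```

Calling asyncio.run(round_trip("some text")) exercises new_context(), encode(), and decode() in the order a serving loop would plausibly use them.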
TokenGenerator
class max.pipelines.interfaces.TokenGenerator(*args, **kwargs)
Interface for LLM token-generator models.
next_token()
next_token(batch: dict[str, TokenGeneratorContext], num_steps: int = 1) → list[dict[str, Any]]
Computes the next token response for a single batch.
release()
release(context: TokenGeneratorContext) → None
Releases resources associated with this context.
Parameters:
context (TokenGeneratorContext) – Finished context.
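As an illustration of how next_token() and release() might fit together, here is a minimal sketch with a toy "model" that emits the previous token plus one. SketchContext and CountingGenerator are hypothetical names. The sketch assumes that the returned list holds one dictionary per step and that each dictionary maps a request ID to that step's output; the reference above does not spell out that layout.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchContext:
    """Hypothetical stand-in for TokenGeneratorContext."""
    request_id: str
    tokens: list[int]


class CountingGenerator:
    """Toy generator with the same two methods as TokenGenerator."""

    def __init__(self) -> None:
        # Per-request bookkeeping, standing in for real model caches.
        self._steps_taken: dict[str, int] = {}

    def next_token(
        self, batch: dict[str, SketchContext], num_steps: int = 1
    ) -> list[dict[str, Any]]:
        responses: list[dict[str, Any]] = []
        for _ in range(num_steps):
            # Assumption: one dict per step, keyed by request ID.
            step: dict[str, Any] = {}
            for request_id, context in batch.items():
                token = context.tokens[-1] + 1  # toy "model"
                context.tokens.append(token)
                self._steps_taken[request_id] = (
                    self._steps_taken.get(request_id, 0) + 1
                )
                step[request_id] = token
            responses.append(step)
        return responses

    def release(self, context: SketchContext) -> None:
        # Drop any state held for a finished context.
        self._steps_taken.pop(context.request_id, None)
```

Keeping state keyed by request ID is one way to make release() cheap: finishing a request only removes a single entry.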
TokenGeneratorRequest
class max.pipelines.interfaces.TokenGeneratorRequest(id: str, index: int, model_name: str, prompt: str | None = None, messages: list[max.pipelines.interfaces.TokenGeneratorRequestMessage] | None = None, images: list[bytes] | None = None, max_new_tokens: int | None = None, req_recv_time_ns: int = 0, request_path: str = '/', logprobs: int = 0, echo: bool = False)
echo
echo: bool = False
id
id: str
images
images: list[bytes] | None = None
index
index: int
logprobs
logprobs: int = 0
max_new_tokens
max_new_tokens: int | None = None
messages
messages: list[max.pipelines.interfaces.TokenGeneratorRequestMessage] | None = None
Chat completion APIs work off messages.
model_name
model_name: str
prompt
prompt: str | None = None
The prompt field supports the legacy /completion APIs.
req_recv_time_ns
req_recv_time_ns: int = 0
request_path
request_path: str = '/'
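The two request styles described above (prompt for legacy completion, messages for chat) can be sketched with a stand-in dataclass. RequestSketch and request_style are hypothetical and copy only a subset of the fields and defaults from the signature above. Treating a request with neither field set as an error is an assumption of this sketch, not documented behavior, and "example-model" is a placeholder name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSketch:
    """Hypothetical stand-in mirroring some TokenGeneratorRequest fields."""
    id: str
    index: int
    model_name: str
    prompt: Optional[str] = None
    messages: Optional[list[dict]] = None
    max_new_tokens: Optional[int] = None
    request_path: str = "/"


def request_style(request: RequestSketch) -> str:
    """Classifies a request as 'chat' (messages) or 'completion' (prompt)."""
    if request.messages is not None:
        return "chat"
    if request.prompt is not None:
        return "completion"
    # Assumption: a request with neither field is rejected.
    raise ValueError(f"request {request.id} has neither prompt nor messages")


# A legacy completion-style request carries a raw prompt string.
legacy = RequestSketch(id="req-0", index=0, model_name="example-model", prompt="Hello")

# A chat-style request carries a list of role/content messages.
chat = RequestSketch(
    id="req-1",
    index=1,
    model_name="example-model",
    messages=[{"role": "user", "content": "Hello"}],
)
```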
TokenGeneratorRequestMessage
class max.pipelines.interfaces.TokenGeneratorRequestMessage
content
Content can be a simple string or a list of message parts of different modalities.
For example:
{
"role": "user",
"content": "What's the weather like in Boston today?"
}
Or:
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
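Because content takes two shapes, consumers typically need to normalize it before use. The helpers below are a minimal sketch of one way to do that. text_of and image_urls_of are hypothetical names, and the part layout ("type", "text", "image_url") is taken from the examples above rather than from a formal schema.

```python
from typing import Union

# Content is either a plain string or a list of typed message parts.
Content = Union[str, list[dict]]


def text_of(content: Content) -> str:
    """Joins the text carried by content, skipping non-text parts."""
    if isinstance(content, str):
        return content
    return " ".join(
        part["text"] for part in content if part.get("type") == "text"
    )


def image_urls_of(content: Content) -> list[str]:
    """Collects the URLs of all image_url parts; a plain string has none."""
    if isinstance(content, str):
        return []
    return [
        part["image_url"]["url"]
        for part in content
        if part.get("type") == "image_url"
    ]
```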
role
role: Literal['system', 'user', 'assistant', 'tool', 'function']