Python module
interfaces
Top level imports for pipeline interfaces.
PipelineTask
class max.pipelines.interfaces.PipelineTask(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
EMBEDDINGS_GENERATION
EMBEDDINGS_GENERATION = 'embeddings_generation'
TEXT_GENERATION
TEXT_GENERATION = 'text_generation'
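A short usage sketch (illustrative only), showing standard enum-member behavior:

from max.pipelines.interfaces import PipelineTask

# Members carry their string value and can be looked up by it.
task = PipelineTask.TEXT_GENERATION
assert task.value == "text_generation"
assert PipelineTask("embeddings_generation") is PipelineTask.EMBEDDINGS_GENERATION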
PipelineTokenizer
class max.pipelines.interfaces.PipelineTokenizer(*args, **kwargs)
Interface for LLM tokenizers.
decode()
async decode(context: TokenGeneratorContext, encoded: TokenizerEncoded, **kwargs) → str
Decodes response tokens to text.
Parameters:
- context (TokenGeneratorContext) – Current generation context.
- encoded (TokenizerEncoded) – Encoded response tokens.
Returns:
Un-encoded response text.
Return type:
str
encode()
async encode(prompt: str) → TokenizerEncoded
Encodes text prompts as tokens.
Parameters:
prompt (str) – Un-encoded prompt text.
Raises:
ValueError – If the prompt exceeds the configured maximum length.
Returns:
Encoded prompt tokens.
Return type:
TokenizerEncoded
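Taken together, encode() and decode() form a round trip. A minimal sketch, assuming `tokenizer` is a concrete implementation of this interface and `context` was created with new_context() (documented below):

import asyncio

async def round_trip(tokenizer, context, text: str) -> str:
    # encode() may raise ValueError if the prompt exceeds the maximum length.
    encoded = await tokenizer.encode(text)
    # decode() converts response tokens back into text.
    return await tokenizer.decode(context, encoded)

# asyncio.run(round_trip(tokenizer, context, "Hello"))  # illustrative call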
eos
property eos: int
The end of sequence token for this tokenizer.
expects_content_wrapping
property expects_content_wrapping: bool
If True, this tokenizer expects messages to have a 'content' property. Text messages are formatted as { "type": "text", "content": "text content" } instead of the OpenAI spec { "type": "text", "text": "text content" }. Note that multimodal messages omit the content property: both "image_urls" and "image" content parts are converted to simply { "type": "image" }, and their content is provided as byte arrays via the top-level property on the request object, i.e. TokenGeneratorRequest.images.
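For illustration, a hypothetical helper (not part of this module) that shapes a text message part according to this flag:

def format_text_part(text: str, expects_content_wrapping: bool) -> dict:
    if expects_content_wrapping:
        return {"type": "text", "content": text}  # wrapped form described above
    return {"type": "text", "text": text}  # OpenAI-spec form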
new_context()
async new_context(request: TokenGeneratorRequest) → TokenGeneratorContext
Creates a new context from a request object. This is sent to the worker process once and then cached locally.
Parameters:
request (TokenGeneratorRequest) – Incoming request.
Returns:
Initialized context.
Return type:
TokenGeneratorContext
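A minimal sketch of context creation, assuming `tokenizer` implements this interface; the request values are hypothetical (TokenGeneratorRequest is documented below):

async def make_context(tokenizer):
    request = TokenGeneratorRequest(
        id="req-0",             # hypothetical request ID
        index=0,
        model_name="my-model",  # hypothetical model name
        prompt="Hello, world!",
    )
    return await tokenizer.new_context(request)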
TokenGenerator
class max.pipelines.interfaces.TokenGenerator(*args, **kwargs)
Interface for LLM token-generator models.
next_token()
next_token(batch: dict[str, TokenGeneratorContext], num_steps: int = 1) → list[dict[str, Any]]
Computes the next token response for a single batch.
release()
release(context: TokenGeneratorContext) → None
Releases resources associated with this context.
Parameters:
context (TokenGeneratorContext) – Finished context.
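A sketch of a typical drive loop, assuming `generator` is a concrete TokenGenerator and `context` came from the pipeline tokenizer's new_context(); the stop condition is an assumption for illustration, not part of the interface:

batch = {"req-0": context}
for _ in range(64):  # hypothetical step budget
    # Each call returns one response dict per step, keyed by request ID.
    step_responses = generator.next_token(batch, num_steps=1)
    if not step_responses or "req-0" not in step_responses[0]:
        break  # assumed: a finished request no longer appears in the responses
generator.release(context)  # free per-request resources when done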
TokenGeneratorRequest
class max.pipelines.interfaces.TokenGeneratorRequest(id: str, index: int, model_name: str, prompt: str | Sequence[int] | None = None, messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage] | None = None, images: list[bytes] | None = None, tools: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestTool] | None = None, response_format: max.pipelines.interfaces.text_generation.TokenGeneratorResponseFormat | None = None, max_new_tokens: int | None = None, timestamp_ns: int = 0, request_path: str = '/', logprobs: int = 0, echo: bool = False)
echo
echo: bool = False
If set to True, the response will include the original prompt along with the generated output. This can be useful for debugging or when you want to see how the input relates to the output.
id
id: str
A unique identifier for the request. This ID can be used to trace and log the request throughout its lifecycle, facilitating debugging and tracking.
images
images: list[bytes] | None = None
A list of image byte arrays that can be included as part of the request. This field is optional and may be used for multimodal inputs where images are relevant to the prompt or task.
index
index: int
The sequence order of this request within a batch. This is useful for maintaining the order of requests when processing multiple requests simultaneously, ensuring that responses can be matched back to their corresponding requests accurately.
logprobs
logprobs: int = 0
The number of top log probabilities to return for each generated token. A value of 0 means that log probabilities will not be returned. Useful for analyzing model confidence in its predictions.
max_new_tokens
max_new_tokens: int | None = None
The maximum number of new tokens to generate in the response. If not set, the model may generate tokens until it reaches its internal limits or based on other stopping criteria.
messages
messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage] | None = None
A list of messages for chat-based interactions. This is used in chat completion APIs, where each message represents a turn in the conversation. If provided, the model will generate responses based on these messages.
model_name
model_name: str
The name of the model to be used for generating tokens. This should match the available models on the server and determines the behavior and capabilities of the response generation.
prompt
prompt: str | Sequence[int] | None = None
The prompt to be processed by the model. This field supports legacy completion APIs and can accept either a string or a sequence of integers representing token IDs. If not provided, the model may generate output based on the messages field.
request_path
request_path: str = '/'
The endpoint path for the request. This is typically used for routing and logging requests within the server infrastructure.
response_format
response_format: TokenGeneratorResponseFormat | None = None
Specifies the desired format for the model’s output. When set, it enables structured generation, which adheres to the json_schema provided.
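For illustration only: the exact shape of TokenGeneratorResponseFormat is not shown on this page, so this sketch assumes a json_schema-style payload as described above:

response_format = {
    "type": "json_schema",  # assumed discriminator
    "json_schema": {
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}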
timestamp_ns
timestamp_ns: int = 0
The time (in nanoseconds) when the request was received by the server. This can be useful for performance monitoring and logging purposes.
tools
tools: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestTool] | None = None
A list of tools that can be invoked during the generation process. This allows the model to utilize external functionalities or APIs to enhance its responses.
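Putting the fields together, an illustrative chat-style request (all values hypothetical; message entries are shown as plain dicts on the assumption that TokenGeneratorRequestMessage is dict-shaped):

request = TokenGeneratorRequest(
    id="req-42",
    index=0,
    model_name="my-model",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_new_tokens=64,
)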
TokenGeneratorRequestFunction
class max.pipelines.interfaces.TokenGeneratorRequestFunction
description
description: str
name
name: str
parameters
parameters: dict
TokenGeneratorRequestMessage
class max.pipelines.interfaces.TokenGeneratorRequestMessage
content
Content can be a simple string or a list of message parts of different modalities.
For example:
{
    "role": "user",
    "content": "What's the weather like in Boston today?"
}
Or:
{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What's in this image?"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
        }
    ]
}
role
role: Literal['system', 'user', 'assistant']
TokenGeneratorRequestTool
class max.pipelines.interfaces.TokenGeneratorRequestTool
function
function: TokenGeneratorRequestFunction
type
type: str
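An illustrative tool definition matching these fields; the function-calling shape shown is assumed, and all values are hypothetical:

tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}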