
Python module

interfaces

Top level imports for pipeline interfaces.

PipelineTask

class max.pipelines.interfaces.PipelineTask(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

EMBEDDINGS_GENERATION

EMBEDDINGS_GENERATION = 'embeddings_generation'

TEXT_GENERATION

TEXT_GENERATION = 'text_generation'
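Because each member carries a string value, a task can be looked up from a configuration string. The following is a minimal sketch that uses a local stand-in enum mirroring the two values listed above; `PipelineTaskSketch` and `task_from_string` are hypothetical names, and the block does not import `max` itself.

```python
from enum import Enum


class PipelineTaskSketch(Enum):
    """Local stand-in that mirrors the documented member values."""

    EMBEDDINGS_GENERATION = "embeddings_generation"
    TEXT_GENERATION = "text_generation"


def task_from_string(name: str) -> PipelineTaskSketch:
    """Look up a member by its string value; raises ValueError if unknown."""
    return PipelineTaskSketch(name)


task = task_from_string("text_generation")
```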

PipelineTokenizer

class max.pipelines.interfaces.PipelineTokenizer(*args, **kwargs)

Interface for LLM tokenizers.

decode()

async decode(context: TokenGeneratorContext, encoded: TokenizerEncoded, **kwargs) → str

Decodes response tokens to text.

  • Parameters:

    • context (TokenGeneratorContext) – Current generation context.
    • encoded (TokenizerEncoded) – Encoded response tokens.
  • Returns:

    Decoded response text.

  • Return type:

    str

encode()

async encode(prompt: str) → TokenizerEncoded

Encodes text prompts as tokens.

  • Parameters:

    prompt (str) – Un-encoded prompt text.

  • Raises:

    ValueError – If the prompt exceeds the configured maximum length.

  • Returns:

    Encoded prompt tokens.

  • Return type:

    TokenizerEncoded
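The encode() and decode() signatures above can be exercised with a toy implementation. This is only a sketch under stated assumptions: `ToyTokenizer` is a hypothetical byte-level tokenizer, it uses a plain `list[int]` where the interface uses `TokenizerEncoded`, and it ignores the context argument.

```python
import asyncio


class ToyTokenizer:
    """Hypothetical byte-level tokenizer shaped like the interface above."""

    def __init__(self, max_length: int = 16) -> None:
        self.max_length = max_length

    async def encode(self, prompt: str) -> list[int]:
        # One token per UTF-8 byte; a real tokenizer would use a vocabulary.
        tokens = list(prompt.encode("utf-8"))
        if len(tokens) > self.max_length:
            raise ValueError(
                f"prompt has {len(tokens)} tokens, limit is {self.max_length}"
            )
        return tokens

    async def decode(self, context: object, encoded: list[int], **kwargs) -> str:
        # The context is unused in this sketch.
        return bytes(encoded).decode("utf-8")


async def round_trip(text: str) -> str:
    tokenizer = ToyTokenizer()
    encoded = await tokenizer.encode(text)
    return await tokenizer.decode(None, encoded)


result = asyncio.run(round_trip("hello"))
```

Both methods are coroutines, so callers need to await them or drive them with an event loop as shown.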

eos

property eos: int

The end-of-sequence token ID for this tokenizer.

expects_content_wrapping

property expects_content_wrapping: bool

If true, this tokenizer expects messages to have a "content" property. Text parts are formatted as { "type": "text", "content": "text content" } instead of the OpenAI-spec form { "type": "text", "text": "text content" }.

NOTE: Multimodal parts omit the content property. Both "image_urls" and "image" content parts are converted to simply { "type": "image" }. Their content is provided as byte arrays through the top-level property on the request object, i.e. "TokenGeneratorRequest.images".
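The conversion described above can be sketched as a small helper. `wrap_content_part` is a hypothetical function, not part of the library, and accepting the singular "image_url" type name alongside "image_urls" and "image" is an assumption.

```python
from typing import Any

# Type names treated as images; "image_url" is an assumed addition.
IMAGE_TYPES = {"image_url", "image_urls", "image"}


def wrap_content_part(part: dict[str, Any]) -> dict[str, Any]:
    """Convert one OpenAI-style content part to the content-wrapped form."""
    kind = part.get("type")
    if kind == "text":
        # Move the text from the "text" key to the "content" key.
        return {"type": "text", "content": part["text"]}
    if kind in IMAGE_TYPES:
        # Image bytes travel separately on the request, so only the type remains.
        return {"type": "image"}
    # Unknown part types are passed through as a copy.
    return dict(part)


wrapped = [
    wrap_content_part({"type": "text", "text": "What's in this image?"}),
    wrap_content_part({"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}}),
]
```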

new_context()

async new_context(request: TokenGeneratorRequest) → TokenGeneratorContext

Creates a new context from a request object. The context is sent to the worker process once and then cached locally.

  • Parameters:

    request (TokenGeneratorRequest) – Incoming request.

  • Returns:

    Initialized context.

  • Return type:

    TokenGeneratorContext

TokenGenerator

class max.pipelines.interfaces.TokenGenerator(*args, **kwargs)

Interface for LLM token-generator models.

next_token()

next_token(batch: dict[str, TokenGeneratorContext], num_steps: int = 1) → list[dict[str, Any]]

Computes the next token response for a single batch.

  • Parameters:

    • batch (dict[str, TokenGeneratorContext]) – Batch of contexts.
    • num_steps (int, optional) – Number of forward steps. Defaults to 1.
  • Returns:

    List of encoded responses, indexed by request ID.

  • Return type:

    list[dict[str, Any]]

release()

release(context: TokenGeneratorContext) → None

Releases resources associated with this context.

  • Parameters:

    context (TokenGeneratorContext) – Finished context.
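To illustrate the call pattern of next_token() and release(), here is a toy generator. Everything in it is an assumption made for illustration: `CountingGenerator` is hypothetical, each context is a plain dict with a "tokens" list rather than a TokenGeneratorContext, and the return value is read as one dict per step, keyed by request ID.

```python
from typing import Any


class CountingGenerator:
    """Hypothetical generator that emits the current sequence length as the next token."""

    def next_token(
        self, batch: dict[str, dict[str, Any]], num_steps: int = 1
    ) -> list[dict[str, Any]]:
        steps: list[dict[str, Any]] = []
        for _ in range(num_steps):
            step: dict[str, Any] = {}
            for request_id, context in batch.items():
                token = len(context["tokens"])
                context["tokens"].append(token)  # Advance the context in place.
                step[request_id] = token
            steps.append(step)
        return steps

    def release(self, context: dict[str, Any]) -> None:
        # Drop per-request state once the request has finished.
        context["tokens"].clear()


generator = CountingGenerator()
batch = {"req-a": {"tokens": [7, 8]}, "req-b": {"tokens": []}}
responses = generator.next_token(batch, num_steps=2)
generator.release(batch["req-a"])
```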

TokenGeneratorRequest

class max.pipelines.interfaces.TokenGeneratorRequest(id: str, index: int, model_name: str, prompt: str | Sequence[int] | NoneType = None, messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage] | None = None, images: list[bytes] | None = None, tools: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestTool] | None = None, response_format: max.pipelines.interfaces.text_generation.TokenGeneratorResponseFormat | None = None, max_new_tokens: int | None = None, timestamp_ns: int = 0, request_path: str = '/', logprobs: int = 0, echo: bool = False)

echo

echo: bool = False

If set to True, the response will include the original prompt along with the generated output. This can be useful for debugging or when you want to see how the input relates to the output.

id

id: str

A unique identifier for the request. This ID can be used to trace and log the request throughout its lifecycle, facilitating debugging and tracking.

images

images: list[bytes] | None = None

A list of image byte arrays that can be included as part of the request. This field is optional and may be used for multimodal inputs where images are relevant to the prompt or task.

index

index: int

The sequence order of this request within a batch. This is useful for maintaining the order of requests when processing multiple requests simultaneously, ensuring that responses can be matched back to their corresponding requests accurately.

logprobs

logprobs: int = 0

The number of top log probabilities to return for each generated token. A value of 0 means that log probabilities will not be returned. Useful for analyzing model confidence in its predictions.

max_new_tokens

max_new_tokens: int | None = None

The maximum number of new tokens to generate in the response. If not set, generation continues until the model reaches its internal limits or another stopping criterion is met.

messages

messages: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestMessage] | None = None

A list of messages for chat-based interactions. This is used in chat completion APIs, where each message represents a turn in the conversation. If provided, the model will generate responses based on these messages.

model_name

model_name: str

The name of the model to be used for generating tokens. This should match the available models on the server and determines the behavior and capabilities of the response generation.

prompt

prompt: str | Sequence[int] | None = None

The prompt to be processed by the model. This field supports legacy completion APIs and can accept either a string or a sequence of integers representing token IDs. If not provided, the model may generate output based on the messages field.

request_path

request_path: str = '/'

The endpoint path for the request. This is typically used for routing and logging requests within the server infrastructure.

response_format

response_format: TokenGeneratorResponseFormat | None = None

Specifies the desired format for the model's output. When set, it enables structured generation that adheres to the provided json_schema.

timestamp_ns

timestamp_ns: int = 0

The time (in nanoseconds) when the request was received by the server. This can be useful for performance monitoring and logging purposes.

tools

tools: list[max.pipelines.interfaces.text_generation.TokenGeneratorRequestTool] | None = None

A list of tools that can be invoked during the generation process. This allows the model to utilize external functionalities or APIs to enhance its responses.
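The two ways of supplying input (a legacy prompt or a list of chat messages) can be sketched with a local dataclass. `RequestSketch` is a hypothetical stand-in that mirrors a subset of the documented fields and defaults; it is not the library class, and the model name and the chat request path below are made-up values.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Union


@dataclass
class RequestSketch:
    """Stand-in mirroring a subset of the documented fields and defaults."""

    id: str
    index: int
    model_name: str
    prompt: Union[str, Sequence[int], None] = None
    messages: Optional[list[dict[str, Any]]] = None
    max_new_tokens: Optional[int] = None
    timestamp_ns: int = 0
    request_path: str = "/"
    logprobs: int = 0
    echo: bool = False


# Legacy completion style: a plain string (or token IDs) in `prompt`.
completion = RequestSketch(
    id="req-0",
    index=0,
    model_name="example-model",  # hypothetical model name
    prompt="Hello",
    max_new_tokens=32,
    timestamp_ns=time.time_ns(),
)

# Chat style: `prompt` stays None and the turns go in `messages`.
chat = RequestSketch(
    id="req-1",
    index=1,
    model_name="example-model",
    messages=[{"role": "user", "content": "Hello"}],
    request_path="/v1/chat/completions",  # hypothetical path
)
```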

TokenGeneratorRequestFunction

class max.pipelines.interfaces.TokenGeneratorRequestFunction

description

description: str

name

name: str

parameters

parameters: dict

TokenGeneratorRequestMessage

class max.pipelines.interfaces.TokenGeneratorRequestMessage

content

content: str | list[dict[str, Any]]

Content can be a simple string or a list of message parts of different modalities.

For example:

{
  "role": "user",
  "content": "What's the weather like in Boston today?"
}

Or:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
      }
    }
  ]
}
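Since content may take either form, code that consumes messages typically has to branch on its type. A minimal sketch, assuming text parts use the "text" key as in the example above; `message_text` is a hypothetical helper, not a library function.

```python
from typing import Any


def message_text(message: dict[str, Any]) -> str:
    """Return the text of a message whose content is a string or a list of parts."""
    content = message["content"]
    if isinstance(content, str):
        return content
    # Keep only the text parts; image parts contribute nothing here.
    return "".join(part["text"] for part in content if part.get("type") == "text")


simple = {"role": "user", "content": "What's the weather like in Boston today?"}
multimodal = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
    ],
}
texts = [message_text(simple), message_text(multimodal)]
```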

role

role: Literal['system', 'user', 'assistant']

TokenGeneratorRequestTool

class max.pipelines.interfaces.TokenGeneratorRequestTool

function

function: TokenGeneratorRequestFunction

type

type: str
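A tool entry nests a function description inside a typed wrapper. The sketch below models both shapes with local TypedDicts that mirror the documented fields. `FunctionSketch`, `ToolSketch`, and `tool_names` are hypothetical names, the `get_weather` tool is invented for illustration, and the "function" value for `type` follows the OpenAI convention, which is an assumption here.

```python
from typing import TypedDict


class FunctionSketch(TypedDict):
    """Mirrors the documented TokenGeneratorRequestFunction fields."""

    name: str
    description: str
    parameters: dict


class ToolSketch(TypedDict):
    """Mirrors the documented TokenGeneratorRequestTool fields."""

    type: str
    function: FunctionSketch


weather_tool: ToolSketch = {
    "type": "function",  # assumed value, borrowed from the OpenAI convention
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        # Parameters are written as a JSON Schema object (an assumption).
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def tool_names(tools: list[ToolSketch]) -> list[str]:
    """Collect the function names from a list of tool entries."""
    return [tool["function"]["name"] for tool in tools]


names = tool_names([weather_tool])
```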