
Python module

interfaces

Interfaces for MAX pipelines.

AlwaysSignalBuffersMixin

class max.pipelines.lib.interfaces.AlwaysSignalBuffersMixin

Bases: object

Mixin for models that always require signal buffers.

Use this for models that use VocabParallelEmbedding or other distributed components that always perform allreduce, even on single-device setups.

Models using this mixin build graphs that always include signal buffer inputs, regardless of device count. This is typically because they use distributed embedding layers or other components that call allreduce operations unconditionally.
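A minimal sketch of opting in (not part of the API; MyDistributedModel is hypothetical, and PipelineModel subclasses already receive the required devices list through their constructor):

from max.pipelines.lib.interfaces import AlwaysSignalBuffersMixin, PipelineModel

class MyDistributedModel(AlwaysSignalBuffersMixin, PipelineModel):
    # This hypothetical model uses distributed embedding layers that allreduce
    # unconditionally, so its graph always declares signal buffer inputs; the
    # mixin's signal_buffers property supplies matching buffers even on a
    # single device.
    ...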

devices

devices: list[Device]

Device list that must be provided by the model class.

signal_buffers

property signal_buffers: list[Buffer]

Override to always create signal buffers.

Models using this mixin have distributed components that always perform allreduce, even for single-device setups. Therefore, signal buffers are always required to match the graph inputs.

In compile-only mode (virtual device mode), this property returns an empty list to avoid GPU memory allocation, which is not supported in that mode.

Returns:

List of signal buffer tensors, one per device, or empty list in compile-only mode.

ArchConfig

class max.pipelines.lib.interfaces.ArchConfig(*args, **kwargs)

Bases: Protocol

Config for a model architecture.

get_max_seq_len()

get_max_seq_len()

Returns the default maximum sequence length for the model.

Subclasses should determine whether this value can be overridden with the --max-length flag (pipeline_config.max_length).

Return type:

int

initialize()

classmethod initialize(pipeline_config)

Initialize the config from a PipelineConfig.

Parameters:

pipeline_config (PipelineConfig)

Return type:

Self
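A minimal sketch of a config satisfying this protocol (not part of the API; because ArchConfig is a Protocol, MyArchConfig does not need to inherit from it, and the max_length handling shown is only an assumption about how a subclass might honor the flag):

from dataclasses import dataclass

@dataclass
class MyArchConfig:
    max_seq_len: int = 4096

    def get_max_seq_len(self) -> int:
        return self.max_seq_len

    @classmethod
    def initialize(cls, pipeline_config) -> "MyArchConfig":
        # Honor a user-provided --max-length when present (assumed attribute).
        if getattr(pipeline_config, "max_length", None) is not None:
            return cls(max_seq_len=pipeline_config.max_length)
        return cls()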

ArchConfigWithAttentionKVCache

class max.pipelines.lib.interfaces.ArchConfigWithAttentionKVCache(dtype, devices=<factory>, cache_dtype=None, kv_cache=<factory>, data_parallel_degree=1, user_provided_max_length=None, huggingface_config=None, _kv_params=None)

Bases: ArchConfigWithKVCache, ABC

Predefined configuration for architectures that use attention KV cache blocks.

Subclasses must define the following attributes (see the sketch after this list):

  • num_key_value_heads: int
  • head_dim: int
  • num_layers: int
  • model_max_seq_len: int
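A minimal sketch of such a subclass (the numbers are placeholders, not values from any real architecture):

from max.pipelines.lib.interfaces import ArchConfigWithAttentionKVCache

class MyAttentionArchConfig(ArchConfigWithAttentionKVCache):
    # Placeholder values for illustration only.
    @property
    def num_key_value_heads(self) -> int:
        return 8

    @property
    def head_dim(self) -> int:
        return 128

    @property
    def num_layers(self) -> int:
        return 32

    @property
    def model_max_seq_len(self) -> int:
        return 8192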

Parameters:

  • dtype (DType)
  • devices (list[DeviceRef])
  • cache_dtype (DType | None)
  • kv_cache (KVCacheConfig)
  • data_parallel_degree (int)
  • user_provided_max_length (int | None)
  • huggingface_config (AutoConfig | None)
  • _kv_params (KVCacheParams | None)

cache_dtype

cache_dtype: DType | None = None

The data type to use for the KV cache.

data_parallel_degree

data_parallel_degree: int = 1

The data parallel degree to use when running the model.

devices

devices: list[DeviceRef]

The physical devices to use when running the model.

dtype

dtype: DType

The data type to use for the model.

get_kv_params()

get_kv_params()

Returns the KV cache parameters for this architecture.

Return type:

KVCacheParams

get_max_seq_len()

get_max_seq_len()

Returns the maximum sequence length the model can process.

Returns max_length if set, otherwise model_max_seq_len. Raises ValueError if max_length exceeds model_max_seq_len.

Return type:

int
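The documented behavior is roughly equivalent to the following sketch (not the actual implementation; the error message is illustrative):

def get_max_seq_len(self) -> int:
    if self.user_provided_max_length is None:
        return self.model_max_seq_len
    if self.user_provided_max_length > self.model_max_seq_len:
        raise ValueError(
            f"max_length ({self.user_provided_max_length}) exceeds the model's "
            f"maximum sequence length ({self.model_max_seq_len})"
        )
    return self.user_provided_max_length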

head_dim

abstract property head_dim: int

Dimensionality of each attention head.

huggingface_config

huggingface_config: AutoConfig | None = None

initialize()

classmethod initialize(pipeline_config)

Initialize the config from a PipelineConfig.

Parameters:

pipeline_config (PipelineConfig)

Return type:

Self

kv_cache

kv_cache: KVCacheConfig

The KV cache configuration to use when running the model.

model_max_seq_len

abstract property model_max_seq_len: int

The maximum sequence length that can be processed by the model.

num_key_value_heads

abstract property num_key_value_heads: int

Number of key-value heads to use for the KV cache.

num_layers

abstract property num_layers: int

Number of hidden layers in the model.

user_provided_max_length

user_provided_max_length: int | None = None

Override for the maximum sequence length.

ArchConfigWithKVCache

class max.pipelines.lib.interfaces.ArchConfigWithKVCache(*args, **kwargs)

Bases: ArchConfig, Protocol

Config for a model architecture that uses a KV cache.

get_kv_params()

get_kv_params()

KV cache parameters to use when running the model.

Return type:

KVCacheParamInterface

ComponentModel

class max.pipelines.lib.interfaces.ComponentModel(config, encoding, devices, weights)

Bases: ABC

Base interface for component models with weight-backed execution.

Parameters:

load_model()

abstract load_model()

Load and return a runtime model instance.

Return type:

Callable[[…], Any]
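A minimal sketch of a subclass (not part of the API; MyTextEncoder is hypothetical and the returned callable is a stand-in for a compiled model):

from typing import Any, Callable

from max.pipelines.lib.interfaces import ComponentModel

class MyTextEncoder(ComponentModel):
    def load_model(self) -> Callable[..., Any]:
        # A real implementation would build and compile a graph from the
        # weights passed to the constructor; this placeholder just returns a
        # callable with the expected shape.
        def run(*args: Any, **kwargs: Any) -> Any:
            raise NotImplementedError("placeholder for a compiled model")

        return run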

DiffusionPipeline

class max.pipelines.lib.interfaces.DiffusionPipeline(pipeline_config, session, devices, weight_paths, **kwargs)

Bases: ABC

Base class for diffusion pipelines.

Subclasses must define components mapping component names to ComponentModel types.
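A minimal sketch of the required shape (not part of the API; MyTextEncoder mirrors the ComponentModel sketch above and the method bodies are placeholders):

from typing import Any, Callable

from max.pipelines.lib.interfaces import ComponentModel, DiffusionPipeline

class MyTextEncoder(ComponentModel):
    def load_model(self) -> Callable[..., Any]:
        raise NotImplementedError("placeholder")

class MyDiffusionPipeline(DiffusionPipeline):
    # Component names mapped to ComponentModel types.
    components = {"text_encoder": MyTextEncoder}

    def init_remaining_components(self) -> None:
        ...  # e.g. construct image processors

    def prepare_inputs(self, context):
        ...  # build PixelModelInputs from the generation context

    def execute(self, model_inputs, **kwargs: Any):
        ...  # run the denoising loop and return generated images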

Parameters:

components

components: dict[str, type[ComponentModel]] | None = None

execute()

abstract execute(model_inputs, **kwargs)

Execute the pipeline with the given model inputs.

Parameters:

  • model_inputs (PixelModelInputs) – Prepared model inputs from prepare_inputs.
  • **kwargs (Any) – Additional pipeline-specific execution parameters.

Returns:

Pipeline-specific output (e.g., generated images).

Return type:

Any

finalize_pipeline_config()

classmethod finalize_pipeline_config(pipeline_config)

Hook for finalizing pipeline configuration. Override if needed.

Parameters:

pipeline_config (PipelineConfig)

Return type:

None

init_remaining_components()

abstract init_remaining_components()

Initialize non-ComponentModel components (e.g., image processors).

Return type:

None

prepare_inputs()

abstract prepare_inputs(context)

Prepare inputs for the pipeline.

Parameters:

context (PixelGenerationContext)

Return type:

PixelModelInputs

GenerateMixin

class max.pipelines.lib.interfaces.GenerateMixin(*args, **kwargs)

Bases: Protocol[TextGenerationContextType, RequestType]

Protocol for pipelines that support text generation.

execute()

execute(inputs)

Executes the pipeline for the given inputs.

Parameters:

inputs (TextGenerationInputs[TextGenerationContextType])

Return type:

dict[RequestID, TextGenerationOutput]

generate()

generate(prompts)

Generates outputs for the given prompts.

Parameters:

prompts (RequestType | list[RequestType])

Return type:

list[TextGenerationOutput]
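A usage sketch (pipeline is assumed to be an already-constructed pipeline implementing this protocol; construction details are omitted):

# `pipeline` implements GenerateMixin; how it was built is out of scope here.
outputs = pipeline.generate(["Write a haiku about the ocean."])
for output in outputs:
    print(output)  # one TextGenerationOutput per prompt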

generate_async()

async generate_async(prompts)

Generates outputs asynchronously for the given prompts.

Parameters:

prompts (RequestType | list[RequestType])

Return type:

Any

kv_managers

property kv_managers: list[PagedKVCacheManager]

Returns the KV cache managers for this pipeline.

pipeline_config

property pipeline_config: PipelineConfig

Returns the pipeline configuration.

release()

release(request_id)

Releases resources for the given request.

Parameters:

request_id (RequestID)

Return type:

None

tokenizer

property tokenizer: PipelineTokenizer[TextGenerationContextType, ndarray[tuple[Any, ...], dtype[integer[Any]]], RequestType]

Returns the tokenizer for this pipeline.

KVCacheMixin

class max.pipelines.lib.interfaces.KVCacheMixin(*args, **kwargs)

Bases: Protocol

get_kv_params()

abstract classmethod get_kv_params(huggingface_config, pipeline_config, devices, kv_cache_config, cache_dtype)

Returns the KV cache params for the pipeline model.

Parameters:

Return type:

KVCacheParams

load_kv_managers()

load_kv_managers(kv_params, max_batch_size, max_seq_len, session, available_cache_memory)

Loads the KV cache managers using the given KV parameters and inference session.

Parameters:

  • kv_params (KVCacheParamInterface) – KV cache parameters.
  • max_batch_size (int) – Maximum batch size of the model.
  • max_seq_len (int) – Maximum sequence length of the model.
  • session (InferenceSession) – Inference session to compile and init the KV cache.
  • available_cache_memory (int) – Amount of memory available to the KV cache, in bytes.

Returns:

A list containing the loaded KV cache manager(s).

Return type:

list[PagedKVCacheManager]

ModelInputs

class max.pipelines.lib.interfaces.ModelInputs(*, kv_cache_inputs=None, lora_ids=None, lora_ranks=None, hidden_states=None)

Bases: object

Base class for model inputs.

Use this class to encapsulate inputs for your model; you may store any number of dataclass fields.

The following example demonstrates how to create a custom inputs class:

from dataclasses import dataclass

# Buffer and DType are provided by the MAX framework; import paths depend on
# your MAX version.

@dataclass
class ReplitInputs(ModelInputs):
    tokens: Buffer
    input_row_offsets: Buffer

tokens = Buffer.zeros((1, 2, 3), DType.int64)
input_row_offsets = Buffer.zeros((1, 1, 1), DType.int64)

# Initialize inputs
inputs = ReplitInputs(tokens=tokens, input_row_offsets=input_row_offsets)

# Access tensors
list(inputs) == [tokens, input_row_offsets]  # Output: True

Parameters:

buffers

property buffers: tuple[Buffer, ...]

Returns positional Buffer inputs for model ABI calls.

hidden_states

hidden_states: Buffer | list[Buffer] | None = None

Hidden states for a variable number of tokens per sequence.

For data parallel models, this can be a list of Buffers where each Buffer contains hidden states for the sequences assigned to that device.

kv_cache_inputs

kv_cache_inputs: KVCacheInputs | None = None

lora_ids

lora_ids: Buffer | None = None

Buffer containing the LoRA ids.

lora_ranks

lora_ranks: Buffer | None = None

Buffer containing the LoRA ranks.

update()

update(**kwargs)

Updates attributes from keyword arguments. Only attributes that already exist on the instance are updated, and None values are ignored.

Return type:

None
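For example, continuing the ReplitInputs sketch above (new_tokens is assumed to be another Buffer):

# Only existing attributes are updated and None values are ignored,
# so input_row_offsets keeps its previous value here.
inputs.update(tokens=new_tokens, input_row_offsets=None)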

ModelOutputs

class max.pipelines.lib.interfaces.ModelOutputs(logits, next_token_logits=None, logit_offsets=None, hidden_states=None)

Bases: object

Parameters:

hidden_states

hidden_states: Buffer | list[Buffer] | None = None

Hidden states for a variable number of tokens per sequence.

For data parallel models, this can be a list of Buffers where each Buffer contains hidden states for the sequences assigned to that device.

logit_offsets

logit_offsets: Buffer | None = None

Offsets to access variable length logits for each sequence.

logits

logits: Buffer

Logits for a variable number of tokens per sequence.

next_token_logits

next_token_logits: Buffer | None = None

Logits for just the next token.

PipelineModel

class max.pipelines.lib.interfaces.PipelineModel(pipeline_config, session, huggingface_config, encoding, devices, kv_cache_config, weights, adapter, return_logits, return_hidden_states=ReturnHiddenStates.NONE)

Bases: ABC, Generic[BaseContextType]

A pipeline model with setup, input preparation and execution methods.

Parameters:

calculate_max_seq_len()

abstract classmethod calculate_max_seq_len(pipeline_config, huggingface_config)

Calculates the optimal max sequence length for the model.

Models are expected to implement this method. The following example shows how to implement it for a Mistral model:

class MistralModel(PipelineModel):
    @classmethod
    def calculate_max_seq_len(cls, pipeline_config, huggingface_config) -> int:
        try:
            # upper_bounded_default raises ValueError when `default` exceeds
            # `upper_bound`; otherwise it resolves the value to use.
            return upper_bounded_default(
                upper_bound=huggingface_config.max_seq_len,
                default=pipeline_config.max_length,
            )
        except ValueError as e:
            raise ValueError(
                "Unable to infer max_length for Mistral, the provided "
                f"max_length ({pipeline_config.max_length}) exceeds the "
                f"model's max_seq_len ({huggingface_config.max_seq_len})."
            ) from e

Parameters:

  • pipeline_config (PipelineConfig) – Configuration for the pipeline.
  • huggingface_config (AutoConfig) – Hugging Face model configuration.

Returns:

The maximum sequence length to use.

Return type:

int

compute_log_probabilities()

compute_log_probabilities(session, model_inputs, model_outputs, next_tokens, batch_top_n, batch_echo)

Optional method that can be overridden to compute log probabilities.

Parameters:

  • session (InferenceSession) – Inference session to compute log probabilities within.
  • model_inputs (ModelInputs) – Inputs to the model returned by prepare_*_token_inputs().
  • model_outputs (ModelOutputs) – Outputs returned by execute().
  • next_tokens (Buffer) – Sampled tokens. Should have shape [batch_size].
  • batch_top_n (list[int]) – Number of top log probabilities to return per input in the batch. For any element where top_n == 0, the LogProbabilities is skipped.
  • batch_echo (list[bool]) – Whether to include input tokens in the returned log probabilities.

Returns:

List of log probabilities.

Return type:

list[LogProbabilities | None]

dtype

property dtype: DType

Returns the model data type (from encoding or pipeline config).

estimate_activation_memory()

classmethod estimate_activation_memory(pipeline_config, huggingface_config)

Estimates the activation memory required for model execution.

This accounts for temporary memory buffers used during model execution, such as intermediate activations and working buffers.

The default implementation returns 0 for backward compatibility. Models with significant activation memory requirements should override this method to provide accurate estimates.

Parameters:

  • pipeline_config (PipelineConfig) – Pipeline configuration
  • huggingface_config (AutoConfig) – Hugging Face model configuration

Returns:

Estimated activation memory in bytes

Return type:

int
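A hedged sketch of an override (the formula and attribute names are placeholders for illustration, not a real model's requirement):

from max.pipelines.lib.interfaces import PipelineModel

class MyModel(PipelineModel):
    @classmethod
    def estimate_activation_memory(cls, pipeline_config, huggingface_config) -> int:
        # Placeholder: a rough per-token working set scaled by the configured
        # maximum length. Real models should derive this from their actual
        # intermediate buffer sizes.
        bytes_per_token = 4 * huggingface_config.hidden_size  # assumption
        return pipeline_config.max_length * bytes_per_token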

estimate_weights_size()

classmethod estimate_weights_size(pipeline_config)

Calculates the estimated memory consumption of the model's weights.

Parameters:

pipeline_config (PipelineConfig)

Return type:

int

execute()

abstract execute(model_inputs)

Executes the graph with the given inputs.

Parameters:

model_inputs (ModelInputs) – The model inputs to execute, containing tensors and any other required data for model execution.

Returns:

ModelOutputs containing the pipeline’s output tensors.

Return type:

ModelOutputs

This is an abstract method that must be implemented by concrete PipelineModels to define their specific execution logic.

execute_with_capture()

execute_with_capture(model_inputs, batch_size)

Executes the model with optional capture handling.

Subclasses can override this to integrate device graph capture/replay.

Parameters:

Return type:

ModelOutputs

finalize_pipeline_config()

classmethod finalize_pipeline_config(pipeline_config)

Finalizes the pipeline configuration.

This method is called after the pipeline configuration is resolved. It can be overridden to perform any finalization steps that are needed.

Parameters:

pipeline_config (PipelineConfig)

Return type:

None

lora_manager

property lora_manager: LoRAManager | None

Returns the LoRA manager if LoRA is enabled, otherwise None.

pre_capture_execution_trace()

pre_capture_execution_trace(model_inputs, batch_size)

Pre-captures device graphs for the given model inputs.

Parameters:

  • model_inputs (list[ModelInputs]) – List of model inputs to capture graphs for.
  • batch_size (int) – The batch size for execution.

Return type:

None

prepare_initial_token_inputs()

abstract prepare_initial_token_inputs(replica_batches, kv_cache_inputs=None, return_n_logits=1)

Prepares the initial inputs to be passed to .execute().

The inputs and functionality can vary per model. For example, model inputs could include encoded tensors, unique IDs per tensor when using a KV cache manager, and kv_cache_inputs (or None if the model does not use KV cache). This method typically batches encoded tensors, claims a KV cache slot if needed, and returns the inputs and caches.

Parameters:

  • replica_batches (Sequence[Sequence[BaseContextType]])
  • kv_cache_inputs (KVCacheInputs | None)
  • return_n_logits (int)

Return type:

ModelInputs

prepare_next_token_inputs()

abstract prepare_next_token_inputs(next_tokens, prev_model_inputs)

Prepares the secondary inputs to be passed to .execute().

While prepare_initial_token_inputs is responsible for managing the initial inputs, this function is responsible for updating the inputs for each step in a multi-step execution pattern.

Parameters:

Return type:

ModelInputs
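A hedged sketch of the multi-step pattern these two methods support (model, replica_batches, kv_cache_inputs, num_steps, and sample are assumed to be provided by the surrounding pipeline; sampling and KV cache plumbing are elided):

inputs = model.prepare_initial_token_inputs(
    replica_batches, kv_cache_inputs=kv_cache_inputs
)
for _ in range(num_steps):
    outputs = model.execute(inputs)
    next_tokens = sample(outputs.next_token_logits)  # sampling elided
    inputs = model.prepare_next_token_inputs(next_tokens, inputs)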

signal_buffers

property signal_buffers: list[Buffer]

Lazily initialize signal buffers for multi-GPU communication collectives.

Signal buffers are only needed during model execution, not during compilation. By deferring their allocation, we avoid memory allocation in compile-only mode.

Returns:

List of signal buffer tensors, one per device for multi-device setups, or an empty list for single-device setups or compile-only mode.

PixelModelInputs

class max.pipelines.lib.interfaces.PixelModelInputs(*, tokens, tokens_2=None, negative_tokens=None, negative_tokens_2=None, extra_params=<factory>, timesteps=<factory>, sigmas=<factory>, latents=<factory>, latent_image_ids=<factory>, height=1024, width=1024, num_inference_steps=50, guidance_scale=3.5, guidance=None, true_cfg_scale=1.0, num_warmup_steps=0, num_images_per_prompt=1, input_image=None)

Bases: object

Common input container for pixel-generation models.

Provides a consistent set of fields used across multiple pixel pipelines and models.

Parameters:

extra_params

extra_params: dict[str, ndarray[tuple[Any, ...], dtype[Any]]]

A bag of model-specific numeric parameters not represented as explicit fields.

Typical uses:

  • Architecture-specific knobs (e.g., cfg_normalization arrays, scaling vectors)
  • Precomputed per-step values not worth standardizing across all models
  • Small numeric tensors that are easier to carry as named extras than formal fields

Values are expected to be numpy arrays (ndarray) to keep the contract consistent, but you can relax this if your codebase needs non-array values.

from_context()

classmethod from_context(context)

Build an instance from a context-like dict.

Policy (see the sketch after this list):

  • If a key is missing: the dataclass default applies automatically.
  • If a key is present with value None: treat as missing and substitute the class default (including subclass overrides).
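A minimal sketch of that substitution policy as standalone logic over a plain dict (not the actual implementation):

import dataclasses
from typing import Any

def resolve_field(context: dict[str, Any], field: dataclasses.Field) -> Any:
    # A missing key, or a key explicitly set to None, falls back to the
    # dataclass default (or default_factory) for that field.
    value = context.get(field.name)
    if value is not None:
        return value
    if field.default is not dataclasses.MISSING:
        return field.default
    if field.default_factory is not dataclasses.MISSING:
        return field.default_factory()
    raise KeyError(f"required field {field.name!r} was not provided")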

Parameters:

context (PixelGenerationContext)

Return type:

Self

guidance

guidance: ndarray[tuple[Any, ...], dtype[float32]] | None = None

Optional guidance tensor.

  • Some pipelines precompute guidance weights/tensors (e.g., per-token weights, per-step weights).
  • None is meaningful here: it means “no explicit guidance tensor supplied”.
  • Unlike scalar fields, None is preserved (not replaced).

guidance_scale

guidance_scale: float = 3.5

Guidance scale for classifier-free guidance (CFG).

  • A higher value typically increases adherence to the prompt but can reduce diversity.
  • This is expected to be a real float (not None).
  • If a context provides guidance_scale=None, from_context() substitutes the default.

height

height: int = 1024

Output height in pixels.

  • This is a required scalar (not None).
  • If a context provides height=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).

input_image

input_image: Image | None = None

Optional input image for image-to-image generation (PIL.Image.Image).

latent_image_ids

latent_image_ids: ndarray[tuple[Any, ...], dtype[float32]]

Optional latent image IDs / positional identifiers for latents.

  • Some pipelines attach per-latent identifiers for caching, routing, or conditioning.
  • Often used to avoid recomputation of image-id embeddings across steps.
  • If unused, it may remain empty.

latents

latents: ndarray[tuple[Any, ...], dtype[float32]]

Initial latent noise tensor (or initial latent state).

  • For diffusion/flow models, this is typically random noise seeded per request.
  • Shape depends on model: commonly [B, C, H/8, W/8] for image latents, or [B, T, C, H/8, W/8] for video latents.
  • If your pipeline generates latents internally, you may leave it empty. (Model-specific subclasses can enforce non-empty via __post_init__.)

negative_tokens

negative_tokens: TokenBuffer | None = None

Negative prompt tokens for the primary encoder. Used for classifier-free guidance (CFG) or similar conditioning schemes. If your pipeline does not use negative prompts, leave as None.

negative_tokens_2

negative_tokens_2: TokenBuffer | None = None

Negative prompt tokens for the secondary encoder (for dual-encoder models). If the model is single-encoder or you do not use negative prompts, leave as None.

num_images_per_prompt

num_images_per_prompt: int = 1

Number of images/videos to generate per prompt.

  • Commonly used for “same prompt, multiple samples” behavior.
  • Must be > 0.
  • For video generation, the same name is kept for historical compatibility.

num_inference_steps

num_inference_steps: int = 50

Number of denoising/inference steps.

  • This is a required scalar (not None).
  • If a context provides num_inference_steps=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).

num_warmup_steps

num_warmup_steps: int = 0

Number of warmup steps.

  • Used in some schedulers/pipelines to handle initial steps differently (e.g., scheduler stabilization, cache warmup, etc.).
  • Must be >= 0.

sigmas

sigmas: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed sigma schedule for denoising.

  • Usually a 1D float32 numpy array of length num_inference_steps corresponding to the noise level per step.
  • Some schedulers are sigma-based; others are timestep-based; some use both.
  • If unused, it may remain empty unless your model subclass requires it.

timesteps

timesteps: ndarray[tuple[Any, ...], dtype[float32]]

Precomputed denoising timestep schedule.

  • Usually a 1D float32 numpy array of length num_inference_steps (exact semantics depend on your scheduler).
  • If your pipeline precomputes the scheduler trajectory, you pass it here.
  • Some models may not require explicit timesteps; in that case it may remain empty. (Model-specific subclasses can enforce non-empty via __post_init__.)

tokens

tokens: TokenBuffer

Primary encoder token buffer. This is the main prompt representation consumed by the model’s text encoder. Required for all models.

tokens_2

tokens_2: TokenBuffer | None = None

Secondary encoder token buffer (for dual-encoder models). Examples: architectures that have a second text encoder stream or pooled embeddings. If the model is single-encoder, leave as None.

true_cfg_scale

true_cfg_scale: float = 1.0

“True CFG” scale used by certain pipelines/models.

  • Some architectures distinguish between the user-facing guidance_scale and an internal scale applied to a different normalization or conditioning pathway.
  • Defaults to 1.0 for pipelines that do not use this feature.

width

width: int = 1024

Output width in pixels.

  • This is a required scalar (not None).
  • If a context provides width=None, from_context() treats that as “not provided” and substitutes this default value (or a subclass override).
