Python class

TextGenerationPipeline

`TextGenerationPipeline`

class max.pipelines.TextGenerationPipeline(pipeline_config, pipeline_model, eos_token_id, weight_adapters, tokenizer)

source

Bases: TextGenerationPipelineInterface[TextGenerationContextType], Generic[TextGenerationContextType]

Generalized token generator pipeline.

Initialize a text generation pipeline instance.

This sets up devices, the inference session, tokenizer, KV-cache manager, sampling kernel, and loads model weights and adapters.

Parameters:

pipeline_config (PipelineConfig) – Configuration for the pipeline and runtime behavior.
pipeline_model (type[PipelineModel[TextGenerationContextType]]) – Concrete model implementation to use for execution.
eos_token_id (int) – Default EOS token id used when HF config does not supply one or to seed the EOS set.
weight_adapters (dict[WeightsFormat, WeightsAdapter]) – Mapping from weights format to adapter implementation.
tokenizer (PipelineTokenizer[TextGenerationContextType, npt.NDArray[np.integer[Any]], TextGenerationRequest]) – Tokenizer implementation used to build contexts and decode.

Raises:

ValueError – If quantization_encoding is not configured in pipeline_config.model or if structured output is requested without a valid tokenizer delegate.

`execute()`

execute(inputs)

source

Processes the batch and returns decoded tokens.

Given a batch, executes the graph for num_steps in a multi-step scenario, then decodes the tokens and returns the list of decoded tokens.

Parameters:: inputs (TextGenerationInputs[TextGenerationContextType])
Return type:: dict[RequestID, TextGenerationOutput]

`initialize_bitmask()`

initialize_bitmask(batch)

source

Allocates a per-request token bitmask for structured decoding.

Parameters:: batch (list[TextGenerationContextType]) – The generation contexts for the batch.
Returns:: A bitmask array of shape [batch_size, vocab_size] if structured output is enabled; otherwise None.
Return type:: ndarray[tuple[Any, …], dtype[int32]] | None

`kv_manager`

property kv_manager: PagedKVCacheManager

source

Returns the KV cache manager for this pipeline.

`pipeline_config`

property pipeline_config: PipelineConfig

source

Return the pipeline configuration.

`prepare_batch()`

prepare_batch(batches, num_steps)

source

Prepare model inputs and ancillary state for multi-step execution.

This flattens replica batches, optionally initializes constrained decoding bitmasks, ensures KV-cache reservations, clamps num_steps per context, and builds initial model inputs.

Parameters:

batches (list[list[TextGenerationContextType]]) – Per-replica list of contexts.
num_steps (int) – Desired number of steps to run.

Returns:

ModelInputs: Prepared inputs for the first step.
int: The clamped number of steps to run.
Optional[np.ndarray]: The structured decoding bitmask or None.
list[TextGenerationContextType]: The flattened context batch.

Return type:

A tuple of

`release()`

release(request_id)

source

Release model-specific resources for a completed request.

Primary and extra KV cache lifecycle is managed by the batch constructor. This method handles model-specific cleanup only (e.g. vision encoder cache).

Parameters:: request_id (RequestID)
Return type:: None

`tokenizer`

property tokenizer: PipelineTokenizer[TextGenerationContextType, ndarray[tuple[Any, ...], dtype[integer[Any]]], TextGenerationRequest]

source

Return the tokenizer used for building contexts and decoding.

`update_for_structured_output()`

update_for_structured_output(context, bitmask, index)

source

Update context and logits bitmask for structured output.

If a json_schema is present and no matcher is set, this compiles a grammar matcher and installs it on the context. It may also jump ahead in generation and fills the per-request token bitmask used to constrain the next-token distribution.

Parameters:

context (TextGenerationContextType) – Request context to update.
bitmask (ndarray[tuple[Any, ...], dtype[int32]]) – Optional preallocated bitmask buffer; updated in-place.
index (int) – Global position into the bitmask for this request.

Raises:

ValueError – If a JSON schema is provided but structured output is not enabled via sampling configuration.

Return type:

None

TextGenerationPipeline​

execute()​

initialize_bitmask()​

kv_manager​

pipeline_config​

prepare_batch()​

release()​

tokenizer​

update_for_structured_output()​

`TextGenerationPipeline`

`execute()`

`initialize_bitmask()`

`kv_manager`

`pipeline_config`

`prepare_batch()`

`release()`

`tokenizer`

`update_for_structured_output()`