TextGenerationPipeline
class max.pipelines.TextGenerationPipeline(pipeline_config, pipeline_model, eos_token_id, weight_adapters, tokenizer)
Bases: TextGenerationPipelineInterface[TextGenerationContextType], Generic[TextGenerationContextType]
Generalized token generator pipeline.
Initialize a text generation pipeline instance.
This sets up devices, the inference session, tokenizer, KV-cache manager, sampling kernel, and loads model weights and adapters.
Parameters:

- pipeline_config (PipelineConfig) – Configuration for the pipeline and runtime behavior.
- pipeline_model (type[PipelineModel[TextGenerationContextType]]) – Concrete model implementation to use for execution.
- eos_token_id (int) – Default EOS token id, used when the HF config does not supply one or to seed the EOS set.
- weight_adapters (dict[WeightsFormat, WeightsAdapter]) – Mapping from weights format to adapter implementation.
- tokenizer (PipelineTokenizer[TextGenerationContextType, npt.NDArray[np.integer[Any]], TextGenerationRequest]) – Tokenizer implementation used to build contexts and decode.

Raises:

ValueError – If quantization_encoding is not configured in pipeline_config.model, or if structured output is requested without a valid tokenizer delegate.
execute()
execute(inputs)
Processes the batch and returns decoded tokens.
Given a batch, executes the graph for num_steps in a multi-step scenario, then decodes the tokens and returns the list of decoded tokens.
Parameters:

inputs (TextGenerationInputs[TextGenerationContextType])
Return type:
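The multi-step behavior described above can be sketched in plain Python. This is a hypothetical toy loop, not the MAX implementation: `toy_model` and `EOS` are illustrative stand-ins for graph execution and the EOS token id, and the real pipeline batches these steps on device.

```python
EOS = 0  # illustrative EOS token id

def toy_model(tokens):
    # Stand-in for one graph execution step: "predict" the next token
    # as last token + 1, emitting EOS once the value would exceed 3.
    nxt = tokens[-1] + 1
    return EOS if nxt > 3 else nxt

def multi_step_decode(batch, num_steps):
    """Run up to num_steps decode steps for each request in the batch,
    retiring a request early when it emits EOS."""
    outputs = {rid: [] for rid in batch}
    active = {rid: list(toks) for rid, toks in batch.items()}
    for _ in range(num_steps):
        if not active:
            break
        for rid in list(active):
            tok = toy_model(active[rid])
            outputs[rid].append(tok)
            if tok == EOS:
                del active[rid]  # request finished
            else:
                active[rid].append(tok)
    return outputs
```

For example, a request starting from token 1 with `num_steps=5` generates two tokens and then EOS, finishing before the step budget is exhausted.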
initialize_bitmask()
initialize_bitmask(batch)
Allocates a per-request token bitmask for structured decoding.
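One common layout for such a bitmask (assumed here; the MAX internals may differ) packs one bit per vocabulary token into 32-bit words, with all bits initially set so every token is allowed until a grammar matcher clears some:

```python
import math
import numpy as np

def allocate_token_bitmask(batch_size, vocab_size):
    """Allocate a per-request token bitmask: one bit per vocab token,
    packed into int32 words. All bits start set (every token allowed)."""
    words_per_request = math.ceil(vocab_size / 32)
    # -1 in two's-complement int32 has all 32 bits set.
    return np.full((batch_size, words_per_request), -1, dtype=np.int32)
```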
kv_manager
property kv_manager: PagedKVCacheManager
Returns the KV cache manager for this pipeline.
pipeline_config
property pipeline_config: PipelineConfig
Return the pipeline configuration.
prepare_batch()
prepare_batch(batches, num_steps)
Prepare model inputs and ancillary state for multi-step execution.
This flattens replica batches, optionally initializes constrained
decoding bitmasks, ensures KV-cache reservations, clamps num_steps
per context, and builds initial model inputs.
Parameters:

- batches – The per-replica batches of contexts to prepare.
- num_steps – The requested number of generation steps.

Returns:

- ModelInputs: Prepared inputs for the first step.
- int: The clamped number of steps to run.
- Optional[np.ndarray]: The structured decoding bitmask or None.
- list[TextGenerationContextType]: The flattened context batch.

Return type:

A tuple of (ModelInputs, int, Optional[np.ndarray], list[TextGenerationContextType]).
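Two of the steps described above, flattening replica batches and clamping num_steps, can be sketched as follows. This is an illustrative reduction, not the MAX implementation: `ToyContext` and its `remaining` budget attribute are hypothetical stand-ins for the real context type.

```python
from dataclasses import dataclass

@dataclass
class ToyContext:
    request_id: str
    remaining: int  # tokens this context may still generate (illustrative)

def flatten_and_clamp(replica_batches, num_steps):
    # Flatten the per-replica batches into one context list.
    flat = [ctx for batch in replica_batches for ctx in batch]
    # Clamp num_steps so no context runs past its own budget.
    clamped = min([num_steps] + [ctx.remaining for ctx in flat])
    return flat, clamped
```

Clamping to the batch minimum ensures a single multi-step launch never generates more tokens than the most constrained request allows.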
release()
release(request_id)
Release model-specific resources for a completed request.
Primary and extra KV cache lifecycle is managed by the batch constructor. This method handles model-specific cleanup only (e.g. vision encoder cache).
Parameters:

request_id (RequestID)

Return type:

None
tokenizer
property tokenizer: PipelineTokenizer[TextGenerationContextType, ndarray[tuple[Any, ...], dtype[integer[Any]]], TextGenerationRequest]
Return the tokenizer used for building contexts and decoding.
update_for_structured_output()
update_for_structured_output(context, bitmask, index)
Update context and logits bitmask for structured output.
If a json_schema is present and no matcher is set, this compiles a grammar matcher and installs it on the context. It may also jump ahead in generation, and it fills the per-request token bitmask used to constrain the next-token distribution.
Parameters:

- context – The generation context to update.
- bitmask – The per-request token bitmask to fill.
- index – This request's index into the bitmask.

Raises:

ValueError – If a JSON schema is provided but structured output is not enabled via sampling configuration.

Return type:

None
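How a filled bitmask constrains the next-token distribution can be sketched as follows. This is a hedged illustration assuming a packed-int32 bitmask layout (one bit per vocabulary token), not the MAX kernel: tokens whose bit is cleared get their logit set to negative infinity so they can never be sampled.

```python
import numpy as np

def apply_token_bitmask(logits, bitmask):
    """Mask logits in place: token t is allowed iff bit (t % 32) of
    word (t // 32) in the packed bitmask is set."""
    vocab = logits.shape[-1]
    tokens = np.arange(vocab)
    words = bitmask[tokens // 32]
    allowed = (words >> (tokens % 32)) & 1
    logits[allowed == 0] = -np.inf  # disallowed tokens get zero probability
    return logits
```

For instance, with a 4-token vocabulary and the single word `0b0101`, only tokens 0 and 2 keep their logits; tokens 1 and 3 are masked out.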