
Python class

LLM

class max.entrypoints.llm.LLM(pipeline_config)


Bases: object

A high-level interface for interacting with large language models (LLMs).

Parameters:

pipeline_config (PipelineConfig)
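A minimal construction sketch. The import path for PipelineConfig and the `model_path` value are assumptions based on typical MAX usage, not taken from this page; check the PipelineConfig reference for the exact module and accepted fields.

```python
from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig  # assumed import path

# model_path here is a hypothetical example value
config = PipelineConfig(model_path="modularai/Llama-3.1-8B-Instruct-GGUF")
llm = LLM(config)
```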

generate()

generate(prompts, max_new_tokens=100, use_tqdm=True)


Generates text completions for the given prompts.

This method is thread-safe: it may be called on the same LLM instance from multiple threads concurrently with no external synchronization.

Parameters:

  • prompts (str | Sequence[str]) – The input string or list of strings to generate completions for.
  • max_new_tokens (int | None) – The maximum number of tokens to generate in the response.
  • use_tqdm (bool) – Whether to display a progress bar during generation.

Returns:

A list of generated text completions corresponding to each input prompt.

Raises:

  • ValueError – If prompts is empty or contains invalid data.
  • RuntimeError – If the model fails to generate completions.

Return type:

Sequence[str]
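Because generate() is documented as thread-safe, one instance can serve several threads without locking. The sketch below illustrates that calling pattern using a hypothetical stand-in function that mirrors the documented signature and return type, so it runs without the max package; with a real instance you would pass `llm.generate` to the executor instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for llm.generate (same signature, same
# Sequence[str] return type); not part of the real API.
def stub_generate(prompts, max_new_tokens=100, use_tqdm=True):
    if isinstance(prompts, str):
        prompts = [prompts]
    return [f"completion for: {p}" for p in prompts]

batches = [["What is 2+2?"], ["Name a prime."], ["Capital of France?"]]

# Thread safety means no lock is needed around concurrent calls.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(stub_generate, batches))

# results holds one list of completions per batch, in input order.
```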