Python module

registry

Model registry, for tracking various model variants.

PipelineRegistry

class max.pipelines.lib.registry.PipelineRegistry(architectures)

Parameters:

architectures (list[SupportedArchitecture])

get_active_huggingface_config()

get_active_huggingface_config(huggingface_repo)

Retrieves or creates a cached HuggingFace AutoConfig for the given model configuration.

This method maintains a cache of HuggingFace configurations to avoid unnecessary reloads, each of which incurs a Hugging Face Hub API call. If a config for the given model hasn't been loaded before, a new one is created using AutoConfig.from_pretrained() with the model's settings.

Parameters:

huggingface_repo (HuggingFaceRepo) – The HuggingFaceRepo containing the model.

Returns:

The HuggingFace configuration object for the model.

Return type:

AutoConfig
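
A minimal usage sketch. Here architectures is a hypothetical list of SupportedArchitecture instances and repo is a hypothetical HuggingFaceRepo; neither value comes from this page:

```python
from max.pipelines.lib.registry import PipelineRegistry

# Hypothetical inputs: a list of SupportedArchitecture instances and a
# HuggingFaceRepo pointing at the model checkpoint.
registry = PipelineRegistry(architectures)

config = registry.get_active_huggingface_config(huggingface_repo=repo)
# A second call with the same repo is served from the in-memory cache,
# avoiding another Hugging Face Hub API call.
config_again = registry.get_active_huggingface_config(huggingface_repo=repo)
```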

get_active_tokenizer()

get_active_tokenizer(huggingface_repo)

Retrieves or creates a cached HuggingFace AutoTokenizer for the given model configuration.

This method maintains a cache of HuggingFace tokenizers to avoid unnecessary reloads, each of which incurs a Hugging Face Hub API call. If a tokenizer for the given model hasn't been loaded before, a new one is created using AutoTokenizer.from_pretrained() with the model's settings.

Parameters:

huggingface_repo (HuggingFaceRepo) – The HuggingFaceRepo containing the model.

Returns:

The HuggingFace tokenizer for the model.

Return type:

PreTrainedTokenizer | PreTrainedTokenizerFast
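
A similar sketch for tokenizers, reusing the hypothetical registry and repo from the config example above:

```python
tokenizer = registry.get_active_tokenizer(huggingface_repo=repo)
# The result is a standard transformers tokenizer (PreTrainedTokenizer or
# PreTrainedTokenizerFast), so the usual encode/decode API applies.
input_ids = tokenizer("Hello, world!")["input_ids"]
```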

register()

register(architecture, *, allow_override=False)

Add a new architecture to the registry.

Parameters:

  • architecture (SupportedArchitecture) – The architecture to add to the registry.
  • allow_override (bool) – If True, replace any previously registered architecture with the same name.

Return type:

None
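
For example, registering a hypothetical architecture (my_arch is assumed to be a SupportedArchitecture instance; see the construction sketch under SupportedArchitecture below):

```python
registry.register(my_arch)

# Registering a second architecture under the same name requires opting
# in to overriding the existing entry.
registry.register(my_arch, allow_override=True)
```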

reset()

reset()

Return type:

None

retrieve()

retrieve(pipeline_config, task=PipelineTask.TEXT_GENERATION, override_architecture=None)

Parameters:

  • pipeline_config (PipelineConfig)
  • task (PipelineTask)
  • override_architecture (str | None)

Return type:

tuple[PipelineTokenizer, PipelineTypes]
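
A hedged sketch of typical use, where pipeline_config is assumed to be a PipelineConfig describing the target model:

```python
# Returns the tokenizer together with a constructed pipeline for the
# default TEXT_GENERATION task.
tokenizer, pipeline = registry.retrieve(pipeline_config)
```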

retrieve_architecture()

retrieve_architecture(huggingface_repo)

Parameters:

huggingface_repo (HuggingFaceRepo )

Return type:

SupportedArchitecture | None
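
For example, checking whether a repo maps to a registered architecture (repo is the hypothetical HuggingFaceRepo from earlier):

```python
arch = registry.retrieve_architecture(huggingface_repo=repo)
if arch is None:
    # None means no registered SupportedArchitecture matches this repo.
    raise ValueError(f"unsupported architecture for {repo}")
```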

retrieve_factory()

retrieve_factory(pipeline_config, task=PipelineTask.TEXT_GENERATION, override_architecture=None)

Parameters:

  • pipeline_config (PipelineConfig)
  • task (PipelineTask)
  • override_architecture (str | None)

Return type:

tuple[PipelineTokenizer, Callable[[], PipelineTypes]]
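
Unlike retrieve(), this returns a zero-argument factory rather than a constructed pipeline, so construction can be deferred. A sketch, reusing the hypothetical pipeline_config:

```python
tokenizer, factory = registry.retrieve_factory(pipeline_config)
# Construction is deferred until the factory is called, e.g. inside a
# worker process that should own the pipeline's resources.
pipeline = factory()
```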

SupportedArchitecture

class max.pipelines.lib.registry.SupportedArchitecture(name, example_repo_ids, default_encoding, supported_encodings, pipeline_model, task, tokenizer, default_weights_format, multi_gpu_supported=False, rope_type=RopeType.none, weight_adapters=None)

Initializes a model architecture supported by MAX pipelines.

New architectures should be registered into the PipelineRegistry.

Parameters:

  • name (str) – Architecture name.
  • example_repo_ids (list[str]) – Hugging Face repo IDs of models that run this architecture.
  • default_encoding (SupportedEncoding) – Default encoding for the model.
  • supported_encodings (dict[SupportedEncoding, list[KVCacheStrategy]]) – Alternate encodings supported, mapped to their compatible KV cache strategies.
  • pipeline_model (type[PipelineModel]) – PipelineModel class that defines the model graph and execution.
  • task (PipelineTask) – The pipeline task the model should run with.
  • tokenizer (Callable[..., PipelineTokenizer]) – Tokenizer used to preprocess model inputs.
  • default_weights_format (WeightsFormat) – The weights format used in pipeline_model.
  • multi_gpu_supported (bool) – Whether the architecture supports multi-GPU execution.
  • rope_type (RopeType) – The rotary position embedding (RoPE) variant used.
  • weight_adapters (dict[WeightsFormat, WeightsAdapter] | None) – A dictionary of weight adapters to use if the input checkpoint has a different format than the default.
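
A hedged construction sketch. MyPipelineModel, MyTokenizer, the repo ID, and the specific enum members are all assumptions for illustration, not values confirmed by this page; imports for SupportedEncoding, KVCacheStrategy, PipelineTask, and WeightsFormat are omitted because their module paths are not shown here:

```python
from max.pipelines.lib.registry import SupportedArchitecture

my_arch = SupportedArchitecture(
    name="MyModelForCausalLM",                # hypothetical architecture name
    example_repo_ids=["my-org/my-model-7b"],  # hypothetical repo ID
    default_encoding=SupportedEncoding.bfloat16,
    supported_encodings={
        SupportedEncoding.bfloat16: [KVCacheStrategy.PAGED],
    },
    pipeline_model=MyPipelineModel,           # a PipelineModel subclass
    task=PipelineTask.TEXT_GENERATION,
    tokenizer=MyTokenizer,                    # a PipelineTokenizer callable
    default_weights_format=WeightsFormat.safetensors,
)
registry.register(my_arch)
```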

tokenizer_cls

property tokenizer_cls: type[PipelineTokenizer]

get_pipeline_for_task()

max.pipelines.lib.registry.get_pipeline_for_task(task, pipeline_config)

Parameters:

  • task (PipelineTask)
  • pipeline_config (PipelineConfig)

Return type:

type[TextGenerationPipeline] | type[EmbeddingsPipeline] | type[SpeculativeDecodingTextGenerationPipeline] | type[AudioGeneratorPipeline] | type[SpeechTokenGenerationPipeline]
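
A sketch of resolving the pipeline class for a task, with the hypothetical pipeline_config from earlier:

```python
from max.pipelines.lib.registry import get_pipeline_for_task

pipeline_cls = get_pipeline_for_task(PipelineTask.TEXT_GENERATION, pipeline_config)
# pipeline_cls is one of the classes in the union above, e.g.
# TextGenerationPipeline for the text generation task.
```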