Python class

SupportedArchitecture

class max.pipelines.lib.registry.SupportedArchitecture(name, example_repo_ids, default_encoding, supported_encodings, pipeline_model, task, tokenizer, default_weights_format, context_type, config, rope_type='none', weight_adapters=<factory>, multi_gpu_supported=False, input_modalities=<factory>, required_arguments=<factory>, context_validators=<factory>, supports_empty_batches=False, requires_max_batch_context_length=False, tool_parser=None)

Bases: object

Represents a model architecture configuration for MAX pipelines.

Defines the components and settings required to support a specific model architecture within the MAX pipeline system. Each SupportedArchitecture instance encapsulates the model implementation, tokenizer, supported encodings, and other architecture-specific configuration.

New architectures should be registered into the PipelineRegistry using the register() method.

Example:

my_architecture = SupportedArchitecture(
    name="MyModelForCausalLM",  # Must match your Hugging Face model class name
    example_repo_ids=[
        "your-org/your-model-name",  # Add example model repository IDs
    ],
    default_encoding="q4_k",
    supported_encodings={
        "q4_k",
        "bfloat16",
        # Add other encodings your model supports
    },
    pipeline_model=MyModel,
    tokenizer=TextTokenizer,
    context_type=TextContext,
    config=MyModelConfig,  # Architecture-specific config class
    default_weights_format=WeightsFormat.safetensors,
    rope_type="none",
    weight_adapters={
        WeightsFormat.safetensors: weight_adapters.convert_safetensor_state_dict,
        # Add other weight formats if needed
    },
    multi_gpu_supported=True,  # Set based on your implementation capabilities
    required_arguments={"some_arg": True},
    task=PipelineTask.TEXT_GENERATION,
)
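To make the registration step more concrete, the following is a minimal sketch of a name-keyed registry. `ArchitectureStub` and `RegistryStub` are hypothetical stand-ins written for this illustration; they are not the MAX `SupportedArchitecture` or `PipelineRegistry` classes, and the duplicate-name check and `lookup` helper are assumptions rather than documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArchitectureStub:
    """Hypothetical stand-in for SupportedArchitecture (two fields only)."""

    name: str
    example_repo_ids: list[str] = field(default_factory=list)


class RegistryStub:
    """Hypothetical stand-in for PipelineRegistry, keyed by architecture name."""

    def __init__(self) -> None:
        self.architectures: dict[str, ArchitectureStub] = {}

    def register(self, architecture: ArchitectureStub) -> None:
        # Assumption: registering the same name twice is treated as an error.
        if architecture.name in self.architectures:
            raise ValueError(f"{architecture.name!r} is already registered")
        self.architectures[architecture.name] = architecture

    def lookup(self, hf_class_name: str) -> ArchitectureStub | None:
        # The name field is matched against the Hugging Face model class name.
        return self.architectures.get(hf_class_name)


registry = RegistryStub()
registry.register(
    ArchitectureStub("MyModelForCausalLM", ["your-org/your-model-name"])
)
found = registry.lookup("MyModelForCausalLM")
missing = registry.lookup("OtherModel")
```

The sketch shows why `name` has to match the Hugging Face model class name: under this assumed design, the class name is the only key used to find the architecture.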

Parameters:

name (str)
example_repo_ids (list[str])
default_encoding (SupportedEncoding)
supported_encodings (set[SupportedEncoding])
pipeline_model (PipelineModelType)
task (PipelineTask)
tokenizer (Callable[..., PipelineTokenizer[Any, Any, Any]])
default_weights_format (WeightsFormat)
context_type (type[TextGenerationContext] | type[EmbeddingsContext])
config (type[ArchConfig])
rope_type (RopeType)
weight_adapters (dict[WeightsFormat, WeightsAdapter])
multi_gpu_supported (bool)
input_modalities (set[InputModality])
required_arguments (dict[str, bool | int | float])
context_validators (list[Callable[[TextContext | TextAndVisionContext | PixelContext], None]])
supports_empty_batches (bool)
requires_max_batch_context_length (bool)
tool_parser (type[ToolParser] | None)

`config`

config: type[ArchConfig]

The architecture-specific configuration class for the model.

This class must implement the ArchConfig protocol, providing an initialize method that creates a configuration instance from a PipelineConfig. For models with KV cache, this should be a class implementing ArchConfigWithKVCache to enable KV cache memory estimation.
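As a rough illustration of a config class with an `initialize` method, the sketch below uses a `typing.Protocol` stand-in. `ArchConfigLike`, the `MyModelConfig` fields, and the dict-shaped `pipeline_config` are all assumptions made for this example; the real `ArchConfig` protocol and `PipelineConfig` may have different signatures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol


class ArchConfigLike(Protocol):
    """Hypothetical stand-in for the ArchConfig protocol."""

    @classmethod
    def initialize(cls, pipeline_config: Any) -> ArchConfigLike: ...


@dataclass
class MyModelConfig:
    """Illustrative config; the field names are invented for this sketch."""

    hidden_size: int
    num_layers: int

    @classmethod
    def initialize(cls, pipeline_config: Any) -> MyModelConfig:
        # Assumption: pipeline_config is a plain dict in this sketch.
        return cls(
            hidden_size=pipeline_config["hidden_size"],
            num_layers=pipeline_config["num_layers"],
        )


def build_config(
    config_cls: type[ArchConfigLike], pipeline_config: Any
) -> ArchConfigLike:
    # The class itself is passed around, then asked to build an instance.
    return config_cls.initialize(pipeline_config)


cfg = build_config(MyModelConfig, {"hidden_size": 4096, "num_layers": 32})
```

Passing the class rather than an instance matches the `type[ArchConfig]` annotation: the configuration object is only created once a pipeline configuration is available.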

`context_type`

context_type: type[TextGenerationContext] | type[EmbeddingsContext]

The context class type that this architecture uses for managing request state and inputs.

This should be a class (not an instance) that implements either the TextGenerationContext or EmbeddingsContext protocol, defining how the pipeline processes and tracks requests.

`context_validators`

context_validators: list[Callable[[TextContext | TextAndVisionContext | PixelContext], None]]

A list of callable validators that verify context inputs before model execution.

These validators are called during context creation to ensure inputs meet model-specific requirements. Validators should raise InputError for invalid inputs, providing early error detection before expensive model operations.

def validate_single_image(context: TextContext | TextAndVisionContext) -> None:
    if isinstance(context, TextAndVisionContext):
        if context.pixel_values and len(context.pixel_values) > 1:
            raise InputError(f"Model supports only 1 image, got {len(context.pixel_values)}")

my_architecture = SupportedArchitecture(
    # ... other fields ...
    context_validators=[validate_single_image],
)
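The snippet above depends on MAX types, so here is a self-contained sketch of the same idea that can be run on its own. `InputError`, `VisionContextStub`, `run_validators`, and `try_create` are stand-ins defined for this illustration; how MAX actually invokes validators during context creation is not shown in this reference and is assumed here to be a simple in-order loop.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


class InputError(Exception):
    """Stand-in for the InputError that validators raise."""


@dataclass
class VisionContextStub:
    """Hypothetical context that holds a list of images."""

    pixel_values: list[object] = field(default_factory=list)


Validator = Callable[[VisionContextStub], None]


def validate_single_image(context: VisionContextStub) -> None:
    if len(context.pixel_values) > 1:
        raise InputError(
            f"Model supports only 1 image, got {len(context.pixel_values)}"
        )


def run_validators(context: VisionContextStub, validators: list[Validator]) -> None:
    # Assumption: validators run in list order and the first failure aborts.
    for validate in validators:
        validate(context)


def try_create(context: VisionContextStub, validators: list[Validator]) -> str:
    # Report the outcome as a string so the result is easy to inspect.
    try:
        run_validators(context, validators)
    except InputError as exc:
        return f"rejected: {exc}"
    return "accepted"


one_image = try_create(VisionContextStub(["img0"]), [validate_single_image])
two_images = try_create(VisionContextStub(["img0", "img1"]), [validate_single_image])
```

Because a validator signals failure by raising rather than returning a value, a request with invalid inputs is rejected before any model work starts.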

`default_encoding`

default_encoding: SupportedEncoding

The default quantization encoding to use when no specific encoding is requested.
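One way to picture how `default_encoding` and `supported_encodings` interact is the sketch below. `resolve_encoding` is a hypothetical helper, encodings are modeled as plain strings, and raising `ValueError` for an unsupported request is an assumption, not documented MAX behavior.

```python
from __future__ import annotations


def resolve_encoding(
    requested: str | None,
    default_encoding: str,
    supported_encodings: set[str],
) -> str:
    """Pick the encoding to use for a model (illustrative only)."""
    if requested is None:
        # No explicit request: fall back to the architecture default.
        return default_encoding
    if requested not in supported_encodings:
        # Assumption: an unsupported request is reported as an error.
        raise ValueError(
            f"{requested!r} is not supported; choose one of "
            f"{sorted(supported_encodings)}"
        )
    return requested


SUPPORTED = {"q4_k", "bfloat16"}
chosen_default = resolve_encoding(None, "q4_k", SUPPORTED)
chosen_explicit = resolve_encoding("bfloat16", "q4_k", SUPPORTED)
```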

`default_weights_format`

default_weights_format: WeightsFormat

The weights format expected by the pipeline_model.

`example_repo_ids`

example_repo_ids: list[str]

A list of Hugging Face repository IDs that use this architecture, used for testing and validation.

`input_modalities`

input_modalities: set[InputModality]

The set of input modalities this architecture accepts.

Defaults to text-only. Multimodal architectures should declare all supported input types explicitly, e.g. {InputModality.TEXT, InputModality.IMAGE} for vision-language models.
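The declared set can be compared against what a request actually contains. The sketch below uses a stand-in `InputModality` enum (the member values are invented) and a hypothetical `unsupported_modalities` helper; whether and where MAX performs such a check is an assumption.

```python
from enum import Enum


class InputModality(Enum):
    """Stand-in enum; only the TEXT and IMAGE names come from this reference."""

    TEXT = "text"
    IMAGE = "image"


def unsupported_modalities(
    request: set[InputModality], declared: set[InputModality]
) -> set[InputModality]:
    # Set difference: modalities in the request that were never declared.
    return request - declared


text_only = {InputModality.TEXT}
vision_language = {InputModality.TEXT, InputModality.IMAGE}
image_request = {InputModality.TEXT, InputModality.IMAGE}

rejected_by_text_only = unsupported_modalities(image_request, text_only)
rejected_by_vlm = unsupported_modalities(image_request, vision_language)
```

Using a set makes the check a single difference operation, and an empty result means the request is fully covered by the architecture.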

`multi_gpu_supported`

multi_gpu_supported: bool = False

Whether the architecture supports multi-GPU execution.

`name`

name: str

The name of the model architecture. It must match the Hugging Face model class name.

`pipeline_model`

pipeline_model: PipelineModelType

The model class that defines the graph structure and execution logic.

Accepts either a PipelineModel subclass (for LLM and other token-generation architectures) or a PipelineExecutor subclass (for newer executor-based architectures such as diffusion pipelines).

`required_arguments`

required_arguments: dict[str, bool | int | float]

A dictionary specifying required values for PipelineConfig options.
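As a sketch of what checking required values could look like, the helper below compares a configuration against a `required_arguments` mapping. `find_violations` is hypothetical and the configuration is modeled as a plain dict; this reference does not say whether MAX overrides mismatched options or reports them, so the sketch only collects the differences.

```python
from __future__ import annotations


def find_violations(
    config: dict[str, object],
    required: dict[str, bool | int | float],
) -> dict[str, tuple[object, object]]:
    """Return {option: (actual, expected)} for every mismatched option."""
    violations: dict[str, tuple[object, object]] = {}
    for option, expected in required.items():
        actual = config.get(option)  # None when the option is absent
        if actual != expected:
            violations[option] = (actual, expected)
    return violations


required_arguments = {"some_arg": True}
ok = find_violations({"some_arg": True, "other": 3}, required_arguments)
bad = find_violations({"some_arg": False}, required_arguments)
```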

`requires_max_batch_context_length`

requires_max_batch_context_length: bool = False

Whether the architecture requires a max batch context length to be specified.

If True and max_batch_context_length is not specified, it defaults to the model's maximum sequence length.
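The defaulting rule can be written as a small function. `resolve_max_batch_context_length` is a hypothetical helper for illustration, and returning `None` when the flag is False and nothing is specified is an assumption.

```python
from __future__ import annotations


def resolve_max_batch_context_length(
    requires_max_batch_context_length: bool,
    specified: int | None,
    max_sequence_length: int,
) -> int | None:
    if specified is not None:
        # An explicit value is used as given.
        return specified
    if requires_max_batch_context_length:
        # Required but unspecified: fall back to the max sequence length.
        return max_sequence_length
    # Assumption: architectures that do not require it leave it unset.
    return None


defaulted = resolve_max_batch_context_length(True, None, 8192)
explicit = resolve_max_batch_context_length(True, 2048, 8192)
unset = resolve_max_batch_context_length(False, None, 8192)
```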

`rope_type`

rope_type: RopeType = 'none'

The type of RoPE (Rotary Position Embedding) used by the model.

`supported_encodings`

supported_encodings: set[SupportedEncoding]

A set of supported quantization encodings.

`supports_empty_batches`

supports_empty_batches: bool = False

Whether the architecture can handle empty batches during inference.

When set to True, the pipeline can process requests with zero-sized batches without errors. This is useful for certain execution modes and expert parallelism. Most architectures do not require empty batch support and should leave this as False.

`task`

task: PipelineTask

The pipeline task type that this architecture supports.

`tokenizer`

tokenizer: Callable[..., PipelineTokenizer[Any, Any, Any]]

A callable that returns a PipelineTokenizer instance for preprocessing model inputs.

`tokenizer_cls`

property tokenizer_cls: type[PipelineTokenizer[Any, Any, Any]]

Returns the tokenizer class for this architecture.

`tool_parser`

tool_parser: type[ToolParser] | None = None

Optional tool parser class for parsing tool calls from model responses.

When set, the serving layer will use this parser to extract tool calls from the model’s output. Different model architectures may use different tool calling formats (e.g., Llama uses JSON, Kimi K2.5 uses structural tags).

If None, the default LlamaToolParser will be used.
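The fallback described above amounts to a one-line selection. In the sketch below, `ToolParser` and `LlamaToolParser` are empty stand-in classes that reuse the names from this reference, while `StructuralTagParser` and `select_tool_parser` are hypothetical names invented for the example.

```python
from __future__ import annotations


class ToolParser:
    """Stand-in base class for tool parsers."""


class LlamaToolParser(ToolParser):
    """Stand-in for the default parser named in this reference."""


class StructuralTagParser(ToolParser):
    """Hypothetical parser for a model that uses structural tags."""


def select_tool_parser(tool_parser: type[ToolParser] | None) -> type[ToolParser]:
    # None means the architecture did not override the default parser.
    return LlamaToolParser if tool_parser is None else tool_parser


default_parser = select_tool_parser(None)
custom_parser = select_tool_parser(StructuralTagParser)
```

Note that the field holds a parser class, not an instance, so under this sketch the serving layer would be responsible for instantiating it.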

`weight_adapters`

weight_adapters: dict[WeightsFormat, WeightsAdapter]

A dictionary of weight format adapters for converting checkpoints from different formats to the default format.
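A format-keyed adapter dictionary works as a dispatch table. The sketch below is a simplified model of that idea: `WeightsFormat` is a stand-in enum, `strip_model_prefix` is an invented adapter, and passing weights through unchanged when no adapter is registered is an assumption. Real adapters likely operate on tensor objects rather than lists of floats.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable

StateDict = dict[str, list[float]]
Adapter = Callable[[StateDict], StateDict]


class WeightsFormat(Enum):
    """Stand-in enum for checkpoint formats."""

    safetensors = "safetensors"
    gguf = "gguf"


def strip_model_prefix(state_dict: StateDict) -> StateDict:
    # Hypothetical adapter: rename "model.x" keys to "x".
    return {key.removeprefix("model."): value for key, value in state_dict.items()}


def adapt_weights(
    state_dict: StateDict,
    checkpoint_format: WeightsFormat,
    adapters: dict[WeightsFormat, Adapter],
) -> StateDict:
    adapter = adapters.get(checkpoint_format)
    if adapter is None:
        # Assumption: with no adapter registered, weights pass through as-is.
        return state_dict
    return adapter(state_dict)


adapters = {WeightsFormat.safetensors: strip_model_prefix}
checkpoint = {"model.embed.weight": [0.1, 0.2]}
converted = adapt_weights(checkpoint, WeightsFormat.safetensors, adapters)
untouched = adapt_weights(checkpoint, WeightsFormat.gguf, adapters)
```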
