Python class

SupportedArchitecture

class max.pipelines.lib.registry.SupportedArchitecture(name, example_repo_ids, default_encoding, supported_encodings, pipeline_model, task, tokenizer, default_weights_format, context_type, config, rope_type='none', weight_adapters=<factory>, multi_gpu_supported=False, input_modalities=<factory>, required_arguments=<factory>, context_validators=<factory>, supports_empty_batches=False, requires_max_batch_context_length=False, tool_parser=None)

Bases: object

Represents a model architecture configuration for MAX pipelines.

Defines the components and settings required to support a specific model architecture within the MAX pipeline system. Each SupportedArchitecture instance encapsulates the model implementation, tokenizer, supported encodings, and other architecture-specific configuration.

New architectures should be registered into the PipelineRegistry using the register() method.

Example:

my_architecture = SupportedArchitecture(
    name="MyModelForCausalLM",  # Must match your Hugging Face model class name
    example_repo_ids=[
        "your-org/your-model-name",  # Add example model repository IDs
    ],
    default_encoding="q4_k",
    supported_encodings={
        "q4_k",
        "bfloat16",
        # Add other encodings your model supports
    },
    pipeline_model=MyModel,
    tokenizer=TextTokenizer,
    context_type=TextContext,
    config=MyModelConfig,  # Architecture-specific config class
    default_weights_format=WeightsFormat.safetensors,
    rope_type="none",
    weight_adapters={
        WeightsFormat.safetensors: weight_adapters.convert_safetensor_state_dict,
        # Add other weight formats if needed
    },
    multi_gpu_supported=True,  # Set based on your implementation capabilities
    required_arguments={"some_arg": True},
    task=PipelineTask.TEXT_GENERATION,
)

Parameters:

config

config: type[ArchConfig]

The architecture-specific configuration class for the model.

This class must implement the ArchConfig protocol, providing an initialize method that creates a configuration instance from a PipelineConfig. For models with KV cache, this should be a class implementing ArchConfigWithKVCache to enable KV cache memory estimation.
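The shape described above can be sketched as follows. All names here are hypothetical illustrations, and PipelineConfigLike is a local stand-in for MAX's actual PipelineConfig:

```python
from dataclasses import dataclass
from typing import Protocol


class PipelineConfigLike(Protocol):
    """Local stand-in for MAX's PipelineConfig, for illustration only."""

    max_length: int


@dataclass
class MyModelConfig:
    """Hypothetical architecture-specific configuration class."""

    max_seq_len: int

    @classmethod
    def initialize(cls, pipeline_config: PipelineConfigLike) -> "MyModelConfig":
        # Derive architecture-specific settings from the shared
        # pipeline configuration, as the ArchConfig protocol requires.
        return cls(max_seq_len=pipeline_config.max_length)
```

The class itself (not an instance) is then passed as the config field of SupportedArchitecture.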

context_type

context_type: type[TextGenerationContext] | type[EmbeddingsContext]

The context class type that this architecture uses for managing request state and inputs.

This should be a class (not an instance) that implements either the TextGenerationContext or EmbeddingsContext protocol, defining how the pipeline processes and tracks requests.

context_validators

context_validators: list[Callable[[TextContext | TextAndVisionContext | PixelContext], None]]

A list of callable validators that verify context inputs before model execution.

These validators are called during context creation to ensure inputs meet model-specific requirements. Validators should raise InputError for invalid inputs, providing early error detection before expensive model operations.

def validate_single_image(context: TextContext | TextAndVisionContext) -> None:
    if isinstance(context, TextAndVisionContext):
        if context.pixel_values and len(context.pixel_values) > 1:
            raise InputError(f"Model supports only 1 image, got {len(context.pixel_values)}")

my_architecture = SupportedArchitecture(
    # ... other fields ...
    context_validators=[validate_single_image],
)

default_encoding

default_encoding: SupportedEncoding

The default quantization encoding to use when no specific encoding is requested.

default_weights_format

default_weights_format: WeightsFormat

The weights format expected by the pipeline_model.

example_repo_ids

example_repo_ids: list[str]

A list of Hugging Face repository IDs for models that use this architecture, used for testing and validation.

input_modalities

input_modalities: set[InputModality]

The set of input modalities this architecture accepts.

Defaults to text-only. Multimodal architectures should declare all supported input types explicitly, e.g. {InputModality.TEXT, InputModality.IMAGE} for vision-language models.

multi_gpu_supported

multi_gpu_supported: bool = False

Whether the architecture supports multi-GPU execution.

name

name: str

The name of the model architecture, which must match the Hugging Face model class name.

pipeline_model

pipeline_model: PipelineModelType

The model class that defines the graph structure and execution logic.

Accepts either a PipelineModel subclass (for LLM and other token-generation architectures) or a PipelineExecutor subclass (for newer executor-based architectures such as diffusion pipelines).

required_arguments

required_arguments: dict[str, bool | int | float]

A dictionary mapping PipelineConfig option names to the values this architecture requires.

requires_max_batch_context_length

requires_max_batch_context_length: bool = False

Whether the architecture requires a max batch context length to be specified.

If True and max_batch_context_length is not specified, it defaults to the model's maximum sequence length.

rope_type

rope_type: RopeType = 'none'

The type of RoPE (Rotary Position Embedding) used by the model.

supported_encodings

supported_encodings: set[SupportedEncoding]

The set of quantization encodings this architecture supports.

supports_empty_batches

supports_empty_batches: bool = False

Whether the architecture can handle empty batches during inference.

When set to True, the pipeline can process requests with zero-sized batches without errors. This is useful for certain execution modes and expert parallelism. Most architectures do not require empty batch support and should leave this as False.

task

task: PipelineTask

The pipeline task type that this architecture supports.

tokenizer

tokenizer: Callable[..., PipelineTokenizer[Any, Any, Any]]

A callable that returns a PipelineTokenizer instance for preprocessing model inputs.

tokenizer_cls

property tokenizer_cls: type[PipelineTokenizer[Any, Any, Any]]

Returns the tokenizer class for this architecture.

tool_parser

tool_parser: type[ToolParser] | None = None

Optional tool parser class for parsing tool calls from model responses.

When set, the serving layer will use this parser to extract tool calls from the model’s output. Different model architectures may use different tool calling formats (e.g., Llama uses JSON, Kimi K2.5 uses structural tags).

If None, the default LlamaToolParser will be used.

weight_adapters

weight_adapters: dict[WeightsFormat, WeightsAdapter]

A dictionary of weight format adapters for converting checkpoints from different formats to the default format.
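As a hedged sketch of what such an adapter does (the key names below are hypothetical; real adapters like convert_safetensor_state_dict ship with MAX):

```python
def convert_my_checkpoint_state_dict(
    state_dict: dict[str, object],
) -> dict[str, object]:
    """Illustrative adapter: rename checkpoint keys to the names
    the pipeline model expects. The mapping here is hypothetical."""
    renames = {
        "model.embed_tokens.weight": "embeddings.weight",
    }
    # Keys without an entry in the rename table pass through unchanged.
    return {renames.get(key, key): value for key, value in state_dict.items()}
```

An adapter like this would be registered under its source format, e.g. {WeightsFormat.safetensors: convert_my_checkpoint_state_dict}.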