IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Python class

SupportedArchitecture

class max.pipelines.lib.registry.SupportedArchitecture(name, example_repo_ids, default_encoding, supported_encodings, pipeline_model, task, tokenizer, default_weights_format, context_type, config, rope_type='none', weight_adapters=<factory>, multi_gpu_supported=False, input_modalities=<factory>, required_arguments=<factory>, context_validators=<factory>, supports_empty_batches=False, requires_max_batch_context_length=False, tool_parser=None, reasoning_parser=None, memory_planner=None)

Bases: object

Represents a model architecture configuration for MAX pipelines.

Defines the components and settings required to support a specific model architecture within the MAX pipeline system. Each SupportedArchitecture instance encapsulates the model implementation, tokenizer, supported encodings, and other architecture-specific configuration.

New architectures should be registered with the PipelineRegistry using its register() method.

Example:

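# Import paths reflect recent MAX releases and may differ in your version.
from max.graph.weights import WeightsFormat
from max.interfaces import PipelineTask
from max.pipelines.core import TextContext
from max.pipelines.lib import TextTokenizer
from max.pipelines.lib.registry import SupportedArchitecture

# MyModel, MyModelConfig, and weight_adapters are placeholders for
# modules in your own architecture package.

my_architecture = SupportedArchitecture(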
    name="MyModelForCausalLM",  # Must match your Hugging Face model class name
    example_repo_ids=[
        "your-org/your-model-name",  # Add example model repository IDs
    ],
    default_encoding="q4_k",
    supported_encodings={
        "q4_k",
        "bfloat16",
        # Add other encodings your model supports
    },
    pipeline_model=MyModel,
    tokenizer=TextTokenizer,
    context_type=TextContext,
    config=MyModelConfig,  # Architecture-specific config class
    default_weights_format=WeightsFormat.safetensors,
    rope_type="none",
    weight_adapters={
        WeightsFormat.safetensors: weight_adapters.convert_safetensor_state_dict,
        # Add other weight formats if needed
    },
    multi_gpu_supported=True,  # Set based on your implementation capabilities
    required_arguments={"some_arg": True},  # Placeholder: PipelineConfig option name -> required value
    task=PipelineTask.TEXT_GENERATION,
)

Parameters:

config

config: type[ArchConfig]

The architecture-specific configuration class for the model.

This class must implement the ArchConfig protocol, providing an initialize method that creates a configuration instance from a PipelineConfig. For models with KV cache, this should be a class implementing ArchConfigWithKVCache to enable KV cache memory estimation.

context_type

context_type: type[TextContext] | type[EmbeddingsContext]

The context class type that this architecture uses for managing request state and inputs.

This should be a class (not an instance) that implements either the TextContext or EmbeddingsContext protocol, defining how the pipeline processes and tracks requests.

context_validators

context_validators: list[Callable[[TextContext | TextAndVisionContext | PixelContext], None]]

A list of callable validators that verify context inputs before model execution.

These validators are called during context creation to ensure inputs meet model-specific requirements. Validators should raise InputError for invalid inputs, providing early error detection before expensive model operations.

def validate_single_image(context: TextContext | TextAndVisionContext) -> None:
    if isinstance(context, TextAndVisionContext):
        if context.pixel_values and len(context.pixel_values) > 1:
            raise InputError(f"Model supports only 1 image, got {len(context.pixel_values)}")

my_architecture = SupportedArchitecture(
    # ... other fields ...
    context_validators=[validate_single_image],
)

default_encoding

default_encoding: SupportedEncoding

The default quantization encoding to use when no specific encoding is requested.

default_weights_format

default_weights_format: WeightsFormat

The weights format expected by the pipeline_model.

example_repo_ids

example_repo_ids: list[str]

A list of Hugging Face repository IDs that use this architecture for testing and validation purposes.

input_modalities

input_modalities: set[InputModality]

The set of input modalities this architecture accepts.

Defaults to text-only. Multimodal architectures should declare all supported input types explicitly, e.g. {InputModality.TEXT, InputModality.IMAGE} for vision-language models.
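To illustrate the set semantics, the sketch below checks whether a request's modalities are all covered by an architecture's declared input_modalities. The InputModality enum here is a local stand-in with only TEXT and IMAGE members, and accepts_request is a hypothetical helper; MAX's real enum and validation may differ.

```python
from enum import Enum, auto


class InputModality(Enum):
    """Local stand-in for MAX's InputModality (members assumed)."""

    TEXT = auto()
    IMAGE = auto()


def accepts_request(
    declared: set[InputModality], request: set[InputModality]
) -> bool:
    """Hypothetical check: every modality in the request must be declared."""
    return request <= declared


text_only = {InputModality.TEXT}
vision_language = {InputModality.TEXT, InputModality.IMAGE}

print(accepts_request(text_only, {InputModality.TEXT, InputModality.IMAGE}))        # False
print(accepts_request(vision_language, {InputModality.TEXT, InputModality.IMAGE}))  # True
```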

memory_planner

memory_planner: type[MemoryPlanner] | None = None

Optional MemoryPlanner subclass for this architecture.

When set, PipelineConfig uses the planner to estimate weight size, activation memory, signal-buffer memory, and vision cache entry bytes. Autoregressive text-generation models should set this to PagedMemoryPlanner (or a subclass with architecture-specific overrides).

None means the architecture manages its own memory estimation (e.g. diffusion pipelines that skip KV cache estimation entirely).

multi_gpu_supported

multi_gpu_supported: bool = False

Whether the architecture supports multi-GPU execution.

name

name: str

The name of the model architecture. It must match the Hugging Face model class name, as listed in the architectures field of the checkpoint's config.json.
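A Hugging Face checkpoint records its model class names in the architectures list of config.json. The sketch below shows a name comparison against that list in isolation; architecture_matches is a hypothetical helper, and MAX's real registry lookup may handle additional cases.

```python
import json


def architecture_matches(name: str, config_json: str) -> bool:
    """Hypothetical check: is `name` listed in config.json's "architectures"?"""
    config = json.loads(config_json)
    # Treat a missing or null "architectures" entry as an empty list.
    return name in (config.get("architectures") or [])


config_text = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'
print(architecture_matches("LlamaForCausalLM", config_text))    # True
print(architecture_matches("MyModelForCausalLM", config_text))  # False
```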

pipeline_model

pipeline_model: PipelineModelType

The model class that defines the graph structure and execution logic.

Accepts either a PipelineModel subclass (for LLM and other token-generation architectures) or a PipelineExecutor subclass (for newer executor-based architectures such as diffusion pipelines).

reasoning_parser

reasoning_parser: str | None = None

Optional default reasoning parser name for this architecture.

The name must correspond to a parser registered via max.pipelines.lib.reasoning.register(). When set, the pipeline config will fall back to this value for runtime.reasoning_parser if the user did not explicitly configure one. Different model architectures emit reasoning content in different formats (e.g., Kimi K2.5 wraps reasoning in <think>...</think>), so the appropriate default is architecture-specific.

If None, no reasoning parser is enabled by default and the user must opt in by setting runtime.reasoning_parser explicitly.

required_arguments

required_arguments: dict[str, bool | int | float]

A dictionary mapping PipelineConfig option names to the values this architecture requires for them.

requires_max_batch_context_length

requires_max_batch_context_length: bool = False

Whether the architecture requires a max batch context length to be specified.

If True and max_batch_context_length is not specified, it defaults to the model's maximum sequence length.

rope_type

rope_type: RopeType = 'none'

The type of RoPE (Rotary Position Embedding) used by the model.

supported_encodings

supported_encodings: set[SupportedEncoding]

The set of quantization encodings this architecture supports.

supports_empty_batches

supports_empty_batches: bool = False

Whether the architecture can handle empty batches during inference.

When set to True, the pipeline can process requests with zero-sized batches without errors. This is useful for certain execution modes and expert parallelism. Most architectures do not require empty batch support and should leave this as False.

task

task: PipelineTask

The pipeline task type that this architecture supports.

tokenizer

tokenizer: Callable[..., PipelineTokenizer[Any, Any, Any]]

A callable that returns a PipelineTokenizer instance for preprocessing model inputs.

tokenizer_cls

property tokenizer_cls: type[PipelineTokenizer[Any, Any, Any]]

Returns the tokenizer class for this architecture.

tool_parser

tool_parser: str | Callable[[HuggingFaceRepo], str] | None = None

Optional default tool parser for this architecture.

Either a registered parser name (str), or a callable that takes the model’s HuggingFaceRepo handle (carrying repo_id, revision, subfolder, and trust_remote_code) and returns a registered parser name. Use the callable form when one architecture name covers multiple checkpoint revisions with different tool-call grammars (for example, DeepSeek V3 vs V3.1). The callable is invoked once during pipeline config resolution and the resulting string is stored on runtime.tool_parser.

The returned name must correspond to a parser registered via max.pipelines.lib.tool_parsing.register(). When set, the pipeline config falls back to this value for runtime.tool_parser if the user did not explicitly configure one.

If None, no tool parser is enabled by default and the serving layer falls back to its baseline parser.

weight_adapters

weight_adapters: dict[WeightsFormat, WeightsAdapter]

A dictionary of weight format adapters for converting checkpoints from different formats to the default format.
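A minimal sketch of how a format-keyed adapter table can be applied to a checkpoint. WeightsFormat is a local stand-in enum, the adapter signature (a state dict in, a state dict out) is simplified and assumed, and strip_model_prefix is a hypothetical adapter. Real MAX adapters, such as the weight_adapters.convert_safetensor_state_dict referenced in the example above, may take additional arguments. Passing weights through unchanged when no adapter is registered is also an assumption.

```python
from enum import Enum
from typing import Any, Callable


class WeightsFormat(Enum):
    """Local stand-in for MAX's WeightsFormat (members assumed)."""

    safetensors = "safetensors"
    gguf = "gguf"


# Simplified, assumed adapter signature: state dict in, state dict out.
Adapter = Callable[[dict[str, Any]], dict[str, Any]]


def strip_model_prefix(state_dict: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical adapter: drop a leading "model." from checkpoint keys."""
    return {key.removeprefix("model."): value for key, value in state_dict.items()}


weight_adapters: dict[WeightsFormat, Adapter] = {
    WeightsFormat.safetensors: strip_model_prefix,
}


def adapt(fmt: WeightsFormat, state_dict: dict[str, Any]) -> dict[str, Any]:
    """Apply the adapter registered for `fmt`, or return the weights unchanged."""
    adapter = weight_adapters.get(fmt)
    return adapter(state_dict) if adapter is not None else state_dict


checkpoint = {"model.embed_tokens.weight": [0.1, 0.2], "lm_head.weight": [0.3]}
print(adapt(WeightsFormat.safetensors, checkpoint))
# {'embed_tokens.weight': [0.1, 0.2], 'lm_head.weight': [0.3]}
```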