Python class
SupportedArchitecture
class max.pipelines.lib.registry.SupportedArchitecture(name, example_repo_ids, default_encoding, supported_encodings, pipeline_model, task, tokenizer, default_weights_format, context_type, config, rope_type='none', weight_adapters=<factory>, multi_gpu_supported=False, input_modalities=<factory>, required_arguments=<factory>, context_validators=<factory>, supports_empty_batches=False, requires_max_batch_context_length=False, tool_parser=None, reasoning_parser=None, memory_planner=None)
Bases: object
Represents a model architecture configuration for MAX pipelines.
Defines the components and settings required to support a specific model architecture within the MAX pipeline system. Each SupportedArchitecture instance encapsulates the model implementation, tokenizer, supported encodings, and other architecture-specific configuration.
New architectures should be registered into the PipelineRegistry
using the register() method.
Example:
my_architecture = SupportedArchitecture(
    name="MyModelForCausalLM",  # Must match your Hugging Face model class name
    example_repo_ids=[
        "your-org/your-model-name",  # Add example model repository IDs
    ],
    default_encoding="q4_k",
    supported_encodings={
        "q4_k",
        "bfloat16",
        # Add other encodings your model supports
    },
    pipeline_model=MyModel,
    tokenizer=TextTokenizer,
    context_type=TextContext,
    config=MyModelConfig,  # Architecture-specific config class
    default_weights_format=WeightsFormat.safetensors,
    rope_type="none",
    weight_adapters={
        WeightsFormat.safetensors: weight_adapters.convert_safetensor_state_dict,
        # Add other weight formats if needed
    },
    multi_gpu_supported=True,  # Set based on your implementation capabilities
    required_arguments={"some_arg": True},
    task=PipelineTask.TEXT_GENERATION,
)
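The registration step mentioned before the example can be pictured with a small stand-in. This is only a sketch: `RegistryStub`, `ArchitectureStub`, the `lookup` helper, and the duplicate-name check are hypothetical and are not the behavior of the real PipelineRegistry, whose register() method may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureStub:
    """Hypothetical stand-in for SupportedArchitecture with only two fields."""
    name: str
    task: str


class RegistryStub:
    """Hypothetical name-keyed registry; not the real PipelineRegistry."""

    def __init__(self) -> None:
        self._architectures: dict[str, ArchitectureStub] = {}

    def register(self, architecture: ArchitectureStub) -> None:
        # Assumption: registering the same name twice is treated as an error.
        if architecture.name in self._architectures:
            raise ValueError(f"{architecture.name!r} is already registered")
        self._architectures[architecture.name] = architecture

    def lookup(self, name: str) -> ArchitectureStub | None:
        # Assumption: lookup is by the Hugging Face model class name.
        return self._architectures.get(name)


registry = RegistryStub()
registry.register(ArchitectureStub(name="MyModelForCausalLM", task="text_generation"))
```

Keying the registry by `name` reflects the requirement that the architecture name match the Hugging Face model class name.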
Parameters:
- name (str)
- example_repo_ids (list[str])
- default_encoding (SupportedEncoding)
- supported_encodings (set[SupportedEncoding])
- pipeline_model (PipelineModelType)
- task (PipelineTask)
- tokenizer (Callable[..., PipelineTokenizer[Any, Any, Any]])
- default_weights_format (WeightsFormat)
- context_type (type[TextContext] | type[EmbeddingsContext])
- config (type[ArchConfig])
- rope_type (RopeType)
- weight_adapters (dict[WeightsFormat, WeightsAdapter])
- multi_gpu_supported (bool)
- input_modalities (set[InputModality])
- required_arguments (dict[str, bool | int | float])
- context_validators (list[Callable[[TextContext | TextAndVisionContext | PixelContext], None]])
- supports_empty_batches (bool)
- requires_max_batch_context_length (bool)
- tool_parser (str | Callable[[HuggingFaceRepo], str] | None)
- reasoning_parser (str | None)
- memory_planner (type[MemoryPlanner] | None)
config
config: type[ArchConfig]
The architecture-specific configuration class for the model.
This class must implement the ArchConfig protocol, providing an
initialize method that creates a configuration instance from a
PipelineConfig. For models with KV cache, this should be a class
implementing ArchConfigWithKVCache to enable KV cache memory estimation.
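The protocol requirement described above can be sketched in a self-contained way. `PipelineConfigStub`, its fields, `ArchConfigLike`, and the exact `initialize` signature are assumptions made for illustration; only the existence of an initialize method that builds a configuration from a PipelineConfig comes from the description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class PipelineConfigStub:
    """Hypothetical stand-in for PipelineConfig (not the real MAX class)."""
    max_length: int
    hidden_size: int


class ArchConfigLike(Protocol):
    """Assumed shape of the protocol: a factory method named initialize."""

    @classmethod
    def initialize(cls, pipeline_config: PipelineConfigStub) -> ArchConfigLike: ...


@dataclass
class MyModelConfig:
    """Illustrative architecture config derived from the pipeline-level config."""
    max_seq_len: int
    hidden_size: int

    @classmethod
    def initialize(cls, pipeline_config: PipelineConfigStub) -> MyModelConfig:
        # Copy only the fields this hypothetical architecture needs.
        return cls(
            max_seq_len=pipeline_config.max_length,
            hidden_size=pipeline_config.hidden_size,
        )


arch_config = MyModelConfig.initialize(
    PipelineConfigStub(max_length=4096, hidden_size=1024)
)
```

The class itself, not an instance, is what would be passed as `config`, which is why `initialize` is written as a classmethod here.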
context_type
context_type: type[TextContext] | type[EmbeddingsContext]
The context class type that this architecture uses for managing request state and inputs.
This should be a class (not an instance) that implements either the TextContext or EmbeddingsContext protocol, defining how the pipeline processes and tracks requests.
context_validators
context_validators: list[Callable[[TextContext | TextAndVisionContext | PixelContext], None]]
A list of callable validators that verify context inputs before model execution.
These validators are called during context creation to ensure inputs meet model-specific requirements. Validators should raise InputError for invalid inputs, providing early error detection before expensive model operations.
def validate_single_image(context: TextContext | TextAndVisionContext) -> None:
    # Reject requests that carry more than one image.
    if isinstance(context, TextAndVisionContext):
        if context.pixel_values and len(context.pixel_values) > 1:
            raise InputError(f"Model supports only 1 image, got {len(context.pixel_values)}")

my_architecture = SupportedArchitecture(
    # ... other fields ...
    context_validators=[validate_single_image],
)
default_encoding
default_encoding: SupportedEncoding
The default quantization encoding to use when no specific encoding is requested.
default_weights_format
default_weights_format: WeightsFormat
The weights format expected by the pipeline_model.
example_repo_ids
example_repo_ids: list[str]
A list of Hugging Face repository IDs that use this architecture. They are used for testing and validation.
input_modalities
input_modalities: set[InputModality]
The set of input modalities this architecture accepts.
Defaults to text-only. Multimodal architectures should declare all
supported input types explicitly, e.g.
{InputModality.TEXT, InputModality.IMAGE} for vision-language models.
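A minimal sketch of declaring modalities and checking a request against them follows. The `InputModality` enum here is a local stand-in (its string values are assumptions) and the `accepts` helper is hypothetical; neither is taken from MAX.

```python
from __future__ import annotations

from enum import Enum


class InputModality(Enum):
    """Stand-in enum for illustration; member values are assumed."""
    TEXT = "text"
    IMAGE = "image"


# The text-only default, and an explicit declaration for a vision-language model.
text_only = {InputModality.TEXT}
vision_language = {InputModality.TEXT, InputModality.IMAGE}


def accepts(declared: set[InputModality], requested: set[InputModality]) -> bool:
    """Return True when every requested modality is declared by the architecture."""
    return requested <= declared
```

Because the field is a set, a subset test is enough to decide whether a request's inputs are covered by what the architecture declares.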
memory_planner
memory_planner: type[MemoryPlanner] | None = None
Optional MemoryPlanner subclass for
this architecture.
When set, PipelineConfig uses the planner to estimate weight size,
activation memory, signal-buffer memory, and vision cache entry bytes.
Autoregressive text-generation models should set this to
PagedMemoryPlanner (or a subclass with
architecture-specific overrides).
None means the architecture manages its own memory estimation (e.g.
diffusion pipelines that skip KV cache estimation entirely).
multi_gpu_supported
multi_gpu_supported: bool = False
Whether the architecture supports multi-GPU execution.
name
name: str
The name of the model architecture. It must match the Hugging Face model class name.
pipeline_model
pipeline_model: PipelineModelType
The model class that defines the graph structure and execution logic.
Accepts either a PipelineModel subclass (for LLM and other
token-generation architectures) or a PipelineExecutor subclass
(for newer executor-based architectures such as diffusion pipelines).
reasoning_parser
reasoning_parser: str | None = None
Optional default reasoning parser name for this architecture.
The name must correspond to a parser registered via
max.pipelines.lib.reasoning.register(). When set, the pipeline
config will fall back to this value for runtime.reasoning_parser if
the user did not explicitly configure one. Different model architectures
emit reasoning content in different formats (e.g., Kimi K2.5 wraps
reasoning in <think>...</think>), so the appropriate default is
architecture-specific.
If None, no reasoning parser is enabled by default and the user must
opt in by setting runtime.reasoning_parser explicitly.
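The fallback order described above can be sketched as a small function. `resolve_reasoning_parser` and the parser names in the usage lines are hypothetical; the real resolution happens inside the pipeline config and may involve more steps.

```python
from __future__ import annotations


def resolve_reasoning_parser(
    user_value: str | None, arch_default: str | None
) -> str | None:
    """Prefer the user's explicit setting, else the architecture default."""
    if user_value is not None:
        return user_value
    # None here means no reasoning parser is enabled.
    return arch_default


# Hypothetical parser names, for illustration only.
explicit = resolve_reasoning_parser("user_parser", "arch_parser")
fallback = resolve_reasoning_parser(None, "arch_parser")
disabled = resolve_reasoning_parser(None, None)
```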
required_arguments
required_arguments: dict[str, bool | int | float]
A dictionary specifying required values for PipelineConfig options.
requires_max_batch_context_length
requires_max_batch_context_length: bool = False
Whether the architecture requires a max batch context length to be specified.
If True and max_batch_context_length is not specified, it defaults to the model's maximum sequence length.
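The defaulting rule can be illustrated with a short sketch. The function name and parameters are hypothetical, and the behavior when the flag is False (returning the configured value unchanged) is an assumption.

```python
from __future__ import annotations


def effective_max_batch_context_length(
    requires: bool, configured: int | None, model_max_seq_len: int
) -> int | None:
    """Apply the default only when the architecture requires a value."""
    if requires and configured is None:
        return model_max_seq_len
    # Assumption: otherwise the configured value (possibly None) is kept as-is.
    return configured
```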
rope_type
rope_type: RopeType = 'none'
The type of RoPE (Rotary Position Embedding) used by the model.
supported_encodings
supported_encodings: set[SupportedEncoding]
The set of supported quantization encodings.
supports_empty_batches
supports_empty_batches: bool = False
Whether the architecture can handle empty batches during inference.
When set to True, the pipeline can process requests with zero-sized batches without errors. This is useful for certain execution modes and expert parallelism. Most architectures do not require empty batch support and should leave this as False.
task
task: PipelineTask
The pipeline task type that this architecture supports.
tokenizer
tokenizer: Callable[..., PipelineTokenizer[Any, Any, Any]]
A callable that returns a PipelineTokenizer instance for preprocessing model inputs.
tokenizer_cls
property tokenizer_cls: type[PipelineTokenizer[Any, Any, Any]]
Returns the tokenizer class for this architecture.
tool_parser
tool_parser: str | Callable[[HuggingFaceRepo], str] | None = None
Optional default tool parser for this architecture.
Either a registered parser name (str), or a callable that takes the
model's HuggingFaceRepo handle (carrying repo_id,
revision, subfolder, and trust_remote_code) and returns a
registered parser name. Use the callable form when one architecture
name covers multiple checkpoint revisions with different tool-call
grammars (for example, DeepSeek V3 vs V3.1). The callable is invoked
once during pipeline config resolution and the resulting string is
stored on runtime.tool_parser.
The returned name must correspond to a parser registered via
max.pipelines.lib.tool_parsing.register(). When set, the
pipeline config falls back to this value for runtime.tool_parser
if the user did not explicitly configure one.
If None, no tool parser is enabled by default and the serving layer falls back to its baseline parser.
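The two forms of the default, a plain name or a callable, can be sketched as follows. `RepoStub` is a stand-in that carries only the four fields named above; `pick_parser`, `resolve_tool_parser`, the parser names, and the repository IDs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class RepoStub:
    """Stand-in for HuggingFaceRepo with the fields named in the description."""
    repo_id: str
    revision: str = "main"
    subfolder: str | None = None
    trust_remote_code: bool = False


ToolParserDefault = Union[str, Callable[[RepoStub], str], None]


def pick_parser(repo: RepoStub) -> str:
    """Choose a parser name per checkpoint (hypothetical names and rule)."""
    return "parser_v3_1" if repo.repo_id.endswith("-v3.1") else "parser_v3"


def resolve_tool_parser(
    user_value: str | None, arch_default: ToolParserDefault, repo: RepoStub
) -> str | None:
    """User setting wins; a callable default is invoked once with the repo."""
    if user_value is not None:
        return user_value
    if callable(arch_default):
        return arch_default(repo)
    return arch_default


newer = resolve_tool_parser(None, pick_parser, RepoStub("example-org/model-v3.1"))
older = resolve_tool_parser(None, pick_parser, RepoStub("example-org/model-v3"))
```

The callable form keeps one architecture entry for several checkpoints while still resolving to a single string.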
weight_adapters
weight_adapters: dict[WeightsFormat, WeightsAdapter]
A dictionary of weight format adapters for converting checkpoints from different formats to the default format.
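As a rough illustration of the mapping, the sketch below keys adapter functions by format and applies one to a checkpoint's state dict. The `WeightsFormat` enum is a local stand-in (the `gguf` member is an assumption), and `strip_model_prefix`, `adapt`, and the adapter signature are hypothetical; real adapters in MAX may take additional arguments.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class WeightsFormat(Enum):
    """Stand-in enum; only safetensors appears in the example above."""
    safetensors = "safetensors"
    gguf = "gguf"


StateDict = dict[str, object]
WeightsAdapter = Callable[[StateDict], StateDict]


def strip_model_prefix(state_dict: StateDict) -> StateDict:
    """Hypothetical adapter: drop a leading 'model.' from every weight name."""
    return {key.removeprefix("model."): value for key, value in state_dict.items()}


weight_adapters: dict[WeightsFormat, WeightsAdapter] = {
    WeightsFormat.safetensors: strip_model_prefix,
}


def adapt(state_dict: StateDict, fmt: WeightsFormat) -> StateDict:
    """Apply the adapter for fmt, or return the weights unchanged if none exists."""
    adapter = weight_adapters.get(fmt)
    return adapter(state_dict) if adapter is not None else state_dict
```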