Supported models
The table below lists the model architectures currently supported by MAX.
Each architecture represents a family of models, as defined by Hugging Face
Transformers. The example model names are Hugging Face repository IDs, such as
`google/gemma-3-4b-it` for the Gemma3ForConditionalGeneration architecture, but
you can use any model from Hugging Face that's based on one of the
architectures below.
To deploy any of these models with MAX, pass the model name to the `max serve`
or `docker run` command. Try it now by following the MAX quickstart guide. Or,
if you want to serve a custom model, see the tutorial on serving custom model
architectures.
You can also see the model source code on GitHub.
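For example, serving a model from the table generally looks like the following. This is a minimal sketch: the model name, port, container image, and GPU flags are illustrative, and the exact `max serve` and `docker run` invocations for your version and hardware are in the MAX quickstart guide (check `max serve --help` to confirm the flag names):

```shell
# Serve a supported model locally with the MAX CLI, using any
# repo ID from the table below as the model name.
max serve --model-path meta-llama/Llama-3.1-8B-Instruct

# Or serve the same model from a container. The image name and
# flags here are placeholders -- see the MAX quickstart for the
# exact invocation for your hardware.
docker run --gpus all -p 8000:8000 \
  modular/max-nvidia-full \
  --model-path meta-llama/Llama-3.1-8B-Instruct
```

Once the server is running, it exposes an OpenAI-compatible endpoint you can send requests to.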
| Architecture | Example models (repo IDs) | Modality | Encodings | Multi-GPU |
|---|---|---|---|---|
| BertModel | sentence-transformers/all-MiniLM-L6-v2 | text-to-embeddings | bfloat16, float32 | No |
| DeepseekV2ForCausalLM | deepseek-ai/DeepSeek-V2-Lite-Chat | text-to-text | bfloat16 | Yes |
| DeepseekV32ForCausalLM | | text-to-text | float8_e4m3fn | Yes |
| DeepseekV3ForCausalLM | deepseek-ai/DeepSeek-V3 | text-to-text | bfloat16, float4_e2m1fnx2, float8_e4m3fn | Yes |
| ExaoneForCausalLM | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | text-to-text | bfloat16, float32, q6_k | No |
| Flux2KleinPipeline | black-forest-labs/FLUX.2-klein-4B | text-to-image | bfloat16 | No |
| Flux2Pipeline | black-forest-labs/FLUX.2-dev | text-to-image | bfloat16 | No |
| FluxPipeline | black-forest-labs/FLUX.1-dev | text-to-image | bfloat16 | No |
| Gemma3ForCausalLM | | text-to-text | bfloat16 | Yes |
| Gemma3ForConditionalGeneration | google/gemma-3-4b-it | image-to-text, text-to-text | bfloat16, float8_e4m3fn | Yes |
| GptOssForCausalLM | openai/gpt-oss-20b | text-to-text | bfloat16, float4_e2m1fnx2 | Yes |
| GraniteForCausalLM | ibm-granite/granite-3.1-8b-instruct | text-to-text | bfloat16, float32 | No |
| Idefics3ForConditionalGeneration | HuggingFaceM4/Idefics3-8B-Llama3 | image-to-text, text-to-text | bfloat16 | No |
| InternVLChatModel | OpenGVLab/InternVL3-8B-Instruct | image-to-text, text-to-text | bfloat16 | Yes |
| KimiK25ForConditionalGeneration | | image-to-text, text-to-text | bfloat16, float4_e2m1fnx2, float8_e4m3fn | Yes |
| KimiVLForConditionalGeneration | moonshotai/Kimi-VL-A3B-Instruct | image-to-text, text-to-text | bfloat16, float4_e2m1fnx2, float8_e4m3fn | Yes |
| LlamaForCausalLM | meta-llama/Llama-3.1-8B-Instruct | text-to-text | bfloat16, float32, float4_e2m1fnx2, float8_e4m3fn, gptq, q6_k | Yes |
| LlavaForConditionalGeneration | mistral-community/pixtral-12b | image-to-text, text-to-text | bfloat16 | No |
| Mistral3ForConditionalGeneration | mistralai/Mistral-Small-3.1-24B-Instruct-2503 | text-to-text | bfloat16 | Yes |
| MistralForCausalLM | mistralai/Mistral-Nemo-Instruct-2407 | text-to-text | bfloat16 | Yes |
| MPNetForMaskedLM | sentence-transformers/all-mpnet-base-v2 | text-to-embeddings | bfloat16, float32 | No |
| Olmo2ForCausalLM | allenai/OLMo-2-0425-1B-Instruct | text-to-text | bfloat16, float32 | No |
| Olmo3ForCausalLM | allenai/Olmo-3-7B-Instruct | text-to-text | bfloat16 | No |
| OlmoForCausalLM | | text-to-text | bfloat16, float32 | No |
| Phi3ForCausalLM | | text-to-text | bfloat16, float32 | No |
| Qwen2_5_VLForConditionalGeneration | | image-to-text, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen2ForCausalLM | | text-to-text | bfloat16, float32 | Yes |
| Qwen3ForCausalLM | Qwen/Qwen3-8B | text-to-embeddings, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen3MoeForCausalLM | Qwen/Qwen3-30B-A3B-Instruct | text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen3VLForConditionalGeneration | | image-to-text, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen3VLMoeForConditionalGeneration | Qwen/Qwen3-VL-30B-A3B-Instruct | image-to-text, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |