Supported models
The table below lists the model architectures currently supported by MAX.
Each architecture represents a family of models, as defined by Hugging Face
Transformers. The example model names are Hugging Face repository IDs, such as
`google/gemma-3-4b-it` for the Gemma3ForConditionalGeneration architecture, but
you can use any model from Hugging Face that's based on one of the
architectures below.
To deploy any of these models with MAX, pass the model name to the `max serve`
or `docker run` command. Try it now by following the MAX quickstart guide. Or,
if you want to serve a custom model, see the tutorial on serving custom model
architectures.
You can also see the model source code on GitHub.
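For example, serving a model from the table generally looks like the following. This is a minimal sketch: the model name, port, container image, and GPU flags are illustrative, and the exact `max serve` and `docker run` invocations for your version and hardware are in the MAX quickstart guide (check `max serve --help` to confirm the flag names):

```shell
# Serve a supported model locally with the MAX CLI, using any
# repo ID from the table below as the model name.
max serve --model-path meta-llama/Llama-3.1-8B-Instruct

# Or serve the same model from a container. The image name and
# flags here are placeholders -- see the MAX quickstart for the
# exact invocation for your hardware.
docker run --gpus all -p 8000:8000 \
  modular/max-nvidia-full \
  --model-path meta-llama/Llama-3.1-8B-Instruct
```

Once the server is running, it exposes an OpenAI-compatible endpoint you can send requests to.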
| Architecture | Example models (repo IDs) | Modality | Encodings | Multi-GPU |
|---|---|---|---|---|
| BertModel | sentence-transformers/all-MiniLM-L6-v2 | text-to-embeddings | bfloat16, float32 | No |
| DeepseekV2ForCausalLM | deepseek-ai/DeepSeek-V2-Lite-Chat | text-to-text | bfloat16 | Yes |
| DeepseekV32ForCausalLM | | text-to-text | float8_e4m3fn | Yes |
| DeepseekV3ForCausalLM | deepseek-ai/DeepSeek-V3 | text-to-text | bfloat16, float4_e2m1fnx2, float8_e4m3fn | Yes |
| ExaoneForCausalLM | LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct | text-to-text | bfloat16, float32, q6_k | No |
| Flux2KleinPipeline | black-forest-labs/FLUX.2-klein-4B | text-to-image | bfloat16 | No |
| Flux2Pipeline | black-forest-labs/FLUX.2-dev | text-to-image | bfloat16 | No |
| FluxPipeline | black-forest-labs/FLUX.1-dev | text-to-image | bfloat16 | No |
| Gemma3ForCausalLM | | text-to-text | bfloat16 | Yes |
| Gemma3ForConditionalGeneration | google/gemma-3-4b-it | image-to-text, text-to-text | bfloat16, float8_e4m3fn | Yes |
| GptOssForCausalLM | openai/gpt-oss-20b | text-to-text | bfloat16, float4_e2m1fnx2 | Yes |
| GraniteForCausalLM | ibm-granite/granite-3.1-8b-instruct | text-to-text | bfloat16, float32 | No |
| Idefics3ForConditionalGeneration | HuggingFaceM4/Idefics3-8B-Llama3 | image-to-text, text-to-text | bfloat16 | No |
| InternVLChatModel | OpenGVLab/InternVL3-8B-Instruct | image-to-text, text-to-text | bfloat16 | Yes |
| KimiK25ForConditionalGeneration | | image-to-text, text-to-text | bfloat16, float4_e2m1fnx2, float8_e4m3fn | Yes |
| KimiVLForConditionalGeneration | moonshotai/Kimi-VL-A3B-Instruct | image-to-text, text-to-text | bfloat16, float4_e2m1fnx2, float8_e4m3fn | Yes |
| LlamaForCausalLM | meta-llama/Llama-3.1-8B-Instruct | text-to-text | bfloat16, float32, float4_e2m1fnx2, float8_e4m3fn, gptq, q6_k | Yes |
| LlavaForConditionalGeneration | mistral-community/pixtral-12b | image-to-text, text-to-text | bfloat16 | No |
| Mistral3ForConditionalGeneration | mistralai/Mistral-Small-3.1-24B-Instruct-2503 | text-to-text | bfloat16 | Yes |
| MistralForCausalLM | mistralai/Mistral-Nemo-Instruct-2407 | text-to-text | bfloat16 | Yes |
| MPNetForMaskedLM | sentence-transformers/all-mpnet-base-v2 | text-to-embeddings | bfloat16, float32 | No |
| Olmo2ForCausalLM | allenai/OLMo-2-0425-1B-Instruct | text-to-text | bfloat16, float32 | No |
| Olmo3ForCausalLM | allenai/Olmo-3-7B-Instruct | text-to-text | bfloat16 | No |
| OlmoForCausalLM | | text-to-text | bfloat16, float32 | No |
| Phi3ForCausalLM | | text-to-text | bfloat16, float32 | No |
| Qwen2_5_VLForConditionalGeneration | | image-to-text, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen2ForCausalLM | | text-to-text | bfloat16, float32 | Yes |
| Qwen3ForCausalLM | Qwen/Qwen3-8B | text-to-embeddings, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen3MoeForCausalLM | Qwen/Qwen3-30B-A3B-Instruct | text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen3VLForConditionalGeneration | | image-to-text, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |
| Qwen3VLMoeForConditionalGeneration | Qwen/Qwen3-VL-30B-A3B-Instruct | image-to-text, text-to-text | bfloat16, float32, float8_e4m3fn | Yes |