Supported models

The table below lists all the model architectures currently supported by MAX.

Each architecture represents a family of models, as defined by Hugging Face Transformers. The example model names are Hugging Face repository IDs, such as google/gemma-3-27b-it for the `Gemma3ForConditionalGeneration` architecture, but you can use any Hugging Face model that's based on one of the architectures below.
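One way to check which architecture a Transformers-style model uses is to read the `architectures` list in the repository's `config.json`. The sketch below assumes the file's text is already in hand. The `SUPPORTED` set is a hand-copied subset of the table on this page, and `find_supported_architecture` is a hypothetical helper, not part of MAX.

```python
import json

# Hand-copied subset of the architecture names from the table on this page.
SUPPORTED = {
    "Gemma3ForCausalLM",
    "Gemma3ForConditionalGeneration",
    "LlamaForCausalLM",
    "Qwen3ForCausalLM",
}


def find_supported_architecture(config_text, supported=SUPPORTED):
    """Return the first architecture named in a config.json that is in `supported`, else None."""
    config = json.loads(config_text)
    # Hugging Face Transformers configs list model class names under "architectures".
    for name in config.get("architectures") or []:
        if name in supported:
            return name
    return None


# Illustrative config.json contents, trimmed to the relevant keys.
example_config = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'
print(find_supported_architecture(example_config))
```

A result of `None` only means the name is missing from the hand-copied set, so compare against the full table before ruling a model out.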

To deploy any of these models with MAX, pass the model name to the `max serve` or `docker run` command. To try it, follow the MAX quickstart guide. To serve a custom model, see the tutorial on serving custom model architectures.
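As a rough illustration of passing a model name to `max serve`, the sketch below builds the command as an argument list. The `--model` flag name is an assumption here, as is the hypothetical `build_serve_command` helper; confirm the exact flag with `max serve --help` or the quickstart guide.

```python
import shlex


def build_serve_command(repo_id):
    """Build an argument list for serving a Hugging Face repo ID with MAX."""
    # Assumption: `max serve` takes the model name through a `--model` flag.
    return ["max", "serve", "--model", repo_id]


cmd = build_serve_command("google/gemma-3-1b-it")
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
```

On a machine with MAX installed, the list could be handed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell-quoting issues with unusual repository names.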

You can also browse the model source code on GitHub.

Architecture	Example models (repo IDs)	Modality	Encodings	Multi-GPU
`BertModel`	sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/all-MiniLM-L12-v2	text-to-embeddings	bfloat16, float32	No
`DeepseekV2ForCausalLM`	deepseek-ai/DeepSeek-V2-Lite-Chat	text-to-text	bfloat16	Yes
`DeepseekV32ForCausalLM`	deepseek-ai/DeepSeek-V3.2, deepseek-ai/DeepSeek-V3.2-Exp	text-to-text	float8_e4m3fn	Yes
`DeepseekV3ForCausalLM`	deepseek-ai/DeepSeek-V3	text-to-text	bfloat16, float4_e2m1fnx2, float8_e4m3fn	Yes
`DFlashDraftModel`	z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat	text-to-text	bfloat16, float32	Yes
`DiffusionGemmaForBlockDiffusion`	nvidia/diffusiongemma-26B-A4B-it-NVFP4, google/diffusiongemma-26B-A4B-it	text-to-text	bfloat16, float4_e2m1fnx2	No
`ExaoneForCausalLM`	LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct, LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct, LGAI-EXAONE/EXAONE-3.5-32B-Instruct	text-to-text	bfloat16, float32, q6_k	No
`Flux2KleinPipeline`	black-forest-labs/FLUX.2-klein-4B, black-forest-labs/FLUX.2-klein-9B, black-forest-labs/FLUX.2-klein-base-4B, black-forest-labs/FLUX.2-klein-base-9B, black-forest-labs/FLUX.2-klein-4b-nvfp4, black-forest-labs/FLUX.2-klein-9b-nvfp4	image-to-image, text-to-image	bfloat16, float4_e2m1fnx2	No
`Flux2Pipeline`	black-forest-labs/FLUX.2-dev, black-forest-labs/FLUX.2-dev-NVFP4	image-to-image, text-to-image	bfloat16, float4_e2m1fnx2	No
`Gemma3ForCausalLM`	google/gemma-3-1b-it, google/gemma-3-1b-pt	text-to-text	bfloat16	Yes
`Gemma3ForConditionalGeneration`	google/gemma-3-4b-it, google/gemma-3-4b-pt, google/gemma-3-12b-it, google/gemma-3-12b-pt, google/gemma-3-27b-it, google/gemma-3-27b-pt	image-to-text, text-to-text	bfloat16, float8_e4m3fn	Yes
`Gemma4AssistantForCausalLM`	google/gemma-4-31B-it-assistant	text-to-text	bfloat16	Yes
`Gemma4ForConditionalGeneration`	google/gemma-4-31B-it, nvidia/Gemma-4-31B-IT-NVFP4	image-to-text, text-to-text, video-to-text	bfloat16, float16, float4_e2m1fnx2	Yes
`GlmMoeDsaForCausalLM`	zai-org/GLM-5.1, zai-org/GLM-5.1-FP8, zai-org/GLM-5	text-to-text	bfloat16, float4_e2m1fnx2, float8_e4m3fn	Yes
`GptOssForCausalLM`	openai/gpt-oss-20b, openai/gpt-oss-120b, unsloth/gpt-oss-20b-BF16	text-to-text	bfloat16, float4_e2m1fnx2	Yes
`GraniteForCausalLM`	ibm-granite/granite-3.1-8b-instruct, ibm-granite/granite-3.1-8b-base	text-to-text	bfloat16, float32	No
`HYV3ForCausalLM`	tencent/Hy3-preview	text-to-text	bfloat16	Yes
`Idefics3ForConditionalGeneration`	HuggingFaceM4/Idefics3-8B-Llama3	image-to-text, text-to-text	bfloat16	No
`Ideogram4Pipeline`	ideogram-ai/ideogram-4-fp8	text-to-image	bfloat16	No
`InternVLChatModel`	OpenGVLab/InternVL3-8B-Instruct	image-to-text, text-to-text	bfloat16	Yes
`KimiK25ForConditionalGeneration`	nvidia/Kimi-K2.5-NVFP4, nvidia/Kimi-K2.6-NVFP4	image-to-text, text-to-text	bfloat16, float4_e2m1fnx2, float8_e4m3fn	Yes
`KimiVLForConditionalGeneration`	moonshotai/Kimi-VL-A3B-Instruct	image-to-text, text-to-text	bfloat16, float4_e2m1fnx2, float8_e4m3fn	Yes
`Lfm2ForCausalLM`	LiquidAI/LFM2.5-350M, LiquidAI/LFM2.5-350M-Base	text-to-text	bfloat16, float32	No
`LlamaForCausalLM`	meta-llama/Llama-3.1-8B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Llama-8B, meta-llama/Llama-Guard-3-8B, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct, deepseek-ai/deepseek-coder-6.7b-instruct, modularai/Llama-3.1-8B-Instruct-GGUF	text-to-text	bfloat16, float32, float4_e2m1fnx2, float8_e4m3fn, gptq, q6_k	Yes
`LlavaForConditionalGeneration`	mistral-experimental/pixtral-12b	image-to-text, text-to-text	bfloat16	No
`MambaForCausalLM`	state-spaces/mamba-130m-hf	text-to-text	bfloat16, float32	No
`MiniMaxM2ForCausalLM`	MiniMaxAI/MiniMax-M2.7, MiniMaxAI/MiniMax-M2.5, lukealonso/MiniMax-M2.7-NVFP4, amd/MiniMax-M2.7-MXFP4	text-to-text	float4_e2m1fnx2, float8_e4m3fn	Yes
`Mistral3ForConditionalGeneration`	mistralai/Mistral-Small-3.1-24B-Instruct-2503	text-to-text	bfloat16	Yes
`MistralForCausalLM`	mistralai/Mistral-Nemo-Instruct-2407	text-to-text	bfloat16	Yes
`MPNetForMaskedLM`	sentence-transformers/all-mpnet-base-v2	text-to-embeddings	bfloat16, float32	No
`Olmo2ForCausalLM`	allenai/OLMo-2-0425-1B-Instruct, allenai/OLMo-2-1124-7B, allenai/OLMo-2-1124-13B-Instruct, allenai/OLMo-2-0325-32B-Instruct, allenai/OLMo-2-1124-7B-GGUF	text-to-text	bfloat16, float32	No
`Olmo3ForCausalLM`	allenai/Olmo-3-7B-Instruct	text-to-text	bfloat16	No
`OlmoForCausalLM`	allenai/OLMo-1B-hf, allenai/OLMo-1B-0724-hf	text-to-text	bfloat16, float32	No
`Phi3ForCausalLM`	microsoft/phi-4, microsoft/Phi-3.5-mini-instruct	text-to-text	bfloat16, float32	No
`Qwen2_5_VLForConditionalGeneration`	Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-7B-Instruct	image-to-text, text-to-text	bfloat16, float32, float8_e4m3fn	Yes
`Qwen2ForCausalLM`	Qwen/Qwen2.5-7B-Instruct, Qwen/QwQ-32B	text-to-text	bfloat16, float32	Yes
`Qwen3_5ForConditionalGeneration`	Qwen/Qwen3.5-27B	text-to-text	bfloat16, float32	No
`Qwen3ForCausalLM`	Qwen/Qwen3-8B, Qwen/Qwen3-30B-A3B, Qwen/Qwen3-Embedding-0.6B, Qwen/Qwen3-Embedding-4B, Qwen/Qwen3-Embedding-8B	text-to-embeddings, text-to-text	bfloat16, float32, float8_e4m3fn	Yes
`Qwen3MoeForCausalLM`	Qwen/Qwen3-30B-A3B-Instruct, Qwen/Qwen3-30B-A3B-Instruct-2507-FP8	text-to-text	bfloat16, float32, float8_e4m3fn	Yes
`Qwen3VLForConditionalGeneration`	Qwen/Qwen3-VL-4B-Instruct, Qwen/Qwen3-VL-2B-Instruct	image-to-text, text-to-text	bfloat16, float32, float8_e4m3fn	Yes
`Qwen3VLMoeForConditionalGeneration`	Qwen/Qwen3-VL-30B-A3B-Instruct	image-to-text, text-to-text	bfloat16, float32, float8_e4m3fn	Yes
`QwenImageEditPipeline`	Qwen/Qwen-Image-Edit-2511	image-to-image, text-to-image	bfloat16	No
`QwenImageEditPlusPipeline`	Qwen/Qwen-Image-Edit-2511	image-to-image, text-to-image	bfloat16	No
`QwenImagePipeline`	Qwen/Qwen-Image-2512	text-to-image	bfloat16	No
`Step3p5ForCausalLM`	stepfun-ai/Step-3.5-Flash	text-to-text	bfloat16	Yes
`UnifiedDflashKimiK25ForCausalLM`	nvidia/Kimi-K2.5-NVFP4	image-to-text, text-to-text	bfloat16, float4_e2m1fnx2, float8_e4m3fn	Yes
`UnifiedDflashLlama3ForCausalLM`	meta-llama/Llama-3.2-3B-Instruct	text-to-text	bfloat16, float32	No
`UnifiedMTPDeepseekV3ForCausalLM`	deepseek-ai/DeepSeek-V3	text-to-text	bfloat16, float4_e2m1fnx2, float8_e4m3fn	Yes
`UnifiedMTPGemma4ForCausalLM`	nvidia/Gemma-4-31B-IT-NVFP4, google/gemma-4-31B-it	text-to-text	bfloat16, float4_e2m1fnx2	Yes
`WanImageToVideoPipeline`	Wan-AI/Wan2.2-I2V-A14B-Diffusers, Wan-AI/Wan2.1-I2V-14B-720P-Diffusers	image-to-video	bfloat16, float32, float8_e4m3fn	No
`WanPipeline`	Wan-AI/Wan2.2-T2V-A14B-Diffusers, Wan-AI/Wan2.1-T2V-14B-Diffusers, Wan-AI/Wan2.2-TI2V-5B-Diffusers, yetter-ai/Wan2.2-TI2V-5B-Turbo-Diffusers	text-to-video	bfloat16, float32, float8_e4m3fn	No
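The table can also be queried programmatically when narrowing down candidates by modality, encoding, or multi-GPU support. The sketch below is a minimal example that assumes the table is available as tab-separated text. It embeds only a few rows, with the example-model lists abbreviated, and `select` is a hypothetical helper rather than a MAX API.

```python
import csv
import io

# A few rows copied from the table above (example-model lists abbreviated),
# tab-separated in the same column order.
TABLE = (
    "Architecture\tExample models (repo IDs)\tModality\tEncodings\tMulti-GPU\n"
    "BertModel\tsentence-transformers/all-MiniLM-L6-v2\ttext-to-embeddings\tbfloat16, float32\tNo\n"
    "Gemma3ForCausalLM\tgoogle/gemma-3-1b-it\ttext-to-text\tbfloat16\tYes\n"
    "Qwen3ForCausalLM\tQwen/Qwen3-8B\ttext-to-embeddings, text-to-text\tbfloat16, float32, float8_e4m3fn\tYes\n"
    "WanPipeline\tWan-AI/Wan2.2-T2V-A14B-Diffusers\ttext-to-video\tbfloat16, float32, float8_e4m3fn\tNo\n"
)


def select(table_text, modality=None, encoding=None, multi_gpu=None):
    """Return architecture names whose row matches every filter that is not None."""
    rows = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    matches = []
    for row in rows:
        # Modality and Encodings cells hold comma-separated lists.
        modalities = [m.strip() for m in row["Modality"].split(",")]
        encodings = [e.strip() for e in row["Encodings"].split(",")]
        if modality is not None and modality not in modalities:
            continue
        if encoding is not None and encoding not in encodings:
            continue
        if multi_gpu is not None and (row["Multi-GPU"] == "Yes") != multi_gpu:
            continue
        matches.append(row["Architecture"])
    return matches


print(select(TABLE, modality="text-to-text", multi_gpu=True))
# → ['Gemma3ForCausalLM', 'Qwen3ForCausalLM']
```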