The Modular Platform accelerates AI inference and abstracts hardware complexity. Using our Docker container, you can deploy a GenAI model from Hugging Face with an OpenAI-compatible endpoint on a wide range of hardware.
And if you need to customize the model or tune a GPU kernel, Modular provides a depth of model extensibility and GPU programmability that you won't find anywhere else.
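Because the deployed endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. Below is a minimal sketch of the chat-completions request body such an endpoint accepts; the local URL and the model name are assumptions for illustration, not values the platform prescribes.

```python
import json

# Hypothetical local endpoint where the container is serving (assumption).
BASE_URL = "http://localhost:8000/v1/chat/completions"

# Standard OpenAI-compatible chat-completions payload. The model field is
# whatever Hugging Face model the server was launched with (assumed here).
payload = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
        {"role": "user", "content": "Summarize the Modular Platform in one sentence."}
    ],
    "max_tokens": 128,
}

# Serialized request body, ready to POST to BASE_URL with any HTTP client.
body = json.dumps(payload)
```

You could send this with `curl`, or point the official `openai` Python client's `base_url` at the local server; only the host and port differ from talking to OpenAI itself.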
Cloud
Shared or dedicated endpoints hosted in Modular's cloud or your VPC.
Serving
High-performance, hardware-agnostic serving framework.
Modeling
1000+ models like DeepSeek and Kimi out of the box.
GPU Kernels
Extend or write custom GPU kernels that run on NVIDIA, AMD, and Apple GPUs.
500+ models
Modular offers fully-managed deployments for the latest open source models in our Model Library, or you can create a self-hosted endpoint with any model that's compatible with our supported model architectures.