Modular Documentation

The Modular Platform accelerates AI inference and abstracts hardware complexity. Using our Docker container, you can deploy a GenAI model from Hugging Face with an OpenAI-compatible endpoint on a wide range of hardware.
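As a rough sketch of that workflow, the container could be launched and then queried with any OpenAI-compatible client. The image name, flags, and model below are illustrative placeholders, not the exact Modular container tag; check the Modular docs for the correct image and supported models for your hardware:

```shell
# Hypothetical image and model names -- consult the Modular docs for the
# exact container tag and model support for your GPU.
docker run --gpus=all -p 8000:8000 \
  -e "HF_TOKEN=<your-hugging-face-token>" \
  modular/max-serve \
  --model-path meta-llama/Llama-3.1-8B-Instruct

# Once the server is up, hit the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```

Because the endpoint follows the OpenAI API shape, existing OpenAI client libraries can point at it by changing only the base URL.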

And if you need to customize the model or tune a GPU kernel, Modular provides a depth of model extensibility and GPU programmability that you won't find anywhere else.

Cloud

Shared or dedicated endpoints hosted in Modular's cloud or your VPC.

Serving

High-performance, hardware-agnostic serving framework.

Modeling

1000+ models like DeepSeek and Kimi out of the box.

GPU Kernels

Extend or write custom GPU kernels that run on NVIDIA, AMD, and Apple GPUs.
