Modular Documentation
The Modular Platform accelerates AI inference and abstracts hardware complexity. Using our Docker container, you can deploy a GenAI model from Hugging Face with an OpenAI-compatible endpoint on a wide range of hardware.
And if you need to customize the model or tune a GPU kernel, Modular provides a depth of model extensibility and GPU programmability that you won't find anywhere else.
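Because the endpoint is OpenAI-compatible, any client that speaks the standard chat-completions protocol can talk to it. As a minimal sketch of what that means on the wire, this is the JSON request body a client would POST to `/v1/chat/completions` (built here with only the standard library; the model name and prompt mirror the example below and are illustrative):

```python
import json

# Standard OpenAI chat-completions request body; an OpenAI-compatible
# server accepts this same shape at POST /v1/chat/completions.
payload = {
    "model": "google/gemma-3-27b-it",
    "messages": [
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
}

# Serialize to the JSON string that would be sent as the request body.
body = json.dumps(payload)
print(body)
```

In practice you rarely build this payload by hand; the official `openai` Python client (shown below) constructs it for you from the same arguments.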
Get started

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="google/gemma-3-27b-it",
    messages=[
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
)

print(completion.choices[0].message.content)
```

Learning tools
500+ models
Modular offers fully managed deployments for the latest open source models in our Model Library, or you can create a self-hosted endpoint with any model that's compatible with our supported model architectures.
View all models