
# Embeddings

Text embeddings are rich numerical representations of text. They capture
semantic meaning in a way that allows computers to compare, cluster, and search
text effectively.

Use embeddings whenever you need to measure similarity between pieces of text,
perform semantic search, build recommendation systems, or cluster documents.
They are foundational for many modern NLP tasks.
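One common way to measure similarity between two embeddings is cosine similarity. The following is a minimal sketch of that idea, not something MAX provides: the `cosine_similarity` helper and the three-dimensional toy vectors are made up for illustration, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related texts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

Scores close to 1 indicate vectors pointing in nearly the same direction, and scores near 0 indicate unrelated vectors.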

In contemporary GenAI applications, embeddings are especially powerful in
agentic workflows, including:

- **Retrieval-Augmented Generation (RAG):** Embeddings make it possible to store
  and search large collections of documents, grounding model responses in your
  own data instead of relying only on a model's training knowledge.
- **Context injection for agents:** Embeddings help agents decide which pieces
  of external knowledge (APIs, tools, or documents) are most relevant to the
  current query.
- **Personalization and recommendations:** By embedding both user data and
  content, systems can deliver more tailored results.
- **Clustering and analytics:** Embeddings allow grouping similar inputs for
  downstream tasks like summarization, deduplication, and insight extraction.
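The retrieval step that several of these workflows share can be sketched as a nearest-neighbor search over stored vectors. This is an illustrative sketch only: the `top_k` helper, the document titles, and the toy vectors are all hypothetical, and a production system would typically use a vector database rather than a Python dictionary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, corpus, k=2):
    """Return the k (text, score) pairs whose vectors are most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Pretend these vectors came from an embedding model (made-up values).
corpus = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Updating billing details": [0.1, 0.9, 0.1],
    "Recovering a locked account": [0.8, 0.0, 0.3],
}
query = [0.85, 0.05, 0.1]  # stand-in embedding for a login-related question

for text, score in top_k(query, corpus):
    print(f"{score:.3f}  {text}")
```

In a RAG pipeline, the retrieved texts would then be added to the prompt so the model can ground its answer in them.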

## Endpoint

MAX supports the
[`v1/embeddings`](https://docs.modular.com/max/rest-api.md#POST/v1/embeddings) endpoint, which is
fully compatible with the OpenAI API.

To use the endpoint, provide the ID of an embedding model along with the text
to embed. The API returns numerical embeddings that capture the semantic
meaning of each input. The request payload should look similar to the following:

```json
{
  "model": "sentence-transformers/all-mpnet-base-v2",
  "input": "The food was delicious and the service was excellent."
}
```
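As a rough sketch of working with this payload from code, the helpers below build a request body and read vectors back out of a response. The function names are hypothetical, the sample response is hand-written with made-up two-dimensional vectors, and passing a list of strings as `input` is assumed from the OpenAI embeddings API rather than confirmed on this page.

```python
import json


def build_embeddings_payload(model, texts):
    """Return the JSON body for a v1/embeddings request."""
    if isinstance(texts, str):
        texts = [texts]
    return json.dumps({"model": model, "input": list(texts)})


def extract_embeddings(response_body):
    """Return the embedding vectors from a response body, ordered by index."""
    data = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda item: item["index"])]


payload = build_embeddings_payload(
    "sentence-transformers/all-mpnet-base-v2",
    ["The food was delicious.", "The service was excellent."],
)

# Hand-written stand-in for a server response (not real model output).
sample_response = json.dumps({
    "data": [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ]
})
vectors = extract_embeddings(sample_response)
print(len(vectors), vectors[0])
```

Sorting by `index` keeps each vector aligned with the input text it was computed from.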

## Quickstart

Serve and interact with an embedding model using an OpenAI-compatible endpoint.
Specifically, we'll use MAX to serve the
[all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
model, which is a powerful transformer that excels at capturing semantic
relationships in text.

Before you begin, check the
[system requirements](https://docs.modular.com/max/packages.md#system-requirements).

### Set up your environment

Create a Python project and install the MAX APIs and CLI tools. For
installation steps, see the
[packages guide](https://docs.modular.com/max/packages.md).

### Serve your model

Use the [`max serve`](https://docs.modular.com/max/cli/serve.md) command to start a local model server
with the `all-mpnet-base-v2` model:

```sh
max serve \
  --model sentence-transformers/all-mpnet-base-v2
```

This starts a server that serves the `all-mpnet-base-v2` embedding model at
`http://localhost:8000/v1/embeddings`, an [OpenAI-compatible
endpoint](https://platform.openai.com/docs/api-reference/embeddings).

The endpoint is ready when you see this message printed in your terminal:

```output
Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
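If you start the server from a script instead of watching the terminal, one option is to poll the port before sending requests. The `wait_for_port` helper below is a hypothetical sketch, not part of MAX. It only confirms that the port accepts TCP connections, which is assumed here to roughly coincide with the ready message.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=60.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example usage (blocks for up to 60 seconds while the server starts):
#   if not wait_for_port():
#       raise SystemExit("server did not start in time")
```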

For a complete list of `max` CLI commands and options, refer to the
[MAX CLI reference](https://docs.modular.com/max/cli.md).

### Interact with your model

MAX supports OpenAI's REST APIs, so you can interact with
the model using either the OpenAI Python SDK or curl:

**Python:**

You can use OpenAI's Python client to interact with the model.
First, install the OpenAI Python package, for example with `pip install openai`.

Then, create a client and make a request to the model:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Request an embedding for a single input string
response = client.embeddings.create(
    model="sentence-transformers/all-mpnet-base-v2",
    input="Run an embedding model with MAX Serve!",
)

# Print the first five values of the embedding vector
print(response.data[0].embedding[:5])
```

You should see output similar to this. The script prints only the first five
values of the embedding, and the last value is elided here:

```output
[-0.06595132499933243, 0.005941616836935282, 0.021467769518494606, 0.23037832975387573, ...]
```

The full embedding is a numerical representation of the input text that can be
used for semantic comparisons.
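To compare two texts, you can embed them in one request and score the resulting vectors. The sketch below assumes the server accepts a list of strings as `input`, as the OpenAI embeddings API does. The `embed_texts` and `cosine` helpers are hypothetical names, and the client is passed in as a parameter so the helpers do not depend on a running server.

```python
import math


def embed_texts(client, texts, model="sentence-transformers/all-mpnet-base-v2"):
    """Embed a list of strings with an OpenAI-style client; return one vector per text."""
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so the vectors line up with the input order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Usage against the local server started earlier:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   a, b = embed_texts(client, ["A cat sits on the mat.", "A kitten rests on a rug."])
#   print(cosine(a, b))
```

Sending the texts together in one call avoids a separate HTTP round trip for each input.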

---

**curl:**

The following `curl` command sends an embeddings request to the model:

```sh
curl http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
    "input": "Run an embedding model with MAX Serve!",
    "model": "sentence-transformers/all-mpnet-base-v2"
}'
```

You should receive a response similar to this:

```json
{"data":[{"index":0,"embedding":[-0.06595132499933243,0.005941616836935282,0.021467769518494606,0.23037832975387573,
```

The response is truncated here for brevity. The `embedding` array is a
numerical representation of the input text that can be used for semantic
comparisons.

For complete details on all available API endpoints and options, see the
[REST API documentation](https://docs.modular.com/max/rest-api.md).

## Next steps

Now that you have successfully set up MAX with an OpenAI-compatible embeddings
endpoint, check out the other tutorials in the MAX documentation.

