
Deploy a text embedding model with an endpoint
Text embeddings are rich numerical representations of text that power many modern natural language processing (NLP) applications. This tutorial shows you how to run and interact with an embeddings endpoint using MAX Serve. Specifically, we'll use the `all-mpnet-base-v2` model, a transformer that excels at capturing semantic relationships in text.
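Before diving in, here's a minimal sketch of the core idea: each piece of text becomes a vector, and semantically similar texts have vectors that point in similar directions, which we can measure with cosine similarity. The four-dimensional vectors below are made up for illustration; real `all-mpnet-base-v2` embeddings have 768 dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (made up): similar meanings get similar vectors.
password_reset = np.array([0.9, 0.1, 0.3, 0.0])
change_password = np.array([0.8, 0.2, 0.4, 0.1])
billing_cycle = np.array([0.1, 0.9, 0.0, 0.5])

print(cosine_similarity(password_reset, change_password))  # high (~0.98)
print(cosine_similarity(password_reset, billing_cycle))    # lower (~0.18)
```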
In this tutorial, you'll learn how to:

- Set up a local embeddings server using the `all-mpnet-base-v2` model
- Build a smart knowledge base system using semantic similarity
- Implement document clustering and topic-based organization
- Create robust search functionality using embeddings
Local setup
In this section, you will set up and run the `all-mpnet-base-v2` model locally using MAX Serve.
Start the embeddings server
Use the `magic` CLI tool to start the embeddings server locally:

1. If you don't have the `magic` CLI yet, you can install it on macOS and Ubuntu Linux with this command:

   ```bash
   curl -ssL https://magic.modular.com/ | bash
   ```

   Then run the `source` command that's printed in your terminal.

2. Use `magic` to install our `max-pipelines` CLI tool:

   ```bash
   magic global install max-pipelines
   ```

3. Start a local endpoint for `all-mpnet-base-v2`:

   ```bash
   max-pipelines serve --model-path=sentence-transformers/all-mpnet-base-v2
   ```

   This creates a server running the `all-mpnet-base-v2` embeddings model on `http://localhost:8000/v1/embeddings`, an OpenAI-compatible endpoint (see the sketch after this list).

   The endpoint is ready when you see this message printed in your terminal:

   ```
   Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
   ```
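Because the endpoint is OpenAI-compatible, you can also query it with the official `openai` Python client instead of raw HTTP. This is a minimal sketch assuming you have the `openai` package installed; the API key can be any placeholder string if the local server doesn't require authentication:

```python
from openai import OpenAI

# Point the client at the local MAX Serve endpoint; the key is an unused placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="sentence-transformers/all-mpnet-base-v2",
    input="Run an embedding model with MAX Serve!",
)
print(len(response.data[0].embedding))  # 768 dimensions for all-mpnet-base-v2
```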
Send a curl request to the endpoint

Let's send a curl request to see what kind of response we get back. With the server running in your first terminal, run the following command in a second terminal:

```bash
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Run an embedding model with MAX Serve!",
    "model": "sentence-transformers/all-mpnet-base-v2"
  }'
```

The following is the expected output, shortened for brevity:

```
{"data":[{"index":0,"embedding":[-0.06595132499933243,0.005941616836935282,0.021467769518494606,0.23037832975387573,
```

The response is a numerical representation of the input text that can be used for semantic comparisons.
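To see those numbers doing something useful, here's a short sketch that requests embeddings for two sentences from the running endpoint and compares them with scikit-learn's `cosine_similarity`. This assumes the server from the previous step is still running and that `scikit-learn` and `requests` are available:

```python
import requests
from sklearn.metrics.pairwise import cosine_similarity

# Request embeddings for two sentences from the local endpoint.
response = requests.post(
    "http://localhost:8000/v1/embeddings",
    headers={"Content-Type": "application/json"},
    json={
        "input": ["How do I reset my password?", "Steps to change your password"],
        "model": "sentence-transformers/all-mpnet-base-v2",
    },
    timeout=10,
)
response.raise_for_status()
vectors = [item["embedding"] for item in response.json()["data"]]

# Semantically similar sentences score close to 1.0; unrelated ones score near 0.
print(cosine_similarity([vectors[0]], [vectors[1]])[0][0])
```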
Now that the endpoint is active and responsive, let's create an application that uses the embedding model and retrieves information.
Build a knowledge base system

Now, let's build a smart knowledge base using the `all-mpnet-base-v2` model. You'll create a system that can match user queries to relevant documentation and automatically organize content into topics.
1. Install dependencies

Let's create a new Python project using `magic` to manage our packages.

1. In a second terminal, run the following command:

   ```bash
   magic init embeddings --format pyproject && cd embeddings
   ```

2. Add three new libraries to the project:

   ```bash
   magic add numpy scikit-learn requests
   ```

   NumPy and scikit-learn provide the numerical tools for measuring sentence similarity and clustering documents, while the `requests` library handles API communication with the embeddings endpoint.
2. Implement the knowledge base system

Now we will create a smart knowledge base system that can:

- Process and store documents with their semantic embeddings
- Search for relevant documents using natural language queries
- Automatically organize content into topics using clustering
- Suggest relevant topics based on user queries

The system uses embeddings from the `all-mpnet-base-v2` model to understand the meaning of text, enabling semantic search and intelligent document organization.
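The clustering piece can feel abstract, so here is a tiny, self-contained sketch of the idea in isolation: scikit-learn's `KMeans` groups nearby embedding vectors into topics. The 2-D vectors are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D "embeddings" with two obvious groups (real embeddings are 768-D).
embeddings = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10).fit(embeddings)
print(kmeans.labels_)  # e.g., [1 1 0 0]: each document assigned to a topic cluster
```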
1. Create a new Python file called `kb_system.py` in the `src/embeddings` directory and add the following:

   ```python
   import logging
   from functools import lru_cache
   from typing import Dict, List, Optional, Tuple

   import numpy as np
   import requests
   from sklearn.cluster import KMeans
   from sklearn.metrics.pairwise import cosine_similarity

   logging.basicConfig(level=logging.INFO)
   logger = logging.getLogger(__name__)


   class SmartKnowledgeBase:
       def __init__(self, endpoint: str = "http://localhost:8000/v1/embeddings"):
           self.endpoint = endpoint
           self.documents: List[str] = []
           self.doc_titles: List[str] = []
           self.embeddings: Optional[np.ndarray] = None
           self.clusters: Dict[int, List[int]] = {}

       def _get_embedding(self, texts: List[str], max_retries: int = 3) -> np.ndarray:
           """Get embeddings with retry logic."""
           for attempt in range(max_retries):
               try:
                   response = requests.post(
                       self.endpoint,
                       headers={"Content-Type": "application/json"},
                       json={
                           "input": texts,
                           "model": "sentence-transformers/all-mpnet-base-v2",
                       },
                       timeout=5,
                   ).json()
                   return np.array([item["embedding"] for item in response["data"]])
               except Exception as e:
                   if attempt == max_retries - 1:
                       raise Exception(
                           f"Failed to get embeddings after {max_retries} attempts: {e}"
                       )
                   logger.warning(f"Attempt {attempt + 1} failed, retrying...")

       @lru_cache(maxsize=1000)
       def _get_embedding_cached(self, text: str) -> np.ndarray:
           """Cached version for single text embedding."""
           return self._get_embedding([text])[0]

       def add_document(self, title: str, content: str):
           """Add a single document with title."""
           self.doc_titles.append(title)
           self.documents.append(content)

           # Update embeddings
           if len(self.documents) == 1:
               self.embeddings = self._get_embedding([content])
           else:
               self.embeddings = np.vstack(
                   [self.embeddings, self._get_embedding([content])]
               )

           # Recluster if we have enough documents
           if len(self.documents) >= 3:
               self._cluster_documents()

       def _cluster_documents(self, n_clusters: Optional[int] = None):
           """Cluster documents into topics."""
           if n_clusters is None:
               n_clusters = max(2, len(self.documents) // 5)
           n_clusters = min(n_clusters, len(self.documents))

           kmeans = KMeans(n_clusters=n_clusters, random_state=42).fit(self.embeddings)
           self.clusters = {}
           for i in range(n_clusters):
               self.clusters[i] = np.where(kmeans.labels_ == i)[0].tolist()

       def search(self, query: str, top_k: int = 3) -> List[Tuple[str, str, float]]:
           """Find documents most similar to the query."""
           query_embedding = self._get_embedding_cached(query)
           similarities = cosine_similarity([query_embedding], self.embeddings)[0]
           top_indices = np.argsort(similarities)[-top_k:][::-1]
           return [
               (self.doc_titles[i], self.documents[i], similarities[i])
               for i in top_indices
           ]

       def get_topic_documents(self, topic_id: int) -> List[Tuple[str, str]]:
           """Get all documents in a topic cluster."""
           return [
               (self.doc_titles[i], self.documents[i])
               for i in self.clusters.get(topic_id, [])
           ]

       def suggest_topics(self, query: str, top_k: int = 2) -> List[Tuple[int, float]]:
           """Rank topic clusters by their best match to the query."""
           query_embedding = self._get_embedding_cached(query)
           topic_similarities = []
           for topic_id, doc_indices in self.clusters.items():
               topic_embeddings = self.embeddings[doc_indices]
               similarity = cosine_similarity([query_embedding], topic_embeddings).max()
               topic_similarities.append((topic_id, similarity))
           return sorted(topic_similarities, key=lambda x: x[1], reverse=True)[:top_k]


   # Example usage
   if __name__ == "__main__":
       # Initialize knowledge base
       kb = SmartKnowledgeBase()

       # Add technical documentation
       kb.add_document(
           "Password Reset Guide",
           "To reset your password: 1. Click 'Forgot Password' 2. Enter your email "
           "3. Follow the reset link 4. Create a new password meeting security requirements",
       )
       kb.add_document(
           "Account Security",
           "Secure your account by enabling 2FA, using a strong password, and regularly "
           "monitoring account activity. Enable login notifications for suspicious activity.",
       )
       kb.add_document(
           "Billing Overview",
           "Your billing cycle starts on the 1st of each month. View charges, update "
           "payment methods, and download invoices from the Billing Dashboard.",
       )
       kb.add_document(
           "Payment Methods",
           "We accept credit cards, PayPal, and bank transfers. Update payment methods "
           "in Billing Settings. New payment methods are verified with a $1 hold.",
       )
       kb.add_document(
           "Installation Guide",
           "Install by downloading the appropriate package for your OS. Run with admin "
           "privileges. Follow prompts to select installation directory and components.",
       )
       kb.add_document(
           "System Requirements",
           "Minimum: 8GB RAM, 2GB storage, Windows 10/macOS 11+. Recommended: 16GB RAM, "
           "4GB storage, SSD, modern multi-core processor for optimal performance.",
       )

       # Example 1: Search for password-related help
       print("\nSearching for password help:")
       results = kb.search("How do I change my password?")
       for title, content, score in results:
           print(f"\nTitle: {title}")
           print(f"Relevance: {score:.2f}")
           print(f"Content: {content[:100]}...")

       # Example 2: Get topic suggestions
       print("\nGetting topics for billing query:")
       query = "Where can I update my credit card?"
       topics = kb.suggest_topics(query)
       for topic_id, relevance in topics:
           print(f"\nTopic {topic_id} (Relevance: {relevance:.2f}):")
           for title, content in kb.get_topic_documents(topic_id):
               print(f"- {title}: {content[:50]}...")

       # Example 3: Get all documents in a topic
       print("\nAll documents in Topic 0:")
       for title, content in kb.get_topic_documents(0):
           print(f"\nTitle: {title}")
           print(f"Content: {content[:100]}...")
   ```

   The `SmartKnowledgeBase` class implements an intelligent document retrieval and organization system using embeddings. You can add documents (`kb.add_document()`), search based on the user's question (`kb.search()`), and retrieve ranked, relevant results.
2. Run the script. With the server running in your first terminal, run the following command in the second terminal:

   ```bash
   magic run python -m embeddings.kb_system
   ```

   On your first run, this might take longer. The following is the expected output, shortened for brevity:

   ```
   Title: Password Reset Guide
   Relevance: 0.61
   Content: To reset your password: 1. Click 'Forgot Password' 2. Enter your email 3. Follow the reset link 4. C...
   ```
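From here, you can point the knowledge base at your own content. The following is a minimal, hypothetical sketch (the document titles, contents, and query are made up) of how you might reuse `SmartKnowledgeBase` from another script in the same project:

```python
from embeddings.kb_system import SmartKnowledgeBase

# Build a small knowledge base from your own documents (examples are made up).
kb = SmartKnowledgeBase()
kb.add_document("Shipping Policy", "Orders ship within 2 business days via standard carriers.")
kb.add_document("Returns", "Items can be returned within 30 days of delivery for a refund.")
kb.add_document("Warranty", "All hardware includes a 1-year limited warranty against defects.")

# Semantic search: the query shares no keywords with "Returns" but should still match it.
for title, content, score in kb.search("Can I send my order back?", top_k=1):
    print(f"{title} ({score:.2f}): {content[:60]}...")
```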
Conclusion

In this tutorial, you learned how to:

- Set up and test a local embeddings server using the `all-mpnet-base-v2` model
- Build a smart knowledge base system that can process and retrieve documents based on semantic similarity
- Implement document clustering and topic-based organization
- Create robust search functionality using embeddings