
Install MAX with pip

You can install everything you need to build and deploy MAX models using pip. However, if you want to develop with Mojo, we recommend using Magic or conda.

Get started using pip

Here's how to install the Modular platform APIs and tools with pip, and then deploy a GenAI model on a local endpoint:

  1. Set up a Python virtual environment and install MAX:

    1. Create a project folder:

      mkdir modular && cd modular
    2. Create and activate a virtual environment:

      python3 -m venv .venv/modular \
      && source .venv/modular/bin/activate
    3. Install the modular Python package:

      pip install modular \
      --index-url https://download.pytorch.org/whl/cpu \
      --extra-index-url https://dl.modular.com/public/nightly/python/simple/
  2. Start a local endpoint for Llama 3.1:

    max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF

    In addition to starting a local server, this downloads the model weights and compiles the model, which might take some time.

    The endpoint is ready when you see the URI printed in your terminal:

    Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
  3. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
    "stream": true,
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
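The `grep`, `sed`, and `tr` chain at the end of the curl command in step 3 is a quick text filter, not a JSON parser. The following sketch replays a hypothetical streamed response through that same chain, so you can see what each stage does without a running server. The chunk layout is an assumption: OpenAI-style `data:` lines that carry `"delta":{"content":...}`. The `sample_stream` and `extract_content` function names are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical streamed response. The OpenAI-style "data:" chunk layout is
# an assumption for illustration, not captured MAX output.
sample_stream() {
  cat <<'EOF'
data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Line one."}}]}
data: {"choices":[{"index":0,"delta":{"content":"\n"}}]}
data: {"choices":[{"index":0,"delta":{"content":"Line two."}}]}
data: [DONE]
EOF
}

# Same post-processing as the curl example:
#   grep -o   keeps only each "content":"..." fragment, one per line
#   sed (1)   strips the "content":" prefix
#   sed (2)   strips every remaining double quote
#   tr -d     joins the fragments into a single line
#   sed (3)   turns the JSON escape \n (backslash + n) into a real newline
extract_content() {
  grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
}

sample_stream | extract_content
echo  # the joined text has no trailing newline, so add one
```

The pattern `[^"]*` stops at the first double quote, so a token whose text contains an escaped quote (`\"`) is cut short. The last `sed` also relies on GNU sed turning `\n` in the replacement into a newline. BSD sed, the macOS default, may print a literal `n` instead. For anything beyond a quick look at the output, parse the stream as JSON.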
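If you script this walkthrough instead of watching the terminal for the ready message, you can poll the endpoint before you send the request. The following is a minimal sketch that makes two assumptions. First, the hypothetical `wait_for_server` helper treats any HTTP response, even a 404, as "listening". Second, `max serve` only accepts connections once it prints the ready message. If the server binds its port earlier, this check can succeed before the model finishes compiling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until an HTTP server answers.
# Assumes the server only accepts connections once it is ready to serve.
wait_for_server() {
  local url="$1"              # e.g. http://0.0.0.0:8000/
  local attempts="${2:-60}"   # how many times to try
  local delay="${3:-5}"       # seconds to sleep between tries
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Without --fail, curl exits 0 for any HTTP status (even 404),
    # and non-zero when the connection is refused or times out.
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      echo "server is up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $url after $attempts attempts" >&2
  return 1
}

# Example: poll every 5 seconds, up to 60 times, then send the request.
# wait_for_server http://0.0.0.0:8000/ 60 5 && echo "ready to send requests"
```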

Now check out these tutorials for more about how to accelerate your GenAI models with MAX:

What's included

The modular Python package installs the following:

Known issues

  • The Mojo LSP and Mojo debugger aren't included. If you want to develop with Mojo, we currently recommend you install the max conda package with Magic or conda.