
Get started with MAX

With just a few commands, you can install MAX as a conda package and deploy a GenAI model on a local endpoint.

Start a GenAI endpoint

  1. Install our magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Use magic to install our max-pipelines CLI tool:

    magic global install max-pipelines

    This also installs MAX, Mojo, and other package dependencies.

  3. Start a local endpoint for Llama 3.1:

    max-pipelines serve --huggingface-repo-id=modularai/Llama-3.1-8B-Instruct-GGUF

    In addition to starting a local server, this downloads the model weights and compiles the model, which might take some time.

    The endpoint is ready when you see the URI printed in your terminal:

    Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
        "stream": true,
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
      }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
    The pipeline after curl extracts each streamed "content" token from the server-sent events and prints only the generated text. For a simpler way to inspect the full response, see the non-streaming example after these steps.
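If you want to inspect the complete response instead of streaming tokens, a minimal variation is sketched below. It assumes the endpoint follows the standard OpenAI chat-completions schema (the request simply omits "stream") and that python3 is available to pretty-print the JSON; neither detail is spelled out in the steps above.

    # Minimal sketch: same endpoint and model as above, but without "stream",
    # so the server returns a single JSON object. python3 -m json.tool is only
    # used here to pretty-print that response.
    curl -s http://0.0.0.0:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
      }' | python3 -m json.tool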

That's it. You just deployed Llama 3.1 on your local CPU. You can also deploy MAX to a cloud GPU.

Notice there was no separate step above to install MAX. That's because magic automatically installs MAX and the other package dependencies when you install the max-pipelines tool. Alternatively, you can deploy everything you need using our pre-configured MAX container.
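If you go the container route instead, the invocation is roughly the sketch below. The image name is a placeholder, not the real published name, and the serving flag simply mirrors the max-pipelines command above; check the MAX container documentation for the exact image, tag, and arguments.

    # Hypothetical sketch only: replace the placeholder image name with the one
    # from the MAX container docs. The serving flag mirrors the max-pipelines
    # command above, and the published port matches the local endpoint.
    docker run --rm -p 8000:8000 \
      REPLACE_WITH_MAX_CONTAINER_IMAGE \
      --huggingface-repo-id=modularai/Llama-3.1-8B-Instruct-GGUF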


Try a tutorial

For a more detailed walkthrough of how to build and deploy with MAX, check out these tutorials.