
Image generation

With MAX, you can run open-source image generation models locally and access them through a REST API. This page explains how to use the v1/responses endpoint to generate images from text prompts or transform existing images, with examples for each input type.

Endpoint

The MAX v1/responses endpoint provides a unified interface for diverse AI tasks including image generation, with structured input and output handling. It's built on Open Responses, an open-source initiative to create a standardized, provider-agnostic API specification that works across different AI providers and model backends.

Text input

For text-to-image generation, set input to a plain string describing the image you want. The model returns the generated image as base64-encoded data in output[0].content[0].image_data:

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input="Your text prompt here",
    extra_body={
        "provider_options": {
            "image": {"height": 1024, "width": 1024, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data

Image URL input

For image-to-image workflows, set input to a structured message array containing the source image URL and a text prompt describing the transformation. The type field distinguishes image and text content within the same message:

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": "https://example.com/input.png"
                },
                {
                    "type": "input_text",
                    "text": "Your transformation prompt"
                }
            ]
        }
    ],
    extra_body={
        "provider_options": {
            "image": {"height": 1024, "width": 1024, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data

Local file input

Local files must be base64-encoded and passed as a data URI in the image_url field using the format data:<mime-type>;base64,<data>.

import base64

with open("/path/to/image.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{image_base64}"
                },
                {
                    "type": "input_text",
                    "text": "Your transformation prompt"
                }
            ]
        }
    ],
    extra_body={
        "provider_options": {
            "image": {"height": 1024, "width": 1024, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data

Provider options

The provider_options argument is an extension point in the Open Responses spec that lets each API provider expose parameters beyond the standard request fields. MAX uses it to surface image generation controls such as dimensions and denoising steps.

The following parameters are available under provider_options.image:

Parameter         Default  Description
height / width    1024     Output dimensions in pixels (must be multiples of 16)
steps             28       Number of denoising steps
guidance_scale    3.5      How closely the output follows the prompt
negative_prompt   ""       Content to avoid in the output

Height and width: You can generate images at different aspect ratios. Image dimensions must be multiples of 16; if you provide an incompatible integer, it is automatically scaled to a multiple of 16.
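The scaling behavior can be sketched as follows. The exact server-side rule isn't specified here; rounding to the nearest multiple of 16, with a floor of 16, is an assumption for illustration:

```python
# Sketch of the dimension-scaling rule described above. Rounding to
# the nearest multiple of 16 (minimum 16) is an assumed behavior.
def snap_to_16(value: int) -> int:
    """Scale a requested dimension to a compatible multiple of 16."""
    return max(16, round(value / 16) * 16)

print(snap_to_16(512))   # 512: already a multiple of 16
print(snap_to_16(1020))  # 1024: scaled to the nearest multiple of 16
```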

Steps: steps has the greatest effect on generation time. Diffusion models work by iteratively refining a noisy image, so more steps produce higher-quality results but take proportionally longer. Experiment with steps to balance speed against quality.

Prompt adherence: guidance_scale determines how literally the model interprets your prompt. Higher values (7-10) produce results that closely match the prompt. Lower values (1-3) allow more creative variation.

Negative prompts: Use negative_prompt to steer the model away from unwanted content, for example "blurry, low quality, distorted". It's best practice to include any negative prompting in the negative_prompt argument and not in the main input_text string.
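Putting these parameters together, a request tuned for strict prompt adherence might set provider_options as follows (the values are illustrative, not recommendations):

```python
# Illustrative provider_options combining guidance_scale and
# negative_prompt with the dimension and step settings above.
provider_options = {
    "image": {
        "height": 1024,
        "width": 1024,
        "steps": 28,
        "guidance_scale": 8.0,  # high value: follow the prompt closely
        "negative_prompt": "blurry, low quality, distorted",
    }
}
```

Pass this dictionary through the SDK's extra_body, exactly as in the examples above: extra_body={"provider_options": provider_options}.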

If you encounter memory errors, try reducing your output image dimensions or the number of denoising steps:

"provider_options": {"image": {"height": 512, "width": 512, "steps": 25}}

Quickstart

In this quickstart, learn how to set up and run FLUX.2-dev for image generation.


Set up your environment

Create a Python project to install our APIs and CLI tools:

  1. If you don't have it, install pixi:
    curl -fsSL https://pixi.sh/install.sh | sh

    Then restart your terminal for the changes to take effect.

  2. Create a project:
    pixi init image-generation-quickstart \
      -c https://conda.modular.com/max-nightly/ -c conda-forge \
      && cd image-generation-quickstart
  3. Install modular (nightly build; to get the stable build, change the version in the website header):
    pixi add modular
  4. Start the virtual environment:
    pixi shell

Serve your model

First, enable the v1/responses endpoint by setting the MAX_SERVE_API_TYPES environment variable:

export MAX_SERVE_API_TYPES='["responses"]'

Agree to the FLUX license and make your Hugging Face access token available in your environment:

export HF_TOKEN="hf_..."

Then, use the max serve command to start a local model server with the FLUX.2-dev model:

max serve \
  --model black-forest-labs/FLUX.2-dev

The endpoint is ready when you see this message printed in your terminal:

Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)

For a complete list of max CLI commands and options, refer to the MAX CLI reference.

Generate an image from text

Generate an image from a text description by sending a request to the v1/responses endpoint. The input field is a text string describing the desired image, and provider_options carries the MAX-specific generation parameters. You can send requests using either the OpenAI Python SDK or curl:

You can use OpenAI's Python client to interact with the image generation model. First, install the OpenAI SDK:

pixi add openai

Then, create a client and make a request to the model:

generate-image.py
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input="A serene mountain landscape at sunset",
    extra_body={
        "provider_options": {
            "image": {"height": 512, "width": 512, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data
with open("output-text-to-image.png", "wb") as f:
    f.write(base64.b64decode(image_data))

Run the script to generate the image:

python generate-image.py

The model saves the generated image to output-text-to-image.png in your current directory.

Your output should look similar to the following:

Figure 1. Text-to-image output: a serene mountain landscape at sunset.
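The same text-to-image request can also be sent with curl. This sketch assumes the quickstart server is running on localhost:8000, and places provider_options at the top level of the JSON body (matching what the SDK's extra_body merges into the request). It uses an inline Python one-liner to extract and decode the base64 payload from the JSON response:

```shell
# Text-to-image via the raw v1/responses endpoint; requires the server
# from the quickstart to be running on localhost:8000.
curl -s http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "black-forest-labs/FLUX.2-dev",
    "input": "A serene mountain landscape at sunset",
    "provider_options": {
      "image": {"height": 512, "width": 512, "steps": 28}
    }
  }' \
| python -c 'import sys, json, base64; r = json.load(sys.stdin); open("output-text-to-image.png", "wb").write(base64.b64decode(r["output"][0]["content"][0]["image_data"]))'
```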

Use your generated image as input

You can then take the image generated in the previous step and make additional customizations with the image-to-image workflow by providing both an image and a text prompt:

Read and encode the output image from the previous step, then send it along with a text prompt to the model:

generate-image-to-image.py
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("output-text-to-image.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{image_base64}"
                },
                {
                    "type": "input_text",
                    "text": "Transform this into a watercolor painting"
                }
            ]
        }
    ],
    extra_body={
        "provider_options": {
            "image": {"height": 512, "width": 512, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data
with open("output-image-to-image.png", "wb") as f:
    f.write(base64.b64decode(image_data))

Run the script to generate the image:

python generate-image-to-image.py

The model saves the transformed image to output-image-to-image.png in your current directory.

Your output should look similar to the following:

Figure 2. Image-to-image output: the mountain landscape transformed into a watercolor painting.

Next steps

Now that you can generate images, explore other inference capabilities and deployment options.
