
Image generation

With MAX, you can run open-source image generation models locally and access them through a REST API. This page explains how to use the v1/responses endpoint to generate images from text prompts or transform existing images, with examples for each input type.

Endpoint

The MAX v1/responses endpoint provides a unified interface for diverse AI tasks including image generation, with structured input and output handling. It's built on Open Responses, an open-source initiative to create a standardized, provider-agnostic API specification that works across different AI providers and model backends.

Text input

For text-to-image generation, set input to a plain string describing the image you want. The model returns the generated image as base64-encoded data in output[0].content[0].image_data:

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input="Your text prompt here",
    extra_body={
        "provider_options": {
            "image": {"height": 1024, "width": 1024, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data

Image URL input

For image-to-image workflows, set input to a structured message array containing the source image URL and a text prompt describing the transformation. The type field distinguishes image and text content within the same message:

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": "https://example.com/input.png"
                },
                {
                    "type": "input_text",
                    "text": "Your transformation prompt"
                }
            ]
        }
    ],
    extra_body={
        "provider_options": {
            "image": {"height": 1024, "width": 1024, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data

Local file input

Local files must be base64-encoded and passed as a data URI in the image_url field using the format data:<mime-type>;base64,<data>.

import base64

with open("/path/to/image.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{image_base64}"
                },
                {
                    "type": "input_text",
                    "text": "Your transformation prompt"
                }
            ]
        }
    ],
    extra_body={
        "provider_options": {
            "image": {"height": 1024, "width": 1024, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data

Provider options

The provider_options argument is an extension point in the Open Responses spec that lets each API provider expose parameters beyond the standard request fields. MAX uses it to surface image generation controls such as dimensions and denoising steps.

The following parameters are available under provider_options.image:

Parameter         Default  Description
height / width    1024     Output dimensions in pixels (must be multiples of 16)
steps             28       Number of denoising steps
guidance_scale    3.5      How closely the output follows the prompt
negative_prompt   ""       Content to avoid in the output

Height and width: You can generate images at different aspect ratios. Image dimensions must be multiples of 16; if you provide an incompatible integer, it is automatically scaled to a multiple of 16.
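The scaling behavior can be sketched as follows. The exact server-side rule isn't specified here; rounding to the nearest multiple of 16, with a floor of 16, is an assumption for illustration:

```python
# Sketch of the dimension-scaling rule described above. Rounding to
# the nearest multiple of 16 (minimum 16) is an assumed behavior.
def snap_to_16(value: int) -> int:
    """Scale a requested dimension to a compatible multiple of 16."""
    return max(16, round(value / 16) * 16)

print(snap_to_16(512))   # 512: already a multiple of 16
print(snap_to_16(1020))  # 1024: scaled to the nearest multiple of 16
```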

Steps: steps has the greatest effect on generation time. Diffusion models work by iteratively refining a noisy image, so more steps produce higher-quality results but take proportionally longer. Experiment with steps to balance speed against quality.

Prompt adherence: guidance_scale determines how literally the model interprets your prompt. Higher values (7-10) produce results that closely match the prompt. Lower values (1-3) allow more creative variation.

Negative prompts: Use negative_prompt to steer the model away from unwanted content, for example "blurry, low quality, distorted". It's best practice to include any negative prompting in the negative_prompt argument and not in the main input_text string.
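Putting these parameters together, a request tuned for strict prompt adherence might set provider_options as follows (the values are illustrative, not recommendations):

```python
# Illustrative provider_options combining guidance_scale and
# negative_prompt with the dimension and step settings above.
provider_options = {
    "image": {
        "height": 1024,
        "width": 1024,
        "steps": 28,
        "guidance_scale": 8.0,  # high value: follow the prompt closely
        "negative_prompt": "blurry, low quality, distorted",
    }
}
```

Pass this dictionary through the SDK's extra_body, exactly as in the examples above: extra_body={"provider_options": provider_options}.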

If you encounter memory errors, try reducing your output image dimensions or the number of denoising steps:

"provider_options": {"image": {"height": 512, "width": 512, "steps": 25}}

Quickstart

In this quickstart, learn how to set up and run FLUX.2-dev for image generation.


Set up your environment

Create a Python project to install our APIs and CLI tools:

  1. If you don't have it, install pixi:
    curl -fsSL https://pixi.sh/install.sh | sh

    Then restart your terminal for the changes to take effect.

  2. Create a project:
    pixi init image-generation-quickstart \
      -c https://conda.modular.com/max-nightly/ -c conda-forge \
      && cd image-generation-quickstart
  3. Install modular (nightly build; to get the stable build, change the version in the website header):
    pixi add modular
  4. Start the virtual environment:
    pixi shell

Serve your model

First, enable the v1/responses endpoint by setting the MAX_SERVE_API_TYPES environment variable:

export MAX_SERVE_API_TYPES='["responses"]'

Agree to the FLUX license and make your Hugging Face access token available in your environment:

export HF_TOKEN="hf_..."

Then, use the max serve command to start a local model server with the FLUX.2-dev model:

max serve \
  --model black-forest-labs/FLUX.2-dev

The endpoint is ready when you see this message printed in your terminal:

Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)

For a complete list of max CLI commands and options, refer to the MAX CLI reference.

Generate an image from text

Generate an image from a text description by sending a request to the v1/responses endpoint. The input field is a text string describing the desired image, and provider_options carries the MAX-specific generation parameters. You can send requests using either the OpenAI Python SDK or curl:

You can use OpenAI's Python client to interact with the image generation model. First, install the OpenAI SDK:

pixi add openai

Then, create a client and make a request to the model:

generate-image.py
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input="A serene mountain landscape at sunset",
    extra_body={
        "provider_options": {
            "image": {"height": 512, "width": 512, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data
with open("output-text-to-image.png", "wb") as f:
    f.write(base64.b64decode(image_data))

Run the script to generate the image:

python generate-image.py

The model saves the generated image to output-text-to-image.png in your current directory.

Your output should look similar to the following:

Figure 1. Text-to-image output: a serene mountain landscape at sunset.
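The same text-to-image request can also be sent with curl. This sketch assumes the quickstart server is running on localhost:8000, and places provider_options at the top level of the JSON body (matching what the SDK's extra_body merges into the request). It uses an inline Python one-liner to extract and decode the base64 payload from the JSON response:

```shell
# Text-to-image via the raw v1/responses endpoint; requires the server
# from the quickstart to be running on localhost:8000.
curl -s http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "black-forest-labs/FLUX.2-dev",
    "input": "A serene mountain landscape at sunset",
    "provider_options": {
      "image": {"height": 512, "width": 512, "steps": 28}
    }
  }' \
| python -c 'import sys, json, base64; r = json.load(sys.stdin); open("output-text-to-image.png", "wb").write(base64.b64decode(r["output"][0]["content"][0]["image_data"]))'
```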

Use your generated image as input

You can then take the image generated in the previous step and make additional customizations with the image-to-image workflow by providing both an image and a text prompt:

Read and encode the output image from the previous step, then send it along with a text prompt to the model:

generate-image-to-image.py
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("output-text-to-image.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="black-forest-labs/FLUX.2-dev",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{image_base64}"
                },
                {
                    "type": "input_text",
                    "text": "Transform this into a watercolor painting"
                }
            ]
        }
    ],
    extra_body={
        "provider_options": {
            "image": {"height": 512, "width": 512, "steps": 28}
        }
    }
)

image_data = response.output[0].content[0].image_data
with open("output-image-to-image.png", "wb") as f:
    f.write(base64.b64decode(image_data))

Run the script to generate the image:

python generate-image-to-image.py

The model saves the transformed image to output-image-to-image.png in your current directory.

Your output should look similar to the following:

Figure 2. Image-to-image output: the mountain landscape transformed into a watercolor painting.

Next steps

Now that you can generate images, explore other inference capabilities and deployment options.
