IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

max

The max command line tool runs and benchmarks MAX pipelines from one binary. Use max serve to host an OpenAI-compatible endpoint, max generate or max encode to run a model directly, max benchmark to load-test a running server, max warm-cache to compile and cache a model ahead of deployment, and max list to discover the architectures MAX supports.

To install the max CLI, install the modular package as shown in the install guide.

Usage

max [OPTIONS] COMMAND [ARGS]...

Options

  • --log-level <log_level>

    Set the logging level explicitly (ignored if --verbose or --quiet is used).

    Options:

    DEBUG | INFO | WARNING | ERROR

  • --version

    Show the MAX version and exit.
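
As the usage line indicates, global options such as --log-level go before the command name. The sketch below is a small dry-run helper, not part of the max CLI: `max_cmd` is a hypothetical function name that checks the level against the four documented values and prints the command it would run. The final check only calls `max --version` if a `max` binary is found on the PATH.

```shell
# Hypothetical helper (not part of the max CLI): validate the log level,
# then print the command line instead of running it.
max_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "usage: max_cmd LEVEL COMMAND [ARGS]..." >&2
    return 2
  fi
  level="$1"
  shift
  case "$level" in
    DEBUG|INFO|WARNING|ERROR) ;;   # the values documented for --log-level
    *) echo "invalid log level: $level" >&2; return 2 ;;
  esac
  # Global options precede the command, matching the usage line.
  echo "max --log-level $level $*"
}

max_cmd DEBUG list   # prints: max --log-level DEBUG list

# Show the installed version only when the CLI is actually available.
if command -v max >/dev/null 2>&1; then
  max --version
fi
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without MAX installed. To execute the result, pipe the output to `sh`.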

Commands

  • benchmark:

    Run benchmark tests on a serving model.

  • encode:

    Encode text input into model embeddings.

  • generate:

    Generate text using the specified model.

  • list:

    List available pipeline configurations and...

  • serve:

    Start a model serving endpoint for inference.

  • warm-cache:

    Load and compile the model to prepare caches.
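
The one-line descriptions above do not list per-command arguments. One way to explore them is to ask each subcommand for its help text. The sketch below only prints the invocations (a dry run). It assumes each subcommand accepts a `--help` flag, which this page does not confirm, and `max_help_commands` is a hypothetical helper name.

```shell
# Subcommands copied from the Commands list above.
MAX_COMMANDS="benchmark encode generate list serve warm-cache"

# Hypothetical helper: print one help invocation per subcommand (dry run).
# Assumes each subcommand accepts --help, which this page does not confirm.
max_help_commands() {
  for cmd in $MAX_COMMANDS; do
    echo "max $cmd --help"
  done
}

max_help_commands
```

With the modular package installed, piping the output to `sh` would run each command in turn.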