max

The max command line tool runs and benchmarks MAX pipelines from one binary. Use max serve to host an OpenAI-compatible endpoint, max generate or max encode to run a model directly, max benchmark to load-test a running server, max warm-cache to compile and cache a model ahead of deployment, and max list to discover the architectures MAX supports.

To install the max CLI, install the modular package as shown in the install guide.

Usage

max [OPTIONS] COMMAND [ARGS]...

Options

  • --log-level <log_level>

    Set logging level explicitly (ignored if --verbose or --quiet is used).

    Options:

    DEBUG | INFO | WARNING | ERROR

  • --version

    Show the MAX version and exit.
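Global options go before the subcommand, as in the synopsis above. A minimal sketch; the guard is an assumption so the snippet degrades gracefully on machines where the modular package is not installed:

```shell
# Global options precede the subcommand.
# The command -v guard skips the calls when `max` is not on PATH.
if command -v max >/dev/null 2>&1; then
  max --version                # print the MAX version and exit
  max --log-level DEBUG list   # run `max list` with DEBUG-level logging
fi
```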

Commands

  • benchmark:

    Run benchmark tests on a serving model.

  • encode:

    Encode text input into model embeddings.

  • generate:

    Generate text using the specified model.

  • list:

    List available pipeline configurations and...

  • serve:

    Start a model serving endpoint for inference.

  • warm-cache:

    Load and compile the model to prepare caches.
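A typical deployment chains these commands: warm the cache, start the server, then load-test it. The sketch below is an assumption-heavy illustration, not a verbatim recipe from this page: the `--model-path` flag and the example model repo are taken from MAX examples elsewhere, and the benchmark invocation is left schematic; run `max <command> --help` for the exact flags.

```shell
# Hypothetical workflow sketch. MODEL and --model-path are assumptions;
# the guard skips everything when `max` is not installed.
MODEL=modularai/Llama-3.1-8B-Instruct-GGUF
if command -v max >/dev/null 2>&1; then
  max warm-cache --model-path "$MODEL"   # compile and cache ahead of deployment
  max serve --model-path "$MODEL" &      # host an OpenAI-compatible endpoint
  SERVER_PID=$!
  # ...once the server reports ready, load-test it with `max benchmark`,
  # then shut the server down:
  kill "$SERVER_PID" 2>/dev/null
fi
```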