max

The max command line tool runs and benchmarks MAX pipelines from one binary. Use max serve to host an OpenAI-compatible endpoint, max generate or max encode to run a model directly, max benchmark to load-test a running server, max warm-cache to compile and cache a model ahead of deployment, and max list to discover the architectures MAX supports.

To install the max CLI, install the modular package as shown in the install guide.

Usage

max [OPTIONS] COMMAND [ARGS]...

Options

  • --log-level <log_level>

    Set logging level explicitly (ignored if --verbose or --quiet is used).

    Options:

    DEBUG | INFO | WARNING | ERROR

  • --version

    Show the MAX version and exit.
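Global options go before the subcommand, as in the synopsis above. A minimal sketch; the guard is an assumption so the snippet degrades gracefully on machines where the modular package is not installed:

```shell
# Global options precede the subcommand.
# The command -v guard skips the calls when `max` is not on PATH.
if command -v max >/dev/null 2>&1; then
  max --version                # print the MAX version and exit
  max --log-level DEBUG list   # run `max list` with DEBUG-level logging
fi
```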

Commands

  • benchmark:

    Run benchmark tests on a serving model.

  • encode:

    Encode text input into model embeddings.

  • generate:

    Generate text using the specified model.

  • list:

    List available pipeline configurations and...

  • serve:

    Start a model serving endpoint for inference.

  • warm-cache:

    Load and compile the model to prepare caches.
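A typical deployment chains these commands: warm the cache, start the server, then load-test it. The sketch below is an assumption-heavy illustration, not a verbatim recipe from this page: the `--model-path` flag and the example model repo are taken from MAX examples elsewhere, and the benchmark invocation is left schematic; run `max <command> --help` for the exact flags.

```shell
# Hypothetical workflow sketch. MODEL and --model-path are assumptions;
# the guard skips everything when `max` is not installed.
MODEL=modularai/Llama-3.1-8B-Instruct-GGUF
if command -v max >/dev/null 2>&1; then
  max warm-cache --model-path "$MODEL"   # compile and cache ahead of deployment
  max serve --model-path "$MODEL" &      # host an OpenAI-compatible endpoint
  SERVER_PID=$!
  # ...once the server reports ready, load-test it with `max benchmark`,
  # then shut the server down:
  kill "$SERVER_PID" 2>/dev/null
fi
```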