max
The max command line tool runs and benchmarks MAX pipelines from one
binary. Use max serve to host an OpenAI-compatible endpoint, max generate
or max encode to run a model directly, max benchmark to load-test a
running server, max warm-cache to compile and cache a model ahead of
deployment, and max list to discover the architectures MAX supports.
To install the max CLI, install the modular package as shown
in the install guide.
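As a sketch, assuming a working Python environment (the install guide covers the recommended setup and any alternative package sources), installation looks like:

```shell
# Install the modular package, which provides the max CLI.
pip install modular

# Confirm the CLI is on your PATH and check its version.
max --version
```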
Usage

max [OPTIONS] COMMAND [ARGS]...

Options
--log-level <log_level>
    Set logging level explicitly (ignored if --verbose or --quiet is used).
    Options: DEBUG | INFO | WARNING | ERROR

--version
    Show the MAX version and exit.
Commands

benchmark     Run benchmark tests on a serving model.
encode        Encode text input into model embeddings.
generate      Generate text using the specified model.
list          List available pipeline configurations and supported architectures.
serve         Start a model serving endpoint for inference.
warm-cache    Load and compile the model to prepare caches.
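The commands above can be chained into a typical deployment workflow. The sketch below is illustrative: the subcommand names come from this page, but the model identifier is a placeholder and the --model-path flag is an assumption; run each subcommand with --help for its actual options.

```shell
# Discover the model architectures MAX supports.
max list

# Compile and cache the model ahead of deployment
# (<model> is a placeholder; --model-path is an assumed flag).
max warm-cache --model-path <model>

# Host an OpenAI-compatible endpoint for the model.
max serve --model-path <model>

# In another shell, load-test the running server.
max benchmark --model-path <model>
```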