Python class

KVCacheQuantizationConfig

class max.nn.kv_cache.KVCacheQuantizationConfig(scale_dtype=float32, quantization_granularity=128)

Bases: object

Configuration for KVCache quantization.

Currently, only FP8 quantization is supported.

Parameters:

scale_dtype (DType)
quantization_granularity (int)

`quantization_granularity`

quantization_granularity: int = 128

Block size used for KVCache quantization along the head dimension (e.g. 128).
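To make the role of the block size concrete, the following pure-Python sketch splits one head vector into blocks of `quantization_granularity` elements and derives one scale per block. It is an illustration only, not MAX's kernel: the helper `block_scales`, the absmax scaling rule, and the E4M3 maximum of 448 are all assumptions introduced here.

```python
# Illustrative sketch only; this is not how MAX implements KV cache quantization.
# It shows how a block size partitions a head vector into scale blocks.

FP8_E4M3_MAX = 448.0  # assumption: FP8 E4M3 format, largest finite value


def block_scales(head_vector, granularity=128):
    """Return one absmax-based scale per block of `granularity` elements.

    `block_scales` is a hypothetical helper, not part of the MAX API.
    """
    if len(head_vector) % granularity != 0:
        raise ValueError("head dimension must be a multiple of granularity")
    scales = []
    for start in range(0, len(head_vector), granularity):
        block = head_vector[start:start + granularity]
        absmax = max(abs(x) for x in block)
        # Fall back to 1.0 so an all-zero block does not produce a zero scale.
        scales.append(absmax / FP8_E4M3_MAX if absmax > 0 else 1.0)
    return scales


# A head dimension of 256 with granularity 128 yields two blocks.
vector = [0.5] * 128 + [-4.0] * 128
scales = block_scales(vector, granularity=128)
print(len(scales))
```

Under these assumptions, a smaller granularity gives each scale fewer elements to cover, which tracks local magnitude more closely at the cost of storing more scales.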

`scale_dtype`

scale_dtype: DType = float32

Data type of the quantization scales, if quantization is enabled.
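The choice of scale data type affects how much extra memory the scales add on top of the quantized cache. The sketch below estimates that overhead under stated assumptions: one scale is stored per block, FP8 values take 1 byte each, and float32 scales take 4 bytes each. The helper `scale_overhead` is hypothetical and not part of the MAX API.

```python
def scale_overhead(head_dim, granularity=128, scale_bytes=4, value_bytes=1):
    """Ratio of scale storage to quantized value storage for one head vector.

    Assumes one scale per block of `granularity` elements (an assumption,
    not a documented property of MAX).
    """
    num_blocks = head_dim // granularity
    return (num_blocks * scale_bytes) / (head_dim * value_bytes)


# float32 scales with a block size of 128 over FP8 values.
overhead = scale_overhead(head_dim=128, granularity=128)
print(f"{overhead:.2%}")
```

With the defaults shown in the class signature (float32 scales, granularity 128), this estimate comes to about 3% extra storage per head vector.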
