Python class

KVCacheQuantizationConfig

class max.nn.kv_cache.KVCacheQuantizationConfig(scale_dtype=float32, quantization_granularity=128)

Bases: object

Configuration for KVCache quantization.

Currently, only FP8 quantization is supported.

Parameters:

  • scale_dtype (DType)
  • quantization_granularity (int)
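
As a minimal sketch of how the constructor might be called, the snippet below passes both documented parameters explicitly. It assumes the MAX Python package is installed and that `DType.float32` from `max.dtype` corresponds to the `float32` default shown in the signature. The import guard is only there so the snippet also runs where MAX is unavailable.

```python
try:
    from max.dtype import DType
    from max.nn.kv_cache import KVCacheQuantizationConfig
except ImportError:
    # MAX is not installed in this environment, so the example is skipped.
    KVCacheQuantizationConfig = None

config = None
if KVCacheQuantizationConfig is not None:
    # Both arguments repeat the documented defaults; they are spelled out
    # here only to show the parameter names.
    config = KVCacheQuantizationConfig(
        scale_dtype=DType.float32,
        quantization_granularity=128,
    )
    print(config.quantization_granularity)
```

Because both values match the documented defaults, calling `KVCacheQuantizationConfig()` with no arguments should produce an equivalent configuration.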

quantization_granularity

quantization_granularity: int = 128

Block size used for KVCache quantization along the head dimension (for example, 128).
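
To illustrate what a block size along the head dimension means, here is a pure-Python sketch of block-wise scaling. It is not MAX's implementation: the helper names `block_scales` and `scale_block_values` are hypothetical, and the sketch assumes the FP8 E4M3 format (largest finite value 448) and a simple max-abs scaling rule, neither of which is stated on this page.

```python
import math

# Largest finite value in FP8 E4M3; using this format is an assumption.
FP8_E4M3_MAX = 448.0


def block_scales(head_vector, granularity=128):
    """Return one scale per block of `granularity` elements (hypothetical helper)."""
    scales = []
    for start in range(0, len(head_vector), granularity):
        block = head_vector[start:start + granularity]
        max_abs = max(abs(x) for x in block)
        # Use a scale of 1.0 for all-zero blocks to avoid dividing by zero.
        scales.append(max_abs / FP8_E4M3_MAX if max_abs > 0 else 1.0)
    return scales


def scale_block_values(head_vector, scales, granularity=128):
    """Divide each element by its block's scale so it fits the FP8 range."""
    return [x / scales[i // granularity] for i, x in enumerate(head_vector)]


head_dim = 256
vector = [math.sin(i) * 10.0 for i in range(head_dim)]
scales = block_scales(vector, granularity=128)
scaled = scale_block_values(vector, scales, granularity=128)
print(len(scales))  # one scale per 128-element block
```

Under these assumptions, a head dimension of 256 with a granularity of 128 yields two scales per head vector. A smaller granularity would track local magnitudes more closely at the cost of storing more scales.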

scale_dtype

scale_dtype: DType = float32

Data type of the quantization scales, used when quantization is enabled.