Python class

KVCacheQuantizationConfig

class max.nn.kv_cache.KVCacheQuantizationConfig(scale_dtype=float32, quantization_granularity=128)

Bases: object

Configuration for KVCache quantization.

Currently, only FP8 quantization is supported.

Parameters:

  • scale_dtype (DType)
  • quantization_granularity (int)

quantization_granularity

quantization_granularity: int = 128

Block size used for KVCache quantization along the head dimension (for example, 128).
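
To illustrate what a block size along the head dimension means, here is a minimal pure-Python sketch. It is not the MAX implementation. It assumes that each block of `quantization_granularity` consecutive elements shares one scale, and that the scale maps the block's largest magnitude to 448.0, the largest finite value of the FP8 E4M3 format. The helper name `blockwise_scales` and the parameter `fp8_max` are hypothetical.

```python
def blockwise_scales(head_vector, quantization_granularity=128, fp8_max=448.0):
    """Return one scale per block of `quantization_granularity` elements.

    Assumption (not taken from the MAX source): scale = max|x| / fp8_max,
    where fp8_max = 448.0 is the largest finite FP8 E4M3 value.
    """
    if quantization_granularity <= 0:
        raise ValueError("quantization_granularity must be positive")

    scales = []
    for start in range(0, len(head_vector), quantization_granularity):
        block = head_vector[start:start + quantization_granularity]
        amax = max(abs(x) for x in block)
        # Use a neutral scale for all-zero blocks to avoid dividing by zero later.
        scales.append(amax / fp8_max if amax > 0 else 1.0)
    return scales


# A 256-element head vector with granularity 128 yields two scales.
example = [0.5] * 128 + [448.0] * 128
print(len(blockwise_scales(example, quantization_granularity=128)))
```

Under these assumptions, a smaller granularity produces more scales per head, so each scale tracks its local value range more closely, at the cost of storing more scale values.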

scale_dtype

scale_dtype: DType = DType.float32

Data type of the quantization scales. Used only when quantization is enabled.
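
The choice of scale data type affects how much extra memory the scales take up relative to the quantized cache. The sketch below estimates that overhead. It assumes one scale per block along the head dimension and one byte per FP8 element. The helper `scale_overhead` and the `SCALE_BYTES` table are hypothetical and are not part of the MAX API.

```python
# Byte widths of common floating-point types (standard sizes).
SCALE_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2}


def scale_overhead(head_dim, quantization_granularity=128, scale_dtype="float32"):
    """Estimate scale bytes as a fraction of FP8 data bytes for one head vector.

    Assumptions (not taken from the MAX source): one scale per block,
    and one byte per quantized FP8 element.
    """
    if head_dim <= 0 or quantization_granularity <= 0:
        raise ValueError("head_dim and quantization_granularity must be positive")

    # Ceiling division: a partial trailing block still needs its own scale.
    num_blocks = -(-head_dim // quantization_granularity)
    scale_bytes = num_blocks * SCALE_BYTES[scale_dtype]
    data_bytes = head_dim  # FP8 stores one byte per element
    return scale_bytes / data_bytes


# With the documented defaults and a head dimension of 128:
print(scale_overhead(128))  # 0.03125
```

With these assumptions, the default configuration adds about 3% to the cache size for a 128-element head, and halving the granularity doubles that fraction.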