Python class
KVCacheQuantizationConfig
class max.nn.kv_cache.KVCacheQuantizationConfig(scale_dtype=float32, quantization_granularity=128)
Bases: object
Configuration for KVCache quantization.
Currently, only FP8 quantization is supported.
quantization_granularity
quantization_granularity: int = 128
Block size used for KVCache quantization along the head dimension (e.g. 128).
scale_dtype
scale_dtype: DType = float32
Data type of the quantization scales, if quantization is enabled.
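To make the two fields concrete, the following is a minimal sketch of what a block size along the head dimension could mean in practice. It is not the library's implementation. `SketchConfig`, `block_scales`, and `FP8_E4M3_MAX` are hypothetical names introduced only for this illustration, and the sketch assumes that each block of `quantization_granularity` elements shares one scale and that the FP8 format is E4M3 (largest finite value 448).

```python
import math
from dataclasses import dataclass


# Hypothetical stand-in that mirrors the two documented fields. It is NOT the
# real max.nn.kv_cache.KVCacheQuantizationConfig class.
@dataclass(frozen=True)
class SketchConfig:
    scale_dtype: str = "float32"
    quantization_granularity: int = 128


# Assumption: FP8 E4M3, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def block_scales(head_vector, config):
    """Return one scale per block of `quantization_granularity` elements."""
    g = config.quantization_granularity
    scales = []
    for start in range(0, len(head_vector), g):
        block = head_vector[start:start + g]
        amax = max(abs(x) for x in block)
        # Fall back to 1.0 for an all-zero block so the scale is never zero.
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


head_dim = 256
vector = [math.sin(i) for i in range(head_dim)]
scales = block_scales(vector, SketchConfig())
print(len(scales))  # head_dim / quantization_granularity = 2 blocks
```

Under these assumptions, a head dimension of 256 with the default granularity of 128 yields two scales per head vector. A smaller granularity gives more scales, which track local magnitudes more closely at the cost of extra scale storage.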