Mojo function

generic_get_paged_cache_with_scales

generic_get_paged_cache_with_scales[dtype: DType, scale_dtype: DType, kv_params: KVCacheStaticParams, page_size: Int, quantization_granularity: Int](blocks: LayoutTensor[dtype, Layout.row_major[6]()], cache_lengths: LayoutTensor[DType.uint32, Layout(IntTuple(-1))], lookup_table: LayoutTensor[DType.uint32, Layout.row_major[2]()], max_lengths: LayoutTensor[DType.uint32, Layout.row_major[2]()], scales: LayoutTensor[scale_dtype, Layout.row_major[6]()], out result: PagedKVCacheCollection[dtype, kv_params, page_size, scale_dtype, quantization_granularity])

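Creates a PagedKVCacheCollection that stores quantization scales (of type scale_dtype) alongside the KV cache blocks, for use with multi-head latent attention (MLA).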

Args:

Returns:

PagedKVCacheCollection[dtype, kv_params, page_size, scale_dtype, quantization_granularity]
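The page does not describe how tokens and scales are addressed in the returned collection. The sketch below is a conceptual Python model, not the Mojo API, and it rests on assumptions: that lookup_table[b][p] holds the physical block index for logical page p of sequence b, and that each scale covers quantization_granularity consecutive elements along the head dimension. The helper names locate_token, scale_group, and dequantize_head are hypothetical.

```python
# Conceptual model (not the Mojo API) of addressing a paged KV cache with
# per-group quantization scales. All layouts here are assumptions.


def locate_token(lookup_table, batch_idx, token_pos, page_size):
    """Map a logical token position to (physical_block, offset_in_page).

    Assumes lookup_table[batch_idx][p] is the physical block that holds
    logical page p of sequence batch_idx.
    """
    logical_page = token_pos // page_size
    offset = token_pos % page_size
    return lookup_table[batch_idx][logical_page], offset


def scale_group(dim_idx, quantization_granularity):
    """Index of the scale covering element dim_idx of a head vector,
    assuming one scale per quantization_granularity consecutive elements."""
    return dim_idx // quantization_granularity


def dequantize_head(quantized, scales, quantization_granularity):
    """Multiply each quantized element by the scale of its group."""
    return [
        q * scales[scale_group(d, quantization_granularity)]
        for d, q in enumerate(quantized)
    ]


if __name__ == "__main__":
    page_size = 4
    # Sequence 1 stores its logical pages 0 and 1 in physical blocks 7 and 2.
    lookup_table = [[0, 5], [7, 2]]
    print(locate_token(lookup_table, batch_idx=1, token_pos=5, page_size=page_size))
    # -> (2, 1)

    # A head of dimension 8 quantized with granularity 4 uses 2 scales.
    quantized = [1, 2, 3, 4, -1, -2, -3, -4]
    scales = [0.5, 2.0]
    print(dequantize_head(quantized, scales, quantization_granularity=4))
    # -> [0.5, 1.0, 1.5, 2.0, -2.0, -4.0, -6.0, -8.0]
```

Because blocks and scales are both rank-6 tensors in the signature, one plausible design is that both are indexed with the same (block, offset) pair, with the scales tensor narrower along the head dimension; this is an inference from the signature, not documented behavior.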