Mojo function

generic_get_paged_cache_with_scales

generic_get_paged_cache_with_scales[
    dtype: DType,
    scale_dtype: DType,
    kv_params: KVCacheStaticParams,
    page_size: Int,
    quantization_granularity: Int
](
    blocks: LayoutTensor[dtype, Layout.row_major[6](), blocks.origin],
    cache_lengths: LayoutTensor[DType.uint32, Layout(IntTuple(-1)), cache_lengths.origin],
    lookup_table: LayoutTensor[DType.uint32, Layout.row_major[2](), lookup_table.origin],
    max_lengths: LayoutTensor[DType.uint32, Layout.row_major[2](), max_lengths.origin],
    scales: LayoutTensor[scale_dtype, Layout.row_major[6](), scales.origin],
    out result: PagedKVCacheCollection[dtype, kv_params, page_size, scale_dtype, quantization_granularity]
)

Creates a PagedKVCacheCollection that carries a scales tensor alongside the KV cache blocks, for use with MLA attention.

Args:

  • blocks (LayoutTensor): KV cache blocks tensor [num_blocks, kv_dim, num_layers, page_size, num_heads, head_dim].
  • cache_lengths (LayoutTensor): Cache lengths per batch [batch_size].
  • lookup_table (LayoutTensor): Page lookup table [batch_size, max_pages].
  • max_lengths (LayoutTensor): Max lengths tensor [[max_seq_length, max_cache_length]].
  • scales (LayoutTensor): Scales tensor [num_blocks, kv_dim, num_layers, page_size, num_heads, head_dim_granularity].
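
The shapes listed above can be made concrete with a small, library-independent Python sketch. It is not the Mojo implementation and does not call the MAX API. The names PagedCacheDims, blocks_shape, scales_shape, and locate_token are hypothetical. Two points are assumptions rather than documented behavior: that head_dim_granularity equals ceil(head_dim / quantization_granularity), and that a token position maps to a page by integer division by page_size, as is conventional for paged caches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagedCacheDims:
    """Hypothetical container for the dimension names used in the Args list."""
    num_blocks: int
    kv_dim: int
    num_layers: int
    page_size: int
    num_heads: int
    head_dim: int


def blocks_shape(d: PagedCacheDims) -> tuple:
    # Rank-6 shape documented for `blocks`.
    return (d.num_blocks, d.kv_dim, d.num_layers,
            d.page_size, d.num_heads, d.head_dim)


def scales_shape(d: PagedCacheDims, quantization_granularity: int) -> tuple:
    # ASSUMPTION: one scale per group of `quantization_granularity` elements
    # along head_dim, so the last dimension is a ceiling division.
    head_dim_granularity = -(-d.head_dim // quantization_granularity)
    return blocks_shape(d)[:-1] + (head_dim_granularity,)


def locate_token(lookup_table, cache_lengths, batch_idx, token_idx, page_size):
    """Map a cached token to (block id, offset within the page).

    ASSUMPTION: conventional paged addressing, where row `batch_idx` of
    `lookup_table` lists the block ids of that sequence's pages in order.
    """
    if not 0 <= token_idx < cache_lengths[batch_idx]:
        raise IndexError("token_idx is outside the cached length")
    page, offset = divmod(token_idx, page_size)
    return lookup_table[batch_idx][page], offset


# Illustrative values only; they are not taken from the documentation.
dims = PagedCacheDims(num_blocks=8, kv_dim=1, num_layers=2,
                      page_size=128, num_heads=1, head_dim=576)
lookup_table = [[3, 5], [1, 0]]   # [batch_size, max_pages]
cache_lengths = [200, 90]         # [batch_size]

print(blocks_shape(dims))
print(scales_shape(dims, quantization_granularity=64))
print(locate_token(lookup_table, cache_lengths, 0, 130, dims.page_size))
```

Under these assumptions, scales matches blocks in every dimension except the last, so the same (block, layer, page offset, head) index selects both a cached vector and its scale factors.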

Returns:

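PagedKVCacheCollection[dtype, kv_params, page_size, scale_dtype, quantization_granularity]: The collection built from the given blocks, scales, lookup table, and length tensors.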