Mojo struct

PagedKVCacheCollection

struct PagedKVCacheCollection[dtype_: DType, kv_params_: KVCacheStaticParams, page_size: Int, scale_dtype_: DType = DType.invalid, quantization_granularity_: Int = 1]

Fields

scales (OptionalReg[TileTensor[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].scale_dtype, Layout[*?, *?], MutAnyOrigin]]):
kv_cache_scales_dynamic_shape (IndexList[4]):
kv_cache_scales_dynamic_strides (IndexList[4]):
blocks (PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].blocks_tt_type):
cache_lengths (PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].CacheType.cache_lengths_tt_type):
lookup_table (PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].CacheType.lookup_table_tt_type):
max_seq_length (UInt32):
max_cache_length (UInt32):
kv_cache_dynamic_shape (IndexList[4]):
kv_cache_dynamic_strides (IndexList[4]):

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, KVCollectionT, Movable

`comptime` members

`blocks_layout`

comptime blocks_layout = Layout.row_major(PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].blocks_shape)

`blocks_shape`

comptime blocks_shape = IntTuple(-1, 2 if not PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].kv_params.is_mla.__bool__() else 1, -1, page_size, PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].kv_params, PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].kv_params)

`blocks_tt_layout`

comptime blocks_tt_layout = Layout[*?, *?]

`blocks_tt_type`

comptime blocks_tt_type = TileTensor[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].dtype, Layout[*?, *?], MutAnyOrigin]

`CacheType`

comptime CacheType = PagedKVCache[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].dtype, PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].kv_params, page_size, PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].scale_dtype, quantization_granularity_]

`dtype`

comptime dtype = dtype_

`head_dim_granularity`

comptime head_dim_granularity = ceildiv(PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].kv_params.head_size, PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].CacheType.quantization_granularity)
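
As a rough illustration of the arithmetic in this definition, the sketch below models `ceildiv` in plain Python. It is not the Mojo implementation; the values for `head_size` and `quantization_granularity` are hypothetical and chosen only to show the rounding behavior.

```python
def ceildiv(numerator: int, denominator: int) -> int:
    """Integer ceiling division (Python model of the ceildiv used above)."""
    return -(-numerator // denominator)


# Hypothetical values, for illustration only.
head_size = 128
quantization_granularity = 32

head_dim_granularity = ceildiv(head_size, quantization_granularity)
print(head_dim_granularity)  # 4

# A granularity that does not divide head_size evenly still rounds up.
print(ceildiv(128, 48))  # 3
```

With the default `quantization_granularity_` of 1, the result is simply `head_size`.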

`kv_params`

comptime kv_params = kv_params_

`name_str`

comptime name_str = "paged"

`scale_dtype`

comptime scale_dtype = scale_dtype_

`scales_layout`

comptime scales_layout = Layout.row_major(PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].scales_shape)

`scales_shape`

comptime scales_shape = IntTuple(-1, 2 if not PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].kv_params.is_mla.__bool__() else 1, -1, page_size, PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].kv_params, PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].head_dim_granularity)
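
The `blocks_shape` and `scales_shape` definitions can be easier to read as concrete tuples. The Python sketch below is a model under stated assumptions, not the library's code: `KVParams` is a hypothetical stand-in for `KVCacheStaticParams`, and the assumption that the two trailing `kv_params` entries refer to `num_heads` and `head_size` is not confirmed by the rendered signatures above.

```python
from dataclasses import dataclass

UNKNOWN = -1  # placeholder for a dimension known only at run time


@dataclass(frozen=True)
class KVParams:
    """Hypothetical stand-in for KVCacheStaticParams."""
    num_heads: int   # assumed field name
    head_size: int
    is_mla: bool = False


def blocks_shape(params: KVParams, page_size: int) -> tuple:
    # Second dimension is 2 (key and value) unless MLA is enabled, then 1.
    kv_dim = 1 if params.is_mla else 2
    # Trailing entries are ASSUMED to be num_heads and head_size.
    return (UNKNOWN, kv_dim, UNKNOWN, page_size, params.num_heads, params.head_size)


def scales_shape(params: KVParams, page_size: int, quantization_granularity: int) -> tuple:
    # Last dimension is ceildiv(head_size, quantization_granularity).
    head_dim_granularity = -(-params.head_size // quantization_granularity)
    kv_dim = 1 if params.is_mla else 2
    return (UNKNOWN, kv_dim, UNKNOWN, page_size, params.num_heads, head_dim_granularity)


params = KVParams(num_heads=8, head_size=128)
print(blocks_shape(params, page_size=128))     # (-1, 2, -1, 128, 8, 128)
print(scales_shape(params, 128, 32))           # (-1, 2, -1, 128, 8, 4)
```

Both shapes have six dimensions, which matches the rank-6 `blocks` and `scales` arguments accepted by `__init__`.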

`scales_tt_layout`

comptime scales_tt_layout = Layout[*?, *?]

`scales_tt_type`

comptime scales_tt_type = TileTensor[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].scale_dtype, Layout[*?, *?], MutAnyOrigin]

Methods

`__init__`

__init__(out self, blocks: LayoutTensor[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].dtype, Layout.row_major[6](), MutAnyOrigin], cache_lengths: LayoutTensor[DType.uint32, Layout(IntTuple(-1)), ImmutAnyOrigin], lookup_table: LayoutTensor[DType.uint32, Layout.row_major[2](), ImmutAnyOrigin], max_seq_length: UInt32, max_cache_length: UInt32, scales: OptionalReg[LayoutTensor[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].scale_dtype, Layout.row_major[6](), MutAnyOrigin]] = None)

Construct from LayoutTensor params (MOGG boundary).

__init__(out self, blocks: TileTensor[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].dtype, Layout[*?, *?], MutAnyOrigin], cache_lengths: TileTensor[DType.uint32, Layout[*?, *?], ImmutAnyOrigin], lookup_table: TileTensor[DType.uint32, Layout[*?, *?], ImmutAnyOrigin], max_seq_length: UInt32, max_cache_length: UInt32, scales: OptionalReg[TileTensor[PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].scale_dtype, Layout[*?, *?], MutAnyOrigin]] = None)

Construct from TileTensor fields directly.

`get_key_cache`

get_key_cache(self, layer_idx: Int) -> PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].CacheType

Returns:

PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].CacheType

`get_value_cache`

get_value_cache(self, layer_idx: Int) -> PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].CacheType

Returns:

PagedKVCacheCollection[dtype_, kv_params_, page_size, scale_dtype_, quantization_granularity_].CacheType

`cache_length`

cache_length(self, bs_idx: Int) -> Int

Returns:

Int
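
This page does not describe how `cache_lengths`, `lookup_table`, and `page_size` interact. The Python sketch below shows how a paged cache is commonly addressed, assuming `lookup_table[bs_idx][logical_page]` holds a physical block index and `cache_length` reads the per-sequence entry of `cache_lengths`. The function `locate_token` and all data values are hypothetical and are not part of this struct's API.

```python
def cache_length(cache_lengths: list, bs_idx: int) -> int:
    """Model of cache_length(bs_idx): ASSUMED to read the per-sequence length."""
    return int(cache_lengths[bs_idx])


def locate_token(lookup_table: list, page_size: int, bs_idx: int, tok_idx: int) -> tuple:
    """Map a token position to (physical_block, offset_within_page).

    Hypothetical helper illustrating the usual paged-cache addressing scheme.
    """
    logical_page, offset = divmod(tok_idx, page_size)
    physical_block = lookup_table[bs_idx][logical_page]
    return physical_block, offset


# Hypothetical data: two sequences, up to three pages each.
page_size = 128
lookup_table = [[7, 2, 9], [4, 0, 0]]
cache_lengths = [300, 100]

print(cache_length(cache_lengths, 0))                 # 300
print(locate_token(lookup_table, page_size, 0, 130))  # (2, 2)
print(locate_token(lookup_table, page_size, 1, 99))   # (4, 99)
```

The indirection through the lookup table is what allows a sequence's pages to live in non-contiguous physical blocks.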

