Mojo struct
ContinuousBatchingKVCacheCollection
struct ContinuousBatchingKVCacheCollection[type_: DType, kv_params_: KVCacheStaticParams]
This is a "view" of the cache for the given sequences in the batch.
This object does not own the underlying buffers in k_cache and v_cache; it borrows them from the BlockWrappers in our KVCacheManager. It does own the Pointer[NDBuffer[type, 3]] and the valid_lengths buffer.
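The borrowing relationship described above can be illustrated with a small conceptual model. This is a plain-Python sketch, not the Mojo API: `CollectionView` and `manager_blocks` are hypothetical names, and Python lists stand in for the real buffers.

```python
import copy
from dataclasses import dataclass


@dataclass
class CollectionView:
    """Hypothetical stand-in: holds references to storage owned elsewhere."""
    blocks: list         # borrowed from the (hypothetical) manager
    cache_lengths: list  # small per-batch metadata


# Storage owned by a hypothetical manager, not by the view.
manager_blocks = [[0.0] * 4 for _ in range(3)]

view = CollectionView(blocks=manager_blocks, cache_lengths=[2, 0])
view_copy = copy.copy(view)  # shallow copy: no block data is duplicated

# A write through the owner is visible through every view of it.
manager_blocks[1][0] = 7.0
print(view.blocks[1][0], view_copy.blocks is manager_blocks)
```

In this model, copying the view is cheap because only references are copied. The flip side is that the view is only valid while the owner keeps the storage alive.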
Aliases
- type = type_
- kv_params = kv_params_
- CacheType = ContinuousBatchingKVCache[type_, kv_params_]
- blocks_shape = DimList(Dim(-31337), Dim(-31337), Dim(-31337), Dim(-31337), Dim(kv_params_.num_heads), Dim(kv_params_.head_size))
- blocks_stride = _strides_from_shape[::DimList,::Int]()
- blocks_type = NDBuffer[type_, 6, MutableAnyOrigin, DimList(Dim(-31337), Dim(-31337), Dim(-31337), Dim(-31337), Dim(kv_params_.num_heads), Dim(kv_params_.head_size)), _strides_from_shape[::DimList,::Int]()]
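The blocks_stride alias is derived from blocks_shape. As a rough illustration, the sketch below computes contiguous row-major strides for a rank-6 shape in plain Python. It assumes that `_strides_from_shape` produces row-major strides, which this page does not confirm. The concrete sizes and the meaning given to the four leading dimensions are illustrative assumptions only; `strides_from_shape` and `flat_offset` are hypothetical helpers.

```python
def strides_from_shape(shape):
    """Contiguous row-major strides: the last axis moves fastest."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def flat_offset(index, strides):
    """Linear element offset of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


# Four leading sizes are made up for the example; the last two play the
# role of num_heads and head_size, which are static in blocks_shape.
shape = (4, 2, 3, 8, 2, 16)
strides = strides_from_shape(shape)
print(strides)
print(flat_offset((1, 0, 0, 0, 0, 0), strides))
```

Because only the last two dimensions are known statically, the strides of the leading axes depend on values supplied at run time.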
Fields
- cache_lengths (NDBuffer[uint32, 1, MutableAnyOrigin])
- lookup_table (NDBuffer[uint32, 1, MutableAnyOrigin])
- blocks (NDBuffer[type_, 6, MutableAnyOrigin, DimList(Dim(-31337), Dim(-31337), Dim(-31337), Dim(-31337), Dim(kv_params_.num_heads), Dim(kv_params_.head_size)), _strides_from_shape[::DimList,::Int]()])
- max_seq_length (SIMD[uint32, 1])
- max_cache_length (SIMD[uint32, 1])
- kv_cache_dynamic_shape (Index[4])
- kv_cache_dynamic_strides (Index[4])
Implemented traits
AnyType, Copyable, KVCollectionT, Movable, UnknownDestructibility
Methods
__init__
__init__(out self, blocks: NDBuffer[type_, 6, MutableAnyOrigin], cache_lengths: NDBuffer[uint32, 1, MutableAnyOrigin], lookup_table: NDBuffer[uint32, 1, MutableAnyOrigin], max_seq_length: SIMD[uint32, 1], max_cache_length: SIMD[uint32, 1])
__copyinit__
__copyinit__(out self, other: Self)
__moveinit__
__moveinit__(out self, owned other: Self)
copy
copy(self) -> Self
Explicitly construct a copy of self.
Returns:
A copy of this value.
get_key_cache
get_key_cache(self, layer_idx: Int) -> ContinuousBatchingKVCache[type_, kv_params_]
get_value_cache
get_value_cache(self, layer_idx: Int) -> ContinuousBatchingKVCache[type_, kv_params_]
cache_length
cache_length(self, bs_idx: Int) -> Int
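The methods above are listed without descriptions, so the following plain-Python model is only a guess at how they might relate to the fields. It assumes that lookup_table maps a batch index to a block index, that one axis of blocks separates keys from values (0 for keys, 1 for values here), and that cache_length reads cache_lengths at the given batch index. None of this is confirmed by this page, and `CollectionModel`, `LayerCacheView`, `load`, and `make_blocks` are hypothetical names.

```python
class LayerCacheView:
    """Hypothetical per-layer view; shares block storage with its parent."""

    def __init__(self, blocks, lookup_table, kv_idx, layer_idx):
        self.blocks = blocks
        self.lookup_table = lookup_table
        self.kv_idx = kv_idx        # assumed: 0 = key, 1 = value
        self.layer_idx = layer_idx

    def load(self, bs_idx, tok_idx):
        # Assumed: lookup_table maps a batch slot to its block.
        block_idx = self.lookup_table[bs_idx]
        return self.blocks[block_idx][self.kv_idx][self.layer_idx][tok_idx]


class CollectionModel:
    """Hypothetical model of the collection's three accessors."""

    def __init__(self, blocks, cache_lengths, lookup_table):
        self.blocks = blocks
        self.cache_lengths = cache_lengths
        self.lookup_table = lookup_table

    def get_key_cache(self, layer_idx):
        return LayerCacheView(self.blocks, self.lookup_table, 0, layer_idx)

    def get_value_cache(self, layer_idx):
        return LayerCacheView(self.blocks, self.lookup_table, 1, layer_idx)

    def cache_length(self, bs_idx):
        return int(self.cache_lengths[bs_idx])


def make_blocks(num_blocks, num_layers, max_seq_len):
    """Build labelled entries so each lookup is easy to trace."""
    return [[[[f"b{b}-{'kv'[kv]}-l{l}-t{t}" for t in range(max_seq_len)]
              for l in range(num_layers)]
             for kv in range(2)]
            for b in range(num_blocks)]


collection = CollectionModel(
    blocks=make_blocks(num_blocks=3, num_layers=2, max_seq_len=4),
    cache_lengths=[3, 1],
    lookup_table=[2, 0],
)
print(collection.get_key_cache(1).load(0, 2))
print(collection.get_value_cache(0).load(1, 0))
print(collection.cache_length(0))
```

In this model, the key and value views returned for a layer hold no data of their own. They index into the same shared blocks, which matches the description of the collection as a view.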