Mojo struct
ContinuousBatchingKVCacheCollection
struct ContinuousBatchingKVCacheCollection[dtype_: DType, kv_params_: KVCacheStaticParams, blocks_origin: MutOrigin, cache_lengths_origin: ImmutOrigin, lookup_table_origin: ImmutOrigin]
This is a "view" of the cache for the given sequences in the batch.
This object does not own the underlying buffers in k_cache and v_cache; it borrows them from the BlockWrappers in our KVCacheManager.
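The borrowing relationship described above can be pictured with a small Python model. This is only a conceptual sketch, not the Mojo API: `ToyCacheManager`, `CacheCollectionView`, and `block_for` are hypothetical names, plain lists stand in for the tensors, and resolving a batch slot to a block through `lookup_table` is an assumption about how the lookup table is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCollectionView:
    """Hypothetical stand-in for a non-owning cache collection."""

    blocks: list         # borrowed: the storage is owned by the manager
    cache_lengths: list  # tokens already cached for each sequence
    lookup_table: list   # batch slot -> block index (assumed meaning)

    def block_for(self, batch_idx: int) -> list:
        # Resolve the batch slot to its block; nothing is copied.
        return self.blocks[self.lookup_table[batch_idx]]


class ToyCacheManager:
    """Hypothetical owner of the block storage."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.blocks = [[0.0] * block_size for _ in range(num_blocks)]

    def view(self, block_ids, cache_lengths) -> CacheCollectionView:
        # Hand out references to the manager's storage, not copies.
        return CacheCollectionView(
            self.blocks, list(cache_lengths), list(block_ids)
        )


manager = ToyCacheManager(num_blocks=4, block_size=8)
view = manager.view(block_ids=[2, 0], cache_lengths=[3, 5])

# A write made through the owner is visible through the view,
# because both refer to the same underlying block.
manager.blocks[2][0] = 1.5
assert view.block_for(0) is manager.blocks[2]
assert view.block_for(0)[0] == 1.5
```

In this model the view is cheap to create and copy, which matches the struct being `Copyable` while leaving buffer lifetime to the manager.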
Parameters
- dtype_ (DType): The dtype of the kv-cache.
- kv_params_ (KVCacheStaticParams): The kv-cache static parameters.
- blocks_origin (MutOrigin): Origin of the KV cache blocks buffer.
- cache_lengths_origin (ImmutOrigin): Origin of the cache lengths buffer.
- lookup_table_origin (ImmutOrigin): Origin of the lookup table buffer.
Fields
- blocks (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].blocks_tt_type)
- cache_lengths (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].CacheType.cache_lengths_tt_type)
- lookup_table (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].CacheType.lookup_table_tt_type)
- max_seq_length (UInt32)
- max_cache_length (UInt32)
- kv_cache_dynamic_shape (IndexList[Int(4)])
- kv_cache_dynamic_strides (IndexList[Int(4)])
Implemented traits
AnyType,
Copyable,
ImplicitlyCopyable,
ImplicitlyDeletable,
KVCollectionT,
Movable
comptime members
blocks_layout
comptime blocks_layout = Layout.row_major(ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].blocks_shape)
blocks_shape
comptime blocks_shape = IntTuple(Int(-1), Int(-1), Int(-1), Int(-1), kv_params_, kv_params_)
blocks_tt_layout
comptime blocks_tt_layout = Layout[*?, *?]
blocks_tt_type
comptime blocks_tt_type = TileTensor[ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].dtype, Layout[*?, *?], blocks_origin]
CacheType
comptime CacheType = ContinuousBatchingKVCache[ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].dtype, ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].kv_params, blocks_origin, cache_lengths_origin, lookup_table_origin]
dtype
comptime dtype = dtype_
kv_params
comptime kv_params = kv_params_
name_str
comptime name_str = "continuous_batching"
scale_dtype
comptime scale_dtype = DType.invalid
Methods
__init__
def __init__(out self, blocks: LayoutTensor[Self.dtype, Layout.row_major[Int(6)](), blocks_origin], cache_lengths: LayoutTensor[DType.uint32, Layout(IntTuple(Int(-1))), cache_lengths_origin], lookup_table: LayoutTensor[DType.uint32, Layout(IntTuple(Int(-1))), lookup_table_origin], max_seq_length: UInt32, max_cache_length: UInt32)
Construct from LayoutTensor params (MOGG boundary).
def __init__(out self, blocks: TileTensor[Self.dtype, Layout[*?, *?], blocks_origin], cache_lengths: TileTensor[DType.uint32, Layout[*?, *?], cache_lengths_origin], lookup_table: TileTensor[DType.uint32, Layout[*?, *?], lookup_table_origin], max_seq_length: UInt32, max_cache_length: UInt32)
Construct from TileTensor fields directly.
get_key_cache
def get_key_cache(self, layer_idx: Int) -> Self.CacheType
Returns:
Self.CacheType
get_value_cache
def get_value_cache(self, layer_idx: Int) -> Self.CacheType
Returns:
Self.CacheType
cache_length