IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /max/get-started.md). For the complete documentation index, see llms.txt.

Mojo struct

ContinuousBatchingKVCacheCollection

struct ContinuousBatchingKVCacheCollection[dtype_: DType, kv_params_: KVCacheStaticParams, blocks_origin: MutOrigin, cache_lengths_origin: ImmutOrigin, lookup_table_origin: ImmutOrigin]

This is a "view" of the cache for the given sequences in the batch.

This object does not own the underlying buffers in k_cache and v_cache; it borrows them from the BlockWrappers in the KVCacheManager.
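The non-owning relationship described above can be illustrated with a small conceptual model. This is a plain-Python sketch, not the Mojo API: `ToyCacheManager` and `ToyCollectionView` are hypothetical names, and the nested lists only stand in for the real buffers. In Mojo the borrow is expressed through the origin parameters; Python has no equivalent, so the sketch only shows that the view and the owner share storage.

```python
class ToyCacheManager:
    """Hypothetical owner of the KV buffers (stands in for KVCacheManager)."""

    def __init__(self, num_blocks, block_len):
        # The manager allocates and owns the storage.
        self.k_cache = [[0.0] * block_len for _ in range(num_blocks)]
        self.v_cache = [[0.0] * block_len for _ in range(num_blocks)]


class ToyCollectionView:
    """Hypothetical non-owning view: keeps references, never copies."""

    def __init__(self, manager):
        # Same list objects as the manager's, so no new storage is allocated.
        self.k_cache = manager.k_cache
        self.v_cache = manager.v_cache


manager = ToyCacheManager(num_blocks=2, block_len=4)
view = ToyCollectionView(manager)

# A write through the view is visible to the owner because storage is shared.
view.k_cache[1][2] = 7.5
shares_storage = view.k_cache is manager.k_cache
written_value = manager.k_cache[1][2]
```

Because the view holds no storage of its own, it is only meaningful while the owner's buffers are still alive.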

Parameters

  • dtype_ (DType): The dtype of the kv-cache.
  • kv_params_ (KVCacheStaticParams): The kv-cache static parameters.
  • blocks_origin (MutOrigin): Origin of the KV cache blocks buffer.
  • cache_lengths_origin (ImmutOrigin): Origin of the cache lengths buffer.
  • lookup_table_origin (ImmutOrigin): Origin of the lookup table buffer.

Fields

  • blocks (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].blocks_tt_type)
  • cache_lengths (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].CacheType.cache_lengths_tt_type)
  • lookup_table (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].CacheType.lookup_table_tt_type)
  • max_seq_length (UInt32)
  • max_cache_length (UInt32)
  • kv_cache_dynamic_shape (IndexList[Int(4)])
  • kv_cache_dynamic_strides (IndexList[Int(4)])

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, KVCollectionT, Movable

comptime members

blocks_layout

comptime blocks_layout = Layout.row_major(ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].blocks_shape)

blocks_shape​

comptime blocks_shape = IntTuple(Int(-1), Int(-1), Int(-1), Int(-1), kv_params_, kv_params_)

blocks_tt_layout​

comptime blocks_tt_layout = Layout[*?, *?]

blocks_tt_type​

comptime blocks_tt_type = TileTensor[ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].dtype, Layout[*?, *?], blocks_origin]

CacheType​

comptime CacheType = ContinuousBatchingKVCache[ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].dtype, ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].kv_params, blocks_origin, cache_lengths_origin, lookup_table_origin]

dtype​

comptime dtype = dtype_

kv_params​

comptime kv_params = kv_params_

name_str​

comptime name_str = "continuous_batching"

scale_dtype​

comptime scale_dtype = DType.invalid

Methods

__init__

def __init__(out self, blocks: LayoutTensor[Self.dtype, Layout.row_major[Int(6)](), blocks_origin], cache_lengths: LayoutTensor[DType.uint32, Layout(IntTuple(Int(-1))), cache_lengths_origin], lookup_table: LayoutTensor[DType.uint32, Layout(IntTuple(Int(-1))), lookup_table_origin], max_seq_length: UInt32, max_cache_length: UInt32)

Construct from LayoutTensor params (MOGG boundary).

def __init__(out self, blocks: TileTensor[Self.dtype, Layout[*?, *?], blocks_origin], cache_lengths: TileTensor[DType.uint32, Layout[*?, *?], cache_lengths_origin], lookup_table: TileTensor[DType.uint32, Layout[*?, *?], lookup_table_origin], max_seq_length: UInt32, max_cache_length: UInt32)

Construct from TileTensor fields directly.

get_key_cache​

def get_key_cache(self, layer_idx: Int) -> Self.CacheType

Returns:

Self.CacheType

get_value_cache​

def get_value_cache(self, layer_idx: Int) -> Self.CacheType

Returns:

Self.CacheType

cache_length​

def cache_length(self, bs_idx: Int) -> Int

Returns:

Int
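How the three accessors relate can be sketched with a toy model. This is conceptual Python rather than Mojo, and it rests on assumptions the page does not state: that `lookup_table` maps a batch index to a block index, that `cache_lengths` holds one length per sequence in the batch, and that each block is indexed as `[key-or-value][layer][token]`. `ToyKVCollection`, `ToyLayerCache`, and `load` are hypothetical names.

```python
class ToyLayerCache:
    """Hypothetical per-layer view, standing in for Self.CacheType."""

    def __init__(self, collection, kv_idx, layer_idx):
        self.collection = collection
        self.kv_idx = kv_idx        # 0 = key, 1 = value (assumed ordering)
        self.layer_idx = layer_idx

    def load(self, bs_idx, tok_idx):
        # Assumption: the lookup table maps a batch slot to its block.
        block_idx = self.collection.lookup_table[bs_idx]
        block = self.collection.blocks[block_idx]
        return block[self.kv_idx][self.layer_idx][tok_idx]


class ToyKVCollection:
    """Hypothetical model of the collection's three accessor methods."""

    def __init__(self, blocks, cache_lengths, lookup_table):
        self.blocks = blocks
        self.cache_lengths = cache_lengths
        self.lookup_table = lookup_table

    def get_key_cache(self, layer_idx):
        return ToyLayerCache(self, 0, layer_idx)

    def get_value_cache(self, layer_idx):
        return ToyLayerCache(self, 1, layer_idx)

    def cache_length(self, bs_idx):
        return int(self.cache_lengths[bs_idx])


# 3 blocks x (key, value) x 2 layers x 4 token slots, filled with labels.
blocks = [
    [[[f"b{b}-{name}-l{l}-t{t}" for t in range(4)] for l in range(2)]
     for name in ("k", "v")]
    for b in range(3)
]

# Batch of two sequences: slot 0 lives in block 2, slot 1 in block 0.
collection = ToyKVCollection(blocks, cache_lengths=[3, 1], lookup_table=[2, 0])

key_entry = collection.get_key_cache(1).load(bs_idx=0, tok_idx=2)
value_entry = collection.get_value_cache(0).load(bs_idx=1, tok_idx=0)
first_length = collection.cache_length(0)
```

In this model, `get_key_cache` and `get_value_cache` differ only in which half of each block they select, which is one plausible reading of why both return the same `Self.CacheType`.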