
Mojo struct

ContinuousBatchingKVCacheCollection

struct ContinuousBatchingKVCacheCollection[dtype_: DType, kv_params_: KVCacheStaticParams, blocks_origin: MutOrigin, cache_lengths_origin: ImmutOrigin, lookup_table_origin: ImmutOrigin]

This is a "view" of the cache for the given sequences in the batch.

This object does not own the underlying buffers in k_cache and v_cache; it borrows them from the BlockWrappers in the KVCacheManager.
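To make the non-owning "view" relationship concrete, here is a minimal conceptual sketch in Python. It is not the Mojo API: `CollectionView`, `block_for`, and `pool` are hypothetical names, and the idea that the lookup table maps a batch index to a block index is an assumption inferred from the field names, not something this page states.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CollectionView:
    """Hypothetical, simplified stand-in for a non-owning cache view."""

    blocks: List[List[int]]   # shared storage, owned elsewhere
    cache_lengths: List[int]  # tokens already cached per batch entry
    lookup_table: List[int]   # assumed: batch index -> index into `blocks`

    def cache_length(self, bs_idx: int) -> int:
        # Number of cached tokens for the sequence at this batch index.
        return self.cache_lengths[bs_idx]

    def block_for(self, bs_idx: int) -> List[int]:
        # Returns a reference into the shared storage, not a copy.
        return self.blocks[self.lookup_table[bs_idx]]


# The "manager" owns the storage; the view only holds references to it.
pool = [[0] * 4 for _ in range(3)]
view = CollectionView(blocks=pool, cache_lengths=[2, 0], lookup_table=[2, 0])

# Writing through the view mutates the manager's storage.
view.block_for(0)[0] = 7
```

Because the view holds references instead of copies, it stays cheap to construct per batch, but it is only valid while the owner keeps the underlying storage alive. In the Mojo struct that constraint is what the origin parameters express.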

Parameters

dtype_ (DType): The dtype of the kv-cache.
kv_params_ (KVCacheStaticParams): The kv-cache static parameters.
blocks_origin (MutOrigin): Origin of the KV cache blocks buffer.
cache_lengths_origin (ImmutOrigin): Origin of the cache lengths buffer.
lookup_table_origin (ImmutOrigin): Origin of the lookup table buffer.

Fields

blocks (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].blocks_tt_type):
cache_lengths (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].CacheType.cache_lengths_tt_type):
lookup_table (ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].CacheType.lookup_table_tt_type):
max_seq_length (UInt32):
max_cache_length (UInt32):
kv_cache_dynamic_shape (IndexList[Int(4)]):
kv_cache_dynamic_strides (IndexList[Int(4)]):

Implemented traits

AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDeletable, KVCollectionT, Movable

`comptime` members

`blocks_layout`

comptime blocks_layout = Layout.row_major(ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].blocks_shape)

`blocks_shape`

comptime blocks_shape = IntTuple(Int(-1), Int(-1), Int(-1), Int(-1), kv_params_, kv_params_)

`blocks_tt_layout`

comptime blocks_tt_layout = Layout[*?, *?]

`blocks_tt_type`

comptime blocks_tt_type = TileTensor[ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].dtype, Layout[*?, *?], blocks_origin]

`CacheType`

comptime CacheType = ContinuousBatchingKVCache[ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].dtype, ContinuousBatchingKVCacheCollection[dtype_, kv_params_, blocks_origin, cache_lengths_origin, lookup_table_origin].kv_params, blocks_origin, cache_lengths_origin, lookup_table_origin]

`dtype`

comptime dtype = dtype_

`kv_params`

comptime kv_params = kv_params_

`name_str`

comptime name_str = "continuous_batching"

`scale_dtype`

comptime scale_dtype = DType.invalid

Methods

`__init__`

def __init__(out self, blocks: LayoutTensor[Self.dtype, Layout.row_major[Int(6)](), blocks_origin], cache_lengths: LayoutTensor[DType.uint32, Layout(IntTuple(Int(-1))), cache_lengths_origin], lookup_table: LayoutTensor[DType.uint32, Layout(IntTuple(Int(-1))), lookup_table_origin], max_seq_length: UInt32, max_cache_length: UInt32)

Construct from LayoutTensor params (MOGG boundary).

def __init__(out self, blocks: TileTensor[Self.dtype, Layout[*?, *?], blocks_origin], cache_lengths: TileTensor[DType.uint32, Layout[*?, *?], cache_lengths_origin], lookup_table: TileTensor[DType.uint32, Layout[*?, *?], lookup_table_origin], max_seq_length: UInt32, max_cache_length: UInt32)

Construct from TileTensor fields directly.

`get_key_cache`

def get_key_cache(self, layer_idx: Int) -> Self.CacheType

Returns:

Self.CacheType

`get_value_cache`

def get_value_cache(self, layer_idx: Int) -> Self.CacheType

Returns:

Self.CacheType

`cache_length`

def cache_length(self, bs_idx: Int) -> Int

Returns:

Int
