
Mojo function

generic_get_paged_cache

def generic_get_paged_cache[dtype: DType](
    blocks: ManagedTensorSlice[MutableInput, static_spec=blocks.static_spec],
    cache_lengths: ManagedTensorSlice[Input, static_spec=cache_lengths.static_spec],
    lookup_table: ManagedTensorSlice[Input, static_spec=lookup_table.static_spec],
    max_lengths: ManagedTensorSlice[Input, static_spec=max_lengths.static_spec],
    out result: PagedKVCacheCollection[
        dtype,
        KVCacheStaticParams(
            Int[IntTuple](StaticTensorSpec[dtype, 6, blocks.static_spec.static_layout, blocks.static_spec.InFusion, blocks.static_spec.OutFusion, blocks.static_spec.ComputeFusion, blocks.static_spec.ComputeFusionTile].shape_tuple[4]),
            Int[IntTuple](StaticTensorSpec[dtype, 6, blocks.static_spec.static_layout, blocks.static_spec.InFusion, blocks.static_spec.OutFusion, blocks.static_spec.ComputeFusion, blocks.static_spec.ComputeFusionTile].shape_tuple[5]),
            (Int[IntTuple](StaticTensorSpec[dtype, 6, blocks.static_spec.static_layout, blocks.static_spec.InFusion, blocks.static_spec.OutFusion, blocks.static_spec.ComputeFusion, blocks.static_spec.ComputeFusionTile].shape_tuple[1]) == 1)
        ),
        Int[IntTuple](StaticTensorSpec[dtype, 6, blocks.static_spec.static_layout, blocks.static_spec.InFusion, blocks.static_spec.OutFusion, blocks.static_spec.ComputeFusion, blocks.static_spec.ComputeFusionTile].shape_tuple[3])
    ]
)

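Returns:

PagedKVCacheCollection, parameterized as given in the out result annotation of the signature above.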

def generic_get_paged_cache[dtype: DType, kv_params: KVCacheStaticParams, page_size: Int](
    blocks: LayoutTensor[dtype, Layout.row_major[6]()],
    cache_lengths: LayoutTensor[DType.uint32, Layout(IntTuple(-1))],
    lookup_table: LayoutTensor[DType.uint32, Layout.row_major[2]()],
    max_lengths: LayoutTensor[DType.uint32, Layout.row_major[2]()],
    out result: PagedKVCacheCollection[dtype, kv_params, page_size]
)

Returns:

PagedKVCacheCollection[dtype, kv_params, page_size]
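
The ManagedTensorSlice overload takes no kv_params or page_size parameters. Its result type instead reads them from the static shape of the rank-6 blocks tensor: dimensions 4 and 5 feed the first two KVCacheStaticParams arguments, the test "dimension 1 == 1" feeds the third, and dimension 3 becomes the page size. The sketch below models that derivation in plain Python rather than Mojo, so it is only an illustration and not the library's implementation. StaticParams and derive_cache_params are hypothetical names, and the field names num_heads, head_size, and is_mla are assumptions about what the three KVCacheStaticParams arguments mean.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class StaticParams:
    """Hypothetical stand-in for KVCacheStaticParams (field names assumed)."""

    num_heads: int
    head_size: int
    is_mla: bool


def derive_cache_params(blocks_shape: Sequence[int]) -> Tuple[StaticParams, int]:
    """Mirror the shape_tuple indexing in the first overload's result type.

    Only the indices (4, 5, 1, 3) come from the signature; the names given
    to them here are assumptions.
    """
    if len(blocks_shape) != 6:
        raise ValueError("blocks must be rank 6")

    params = StaticParams(
        num_heads=blocks_shape[4],    # shape_tuple[4]
        head_size=blocks_shape[5],    # shape_tuple[5]
        is_mla=blocks_shape[1] == 1,  # shape_tuple[1] == 1
    )
    page_size = blocks_shape[3]       # shape_tuple[3]
    return params, page_size


if __name__ == "__main__":
    # Example shape chosen for illustration; dimensions 0 and 2 are not
    # interpreted by the signature and are left unnamed here.
    print(derive_cache_params((128, 2, 32, 16, 8, 64)))
```

With the example shape (128, 2, 32, 16, 8, 64), the sketch yields num_heads=8, head_size=64, is_mla=False, and a page size of 16. In the LayoutTensor overload the caller supplies kv_params and page_size explicitly, so no such derivation takes place.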