Mojo module
types
This module contains the types for the key-value cache APIs.
The module includes structs implementing several different types of KV caches.
This module defines two traits that specify the roles of the different structs:
- KVCacheT: Defines the interface for a single (key or value) cache.
- KVCollectionT: Defines the interface for a pair of caches (keys and values).
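The split between the two traits can be illustrated with a minimal Python sketch. It is not the Mojo API: the names `SingleCache`, `DictCache`, `CachePair`, `load`, `store`, `key_cache`, and `value_cache` are all hypothetical and chosen only to show one object acting as a single cache and another holding the key/value pair.

```python
from typing import Dict, List, Protocol, Tuple


class SingleCache(Protocol):
    """Hypothetical analogue of KVCacheT: one cache, holding keys OR values."""

    def store(self, seq: int, pos: int, vec: List[float]) -> None: ...

    def load(self, seq: int, pos: int) -> List[float]: ...


class DictCache:
    """Toy single cache backed by a dict keyed on (sequence, position)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[int, int], List[float]] = {}

    def store(self, seq: int, pos: int, vec: List[float]) -> None:
        self._rows[(seq, pos)] = list(vec)

    def load(self, seq: int, pos: int) -> List[float]:
        return self._rows[(seq, pos)]


class CachePair:
    """Hypothetical analogue of KVCollectionT: a key cache plus a value cache."""

    def __init__(self, keys: SingleCache, values: SingleCache) -> None:
        self._keys = keys
        self._values = values

    def key_cache(self) -> SingleCache:
        return self._keys

    def value_cache(self) -> SingleCache:
        return self._values


pair = CachePair(DictCache(), DictCache())
pair.key_cache().store(seq=0, pos=0, vec=[1.0, 2.0])
pair.value_cache().store(seq=0, pos=0, vec=[3.0, 4.0])
```

Under this sketch, attention code would ask the pair for the key cache or the value cache and then read rows from whichever single cache it received, without caring how that cache stores its data.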
Structs
- ContinuousBatchingKVCache: Wrapper for the ContinuousKVCache of a given layer in the transformer model.
- ContinuousBatchingKVCacheCollection: This is a "view" of the cache for the given sequences in the batch.
- KVCacheStaticParams
- PagedKVCache: The PagedKVCache is a wrapper around the KVCache blocks for a given layer. It is used to access the KVCache blocks for PagedAttention.
- PagedKVCacheCollection
- PagedRowIndices: Pre-computed physical row indices for a BN-row range of paged KV cache.
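To make the idea of pre-computed physical row indices concrete, here is a rough Python sketch of a generic page-table lookup. The actual layout used by PagedKVCache and PagedRowIndices is not described on this page, so the function `physical_rows`, its parameters, and the "page id times page size plus offset" addressing are assumptions for illustration only.

```python
from typing import List


def physical_rows(
    page_table: List[int], page_size: int, start_row: int, num_rows: int
) -> List[int]:
    """Map logical rows [start_row, start_row + num_rows) to physical rows.

    Assumes (hypothetically) that page_table[i] is the physical page id of
    the i-th logical page and that pages are stored contiguously, page_size
    rows each.
    """
    rows = []
    for logical in range(start_row, start_row + num_rows):
        page_slot, offset = divmod(logical, page_size)
        rows.append(page_table[page_slot] * page_size + offset)
    return rows


# A sequence whose three logical pages live in physical pages 7, 2, and 5.
indices = physical_rows(page_table=[7, 2, 5], page_size=4, start_row=2, num_rows=4)
print(indices)  # [30, 31, 8, 9]
```

The range here crosses a page boundary, so the resulting physical rows are not contiguous. Computing them once per tile, rather than per element, is the kind of saving a pre-computed index structure would offer.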
Traits
- KVCacheT: Trait for different KVCache types and implementations.
- KVCollectionT: Trait for a pair of caches (keys and values).
Functions
- kv_num_sub_tiles: Number of sub-tile TMA copies needed for tile_BN rows.
- kv_sub_tile_rows: Sub-tile row count for a TMA load of tile_BN rows.
- padded_depth
- swizzle_granularity