Mojo module
kv_buffer
KV cache buffer for structured MHA kernels (TileTensor hot path).
Provides KVCacheIterator (TileTensor-based DRAM tile iteration) and KVBuffer (DMA + LDS + register tile management).
TileTensor is used throughout; this file contains no LayoutTensor:
- DRAM tiles: TileTensor with RuntimeInt valid_rows (KVCacheIterator)
- SMEM sub-tiles: flat TileTensor views via smem_subtile/smem_mma_subtile
- DMA: tt_copy_dram_to_sram_lds (both src and dst are TileTensor)
- LDS loads: tt_load_b / tt_load_b_tr (TileTensor SMEM -> SIMD)
- MMA register tiles: TileTensor in LOCAL with stack_allocation
TiledTensorCore.mma() in tensor_core.mojo has TileTensor overloads that construct LayoutTensor views at the MMA boundary.
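As a rough illustration of the DRAM tile iteration described above, the sketch below models how a tile iterator might report a runtime `valid_rows` count when the cache length is not a multiple of the tile height. It is a conceptual Python model, not the Mojo API: `Tile`, `iter_kv_tiles`, `num_rows`, and `tile_rows` are hypothetical names chosen for this example, and the real KVCacheIterator may differ in interface and behavior.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Tile:
    """Hypothetical stand-in for a DRAM tile view (not the Mojo TileTensor)."""
    start_row: int   # first cache row covered by this tile
    valid_rows: int  # rows holding real data; may be less than the tile height


def iter_kv_tiles(num_rows: int, tile_rows: int) -> Iterator[Tile]:
    """Yield fixed-height tiles over num_rows; the last tile may be partial."""
    for start in range(0, num_rows, tile_rows):
        # Clamp the row count so the tail tile never reads past the cache end.
        yield Tile(start, min(tile_rows, num_rows - start))


# A 10-row cache walked in 4-row tiles: two full tiles and one partial tile.
tiles = list(iter_kv_tiles(num_rows=10, tile_rows=4))
for t in tiles:
    print(t.start_row, t.valid_rows)
```

Carrying `valid_rows` as a runtime value, rather than fixing it in the tile type, lets a single tile shape serve both full tiles and the ragged tail of the cache.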
Structs
- KVBuffer: KV cache buffer managing DMA, LDS staging, and register tiles.
- KVCacheIterator: TileTensor-based DRAM tile iterator.