Mojo module

kv_buffer

KV cache buffer for structured MHA kernels (TileTensor hot path).

Provides KVCacheIterator (TileTensor-based DRAM tile iteration) and KVBuffer (DMA + LDS + register tile management).

TileTensor is used throughout (no LayoutTensor in this file):

  • DRAM tiles: TileTensor with RuntimeInt valid_rows (KVCacheIterator)
  • SMEM sub-tiles: flat TileTensor views via smem_subtile/smem_mma_subtile
  • DMA: tt_copy_dram_to_sram_lds (both src and dst are TileTensor)
  • LDS loads: tt_load_b / tt_load_b_tr (TileTensor SMEM -> SIMD)
  • MMA register tiles: TileTensor in LOCAL with stack_allocation
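The first bullet, DRAM tiles that carry a runtime valid_rows, can be pictured with a small sketch. This is a conceptual illustration in Python, not the module's Mojo API: `Tile`, `iter_kv_tiles`, `start_row`, and `tile_rows` are hypothetical names, and the assumption is that every tile has a fixed height while only the last tile may hold fewer valid rows.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Tile:
    """Hypothetical stand-in for one DRAM tile (not the real TileTensor type)."""
    start_row: int   # first row of this tile within the full KV buffer
    tile_rows: int   # fixed tile height (rows allocated per tile)
    valid_rows: int  # rows actually backed by data, known only at runtime


def iter_kv_tiles(num_rows: int, tile_rows: int) -> Iterator[Tile]:
    """Yield fixed-height tiles covering num_rows; the last tile may be partial."""
    for start in range(0, num_rows, tile_rows):
        yield Tile(start, tile_rows, min(tile_rows, num_rows - start))


# 70 cached rows split into tiles of 32 rows: two full tiles and a 6-row tail.
tiles: List[Tile] = list(iter_kv_tiles(num_rows=70, tile_rows=32))
```

Keeping the tile height fixed and tracking only the valid row count at runtime lets a consumer use one code path for full and partial tiles, masking or skipping the rows past valid_rows.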

TiledTensorCore.mma() in tensor_core.mojo has TileTensor overloads that construct LayoutTensor views at the MMA boundary.

Structs
