Mojo function
generic_flash_attention_kv_cache_continuous_batch
generic_flash_attention_kv_cache_continuous_batch[target: StringSlice[StaticConstantOrigin], type: DType](
    q: NDBuffer[type, 4, origin, shape, strides],
    kv_collection: ContinuousBatchingKVCacheCollection[type_, kv_params_],
    layer_idx: SIMD[uint32, 1],
    mask: NDBuffer[type, rank, origin, shape, strides],
    valid_lengths: NDBuffer[uint32, 1, origin],
    scale: SIMD[float32, 1],
    output: NDBuffer[type, 4, origin, shape, strides],
    context: DeviceContextPtr
)