Mojo function
generic_flash_attention_kv_cache_padded_materialized_mask
generic_flash_attention_kv_cache_padded_materialized_mask[
    collection_t: KVCollectionT,
    dtype: DType, //,
    *,
    target: StringSlice[StaticConstantOrigin],
    score_mod_str: StringSlice[StaticConstantOrigin],
    local_window_size: Int = -1,
    num_heads: Int = -1,
](
    q: NDBuffer[dtype, 4, origin, shape, strides],
    kv_collection: collection_t,
    layer_idx: UInt32,
    mask: NDBuffer[dtype, rank, origin, shape, strides],
    valid_lengths: ManagedTensorSlice[io_spec, static_spec=static_spec],
    scale: Float32,
    output: NDBuffer[dtype, 4, origin, shape, strides],
    context: DeviceContextPtr,
    sink_weights: OptionalReg[NDBuffer[dtype, 1, MutableAnyOrigin]] = OptionalReg[NDBuffer[dtype, 1, MutableAnyOrigin]]({:i1 0, 1}),
)