Prefill

The first phase of an AI model's inference, in which the model processes the full input and initializes a cache that speeds up the predictions that follow.

Different model architectures may have their own version of prefill, but the term is primarily associated with large language models (LLMs), where it is also called context encoding.
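
For the LLM case, the cache typically stores per-token attention keys and values (often called a KV cache). The following is a minimal sketch of that idea, not a real model: the helper names (`embed`, `project`, `attend`, `prefill`, `decode_step`), the toy dimension, and the scaling "projections" are all assumptions made for illustration. It shows that a prefill over the prompt followed by one cached step gives the same result as reprocessing the whole sequence.

```python
import math

DIM = 4  # toy hidden size (assumption for illustration)

def embed(token_id):
    """Hypothetical deterministic embedding derived from the token id."""
    return [math.sin(token_id * (i + 1)) for i in range(DIM)]

def project(vec, scale):
    """Stand-in for a learned linear projection (here just a scaling)."""
    return [scale * x for x in vec]

def attend(query, keys, values):
    """Single-head scaled dot-product attention of one query over keys/values."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[i] for w, v in zip(weights, values)) for i in range(DIM)]

def prefill(prompt_ids):
    """Process the whole prompt once and build the key/value cache.

    A real model would handle all prompt tokens in parallel; this loop is
    only a readable stand-in.
    """
    cache = {"keys": [], "values": []}
    for tok in prompt_ids:
        x = embed(tok)
        cache["keys"].append(project(x, 0.5))
        cache["values"].append(project(x, 2.0))
    query = project(embed(prompt_ids[-1]), 1.5)
    return attend(query, cache["keys"], cache["values"]), cache

def decode_step(token_id, cache):
    """Process one new token, reusing cached keys/values for earlier tokens."""
    x = embed(token_id)
    cache["keys"].append(project(x, 0.5))
    cache["values"].append(project(x, 2.0))
    return attend(project(x, 1.5), cache["keys"], cache["values"])

prompt = [3, 1, 4, 1, 5]
_, cache = prefill(prompt)            # prefill: one pass over the prompt
cached_out = decode_step(9, cache)    # later step: only the new token is processed
full_out, _ = prefill(prompt + [9])   # baseline: reprocess everything from scratch
```

Here `cached_out` and `full_out` agree, but the cached path computes keys and values for only one new token instead of all six, which is the saving the cache is meant to provide.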