Autoregression

A process by which an AI model iteratively predicts future values based on previous values in a sequence, using its own output as input to itself. Because each prediction depends on prior context, the process is sequential, which limits parallelization.

Autoregression is a standard procedure in transformer models such as large language models (LLMs) and other models that perform time-series forecasting. This autoregressive process explains why AI chat bots like ChatGPT and Gemini stream the output one word at a time—although they sometimes run so fast that they appear to produce more than one word at a time.