Transformer

A neural network architecture designed to perform complex tasks with sequential data (such as text, speech, and images) in a manner that can be efficiently parallelized on GPUs or other accelerator hardware. This parallelizability makes transformers highly effective for natural language processing and other generative AI (GenAI) applications.

The transformer model architecture was first introduced in the paper Attention Is All You Need (Vaswani et al., 2017). The design replaces recurrent structures such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) with self-attention mechanisms, which allows processing to be parallelized across separate compute cores rather than forcing the model to process a sequence one step at a time. This design is currently the foundation for all major large language models (LLMs), including GPT, Llama, Gemini, and DeepSeek.
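
To illustrate the core idea, the sketch below shows scaled dot-product self-attention, the mechanism described in the paper, implemented in plain NumPy. The sequence length, embedding size, and random weight matrices are arbitrary assumptions for the example; a real transformer would learn these projections and stack many attention layers.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a single sequence.

    x            : (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices (hypothetical weights)

    Every token attends to every other token via one matrix multiplication,
    which is why the computation parallelizes well on GPUs, unlike an RNN
    that must step through the sequence one token at a time.
    """
    q = x @ w_q                       # queries
    k = x @ w_k                       # keys
    v = x @ w_v                       # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)   # (seq_len, seq_len) attention scores
    weights = softmax(scores)         # each row sums to 1
    return weights @ v                # weighted sum of value vectors


# Toy example: 5 tokens with 16-dimensional embeddings (arbitrary sizes).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8): one attended representation per token
```

Because the attention scores for all token pairs are computed in a single matrix product, the whole sequence can be processed at once, whereas a recurrent model must wait for the previous step's output before computing the next.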