GPT (Generative Pre-trained Transformer)
A family of large language models by OpenAI that pioneered the approach of pre-training decoder-only transformers on massive text datasets for general-purpose language understanding and generation.
Evolution
GPT-1 (2018): 117M params, proved pre-training works. GPT-2 (2019): 1.5B params, surprisingly coherent text generation. GPT-3 (2020): 175B params, in-context learning emerged. GPT-4 (2023): Multimodal, frontier capabilities.
Architecture
All GPT models are decoder-only transformers using causal self-attention, meaning each token can attend only to earlier positions in the sequence. They are pre-trained on next-token prediction; later instruction-following variants are then aligned with techniques such as RLHF or DPO. The architecture itself is relatively simple -- scale is the key innovation.
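The causal self-attention described above can be sketched in a few lines of NumPy. This is an illustrative single-head version (no multi-head splitting, layer norm, or learned positional embeddings), not any model's actual implementation: a triangular mask blocks attention to future positions before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model). Single illustrative attention head."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (seq_len, seq_len)
    # Causal mask: position i may attend only to positions <= i,
    # so the model can be trained on next-token prediction.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
out, w = causal_self_attention(x, Wq, Wk, Wv)
# Upper triangle of the attention matrix is zero: no token sees the future.
assert np.allclose(np.triu(w, k=1), 0.0)
assert out.shape == (seq_len, d_model)
```

Stacking such attention layers with feed-forward blocks, then training the whole network to predict each next token, is the essence of the GPT recipe; the differences across GPT generations are chiefly in parameter count, data, and post-training alignment.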
Impact
GPT models catalyzed the current AI revolution. GPT-3 popularized large language models among researchers and developers, and ChatGPT (initially powered by GPT-3.5) brought AI to mainstream consciousness. The decoder-only GPT architecture became the template for most modern LLMs.