GPT (Generative Pre-trained Transformer)
A family of large language models by OpenAI that pioneered the approach of pre-training decoder-only transformers on massive text datasets for general-purpose language understanding and generation.
Evolution
GPT-1 (2018): 117M params, proved pre-training works. GPT-2 (2019): 1.5B params, surprisingly coherent text generation. GPT-3 (2020): 175B params, in-context learning emerged. GPT-4 (2023): Multimodal, frontier capabilities.
Architecture
All GPT models are decoder-only transformers using causal self-attention, meaning each token can attend only to earlier positions in the sequence. They are pre-trained on next-token prediction; later instruction-following variants are then aligned with techniques such as RLHF or DPO. The architecture itself is relatively simple -- scale is the key innovation.
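The causal self-attention described above can be sketched in a few lines of NumPy. This is an illustrative single-head version (no multi-head splitting, layer norm, or learned positional embeddings), not any model's actual implementation: a triangular mask blocks attention to future positions before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model). Single illustrative attention head."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (seq_len, seq_len)
    # Causal mask: position i may attend only to positions <= i,
    # so the model can be trained on next-token prediction.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[future] = -np.inf
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
out, w = causal_self_attention(x, Wq, Wk, Wv)
# Upper triangle of the attention matrix is zero: no token sees the future.
assert np.allclose(np.triu(w, k=1), 0.0)
assert out.shape == (seq_len, d_model)
```

Stacking such attention layers with feed-forward blocks, then training the whole network to predict each next token, is the essence of the GPT recipe; the differences across GPT generations are chiefly in parameter count, data, and post-training alignment.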
Impact
GPT models catalyzed the current AI revolution. GPT-3 popularized large language models among researchers and developers, and ChatGPT (initially powered by GPT-3.5) brought AI to mainstream consciousness. The decoder-only GPT architecture became the template for most modern LLMs.