Generative Pre-Training
The training paradigm where a model learns to generate text by predicting the next token on massive unlabeled text corpora, forming the basis of models like GPT.
The GPT Approach
Train on internet text using next-token prediction (causal language modeling). The model learns grammar, facts, reasoning patterns, and even code. Scale to billions of parameters and trillions of tokens. The resulting model can then be fine-tuned or prompted.
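The objective above can be sketched with a toy model. This is a minimal, hypothetical illustration using bigram counts as a stand-in for a neural causal language model (a real model conditions on the full preceding context, not just one token); the tiny corpus and variable names are invented for the example.

```python
import math
from collections import defaultdict

# Tiny stand-in for "internet text" (hypothetical example data).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: a minimal proxy for a causal language model.
# A real model conditions on all previous tokens; here we use one.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev):
    # Probability of each possible next token given the previous one.
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

# The training objective: minimize the average negative log-likelihood
# of each token given its context (next-token prediction).
pairs = list(zip(corpus, corpus[1:]))
nll = -sum(math.log(next_token_probs(p)[n]) for p, n in pairs)
print(f"avg NLL: {nll / len(pairs):.3f}")
```

Scaling this idea up replaces the bigram table with a transformer and the toy corpus with trillions of tokens, but the loss being minimized is the same.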
Why It Works
Predicting the next token requires understanding context, semantics, logic, and world knowledge. This simple objective, at sufficient scale, produces models with emergent capabilities like translation, summarization, and code generation — without explicit training on those tasks.
Training Pipeline
Pre-training (self-supervised, massive compute) → Instruction tuning (supervised, moderate compute) → Preference alignment via RLHF or DPO (focused compute; RLHF uses reinforcement learning against a reward model, while DPO optimizes on preference pairs directly). Each stage adds capabilities while building on the previous stage's knowledge.
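The staged pipeline can be summarized in a short sketch. Everything here is illustrative: the stage functions, dataset arguments, and the dictionary standing in for model weights are hypothetical, not a real training API.

```python
# Hypothetical sketch of the three-stage pipeline. Stage names and
# the "model" dict are illustrative placeholders, not a real library.

def pretrain(model, corpus):
    # Stage 1: next-token prediction on unlabeled text (massive compute).
    model["knowledge"] = f"patterns from {len(corpus)} tokens"
    return model

def instruction_tune(model, demos):
    # Stage 2: supervised fine-tuning on (prompt, response) pairs.
    model["follows_instructions"] = len(demos) > 0
    return model

def align(model, preferences):
    # Stage 3: preference alignment (RLHF or DPO) on ranked outputs.
    model["aligned"] = len(preferences) > 0
    return model

# Each stage consumes the model produced by the previous one.
model = align(
    instruction_tune(
        pretrain({}, corpus=["tok"] * 1_000_000),
        demos=[("Summarize X", "X says ...")],
    ),
    preferences=[("better answer", "worse answer")],
)
print(model)
```

The nesting makes the dependency explicit: instruction tuning and alignment do not start from scratch, they reshape the knowledge laid down in pre-training.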