Generative Pre-Training
The training paradigm where a model learns to generate text by predicting the next token on massive unlabeled text corpora, forming the basis of models like GPT.
The GPT Approach
Train on internet text using next-token prediction (causal language modeling). The model learns grammar, facts, reasoning patterns, and even code. Scale to billions of parameters and trillions of tokens. The resulting model can then be fine-tuned or prompted.
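The objective above can be sketched with a toy model. This is a minimal, hypothetical illustration using bigram counts as a stand-in for a neural causal language model (a real model conditions on the full preceding context, not just one token); the tiny corpus and variable names are invented for the example.

```python
import math
from collections import defaultdict

# Tiny stand-in for "internet text" (hypothetical example data).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: a minimal proxy for a causal language model.
# A real model conditions on all previous tokens; here we use one.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev):
    # Probability of each possible next token given the previous one.
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

# The training objective: minimize the average negative log-likelihood
# of each token given its context (next-token prediction).
pairs = list(zip(corpus, corpus[1:]))
nll = -sum(math.log(next_token_probs(p)[n]) for p, n in pairs)
print(f"avg NLL: {nll / len(pairs):.3f}")
```

Scaling this idea up replaces the bigram table with a transformer and the toy corpus with trillions of tokens, but the loss being minimized is the same.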
Why It Works
Predicting the next token requires understanding context, semantics, logic, and world knowledge. This simple objective, at sufficient scale, produces models with emergent capabilities like translation, summarization, and code generation — without explicit training on those tasks.
Training Pipeline
Pre-training (self-supervised, massive compute) → Instruction tuning (supervised, moderate compute) → Preference alignment via RLHF or DPO (focused compute; RLHF uses reinforcement learning against a reward model, while DPO optimizes on preference pairs directly). Each stage adds capabilities while building on the previous stage's knowledge.
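The staged pipeline can be summarized in a short sketch. Everything here is illustrative: the stage functions, dataset arguments, and the dictionary standing in for model weights are hypothetical, not a real training API.

```python
# Hypothetical sketch of the three-stage pipeline. Stage names and
# the "model" dict are illustrative placeholders, not a real library.

def pretrain(model, corpus):
    # Stage 1: next-token prediction on unlabeled text (massive compute).
    model["knowledge"] = f"patterns from {len(corpus)} tokens"
    return model

def instruction_tune(model, demos):
    # Stage 2: supervised fine-tuning on (prompt, response) pairs.
    model["follows_instructions"] = len(demos) > 0
    return model

def align(model, preferences):
    # Stage 3: preference alignment (RLHF or DPO) on ranked outputs.
    model["aligned"] = len(preferences) > 0
    return model

# Each stage consumes the model produced by the previous one.
model = align(
    instruction_tune(
        pretrain({}, corpus=["tok"] * 1_000_000),
        demos=[("Summarize X", "X says ...")],
    ),
    preferences=[("better answer", "worse answer")],
)
print(model)
```

The nesting makes the dependency explicit: instruction tuning and alignment do not start from scratch, they reshape the knowledge laid down in pre-training.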