AI Glossary

Pre-Training

The initial phase of training a model on a large, general-purpose dataset to learn broad representations before fine-tuning on specific tasks.

For LLMs

Pre-training involves next-token prediction on trillions of tokens drawn from the internet, books, code, and other sources. This phase typically costs millions of dollars in compute and produces a model with broad knowledge but no optimization for any particular task.
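The next-token prediction objective can be illustrated with a toy sketch: inputs are a token sequence, targets are the same sequence shifted one position, and the loss is the cross-entropy of the model's predicted distribution at the true next token. The random logits below are a hypothetical stand-in for a real model's output.

```python
import math
import random

random.seed(0)

vocab_size = 10
tokens = [3, 1, 4, 1, 5]  # a toy token sequence

# Inputs are all tokens but the last; targets are shifted one left.
inputs, targets = tokens[:-1], tokens[1:]

def softmax(logits):
    # Numerically stable softmax over the vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

loss = 0.0
for tgt in targets:
    # Stand-in for the model: random logits at each position.
    logits = [random.gauss(0, 1) for _ in range(vocab_size)]
    probs = softmax(logits)
    loss -= math.log(probs[tgt])  # cross-entropy on the true next token
loss /= len(targets)
print(f"mean next-token cross-entropy: {loss:.3f}")
```

With random logits the loss hovers near log(vocab_size); training drives it down by assigning higher probability to the actual next token.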

The Pre-Train Then Fine-Tune Paradigm

First introduced by ULMFiT and popularized by BERT and GPT, this two-stage approach is now standard: pre-train on massive data to acquire general capabilities, then fine-tune (or align) the model for specific tasks. Because the expensive general-purpose stage is amortized across every downstream task, it is vastly more efficient than training a model from scratch for each one.
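The two stages can be sketched with a deliberately tiny one-parameter "model" trained by gradient descent; the datasets and squared-error loss here are illustrative stand-ins, not a real training recipe.

```python
def sgd(w, data, lr=0.1, steps=50):
    # Fit w to the mean of the data by minimizing squared error.
    for _ in range(steps):
        grad = sum(2 * (w - x) for x in data) / len(data)
        w -= lr * grad
    return w

w = 0.0                                # initial weights
w = sgd(w, data=[4.9, 5.1, 5.0])       # stage 1: "pre-train" on broad data
pretrained = w
w = sgd(w, data=[5.4, 5.6], steps=10)  # stage 2: "fine-tune" on task data
print(f"pre-trained: {pretrained:.2f}, fine-tuned: {w:.2f}")
```

Fine-tuning starts from the pre-trained weights rather than from scratch, so only a short second stage on a small task dataset is needed.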


Last updated: March 5, 2026