AI Glossary

BERT

Bidirectional Encoder Representations from Transformers -- a landmark language model from Google (2018) that learns deep bidirectional representations of text.

The Breakthrough

Before BERT, language models processed text in a single direction (left-to-right or right-to-left). BERT reads in both directions simultaneously using masked language modeling: roughly 15% of input tokens are hidden, and the model must predict them from the surrounding context on both sides.
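The masking step can be sketched in a few lines of Python. This is a simplified illustration, not BERT's actual preprocessing: real BERT replaces a masked token with `[MASK]` only 80% of the time (10% random token, 10% unchanged), which this sketch omits.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and record the labels to predict.

    Simplified sketch: every selected token becomes [MASK]; real BERT
    also uses random-token and keep-original substitutions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)    # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)   # no loss is computed at unmasked positions
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens)
```

During training, the model sees `masked` as input and is penalized only at the positions where `labels` is not `None`, forcing it to use context from both directions to fill in the gaps.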

Architecture

BERT is an encoder-only transformer. BERT-Base has 110M parameters (12 layers, hidden size 768); BERT-Large has 340M parameters (24 layers, hidden size 1024). It produces contextual embeddings: the same word receives a different vector representation depending on the surrounding text.
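The 110M figure can be roughly reproduced from the published BERT-Base dimensions. The sketch below assumes the standard config (WordPiece vocabulary of 30,522, hidden size 768, 12 layers, feed-forward size 3072, 512 positions, 2 segment types) and ignores a few small pieces such as the pooler and the embedding LayerNorm.

```python
# Approximate BERT-Base parameter count from its published dimensions.
# Assumed standard config: vocab 30,522, hidden 768, 12 layers,
# FFN size 3072, 512 positions, 2 segment types.
V, H, L, F, P, S = 30522, 768, 12, 3072, 512, 2

embeddings = (V + P + S) * H             # token + position + segment tables
attention  = 4 * (H * H + H)             # Q, K, V and output projections (with biases)
ffn        = (H * F + F) + (F * H + H)   # two dense layers in the feed-forward block
layernorms = 2 * 2 * H                   # two LayerNorms (scale + bias) per layer
per_layer  = attention + ffn + layernorms

total = embeddings + L * per_layer
print(f"{total / 1e6:.0f}M parameters")  # prints "109M parameters"
```

The result lands within a million or so of the quoted 110M; swapping in L=24, H=1024, F=4096 (and 16 heads) reproduces the BERT-Large figure the same way.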

Legacy

BERT transformed NLP by establishing the pretrain-then-fine-tune paradigm. Variants include RoBERTa (optimized training), DistilBERT (smaller), and domain-specific versions like BioBERT, SciBERT, and LegalBERT.


Last updated: March 5, 2026