BERT
Bidirectional Encoder Representations from Transformers -- a landmark language model from Google (2018) that learns deep bidirectional representations of text.
The Breakthrough
Before BERT, most language models processed text in a single direction (left-to-right or right-to-left). BERT reads in both directions simultaneously by training with masked language modeling: roughly 15% of input tokens are hidden, and the model must predict them from the surrounding context on both sides. A second pretraining objective, next-sentence prediction, teaches the model relationships between sentence pairs.
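The masking step can be sketched in a few lines of plain Python. This is a toy illustration, not BERT's actual preprocessing: the real procedure replaces 80% of selected tokens with [MASK], 10% with a random token, and leaves 10% unchanged, and operates on WordPiece subwords rather than whole words. The function name and 15% default are chosen here for illustration.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens, recording the originals as labels.

    Returns (masked_tokens, labels), where labels maps each masked
    position back to the original token the model must predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok       # remember the hidden token
            masked[i] = MASK      # hide it from the model
    return masked, labels

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3)
print(masked)
print(labels)
```

During pretraining, the model sees only the masked sequence and is scored on how well it recovers the tokens stored in the labels.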
Architecture
BERT is an encoder-only transformer. BERT-Base has 110M parameters (12 layers, hidden size 768, 12 attention heads); BERT-Large has 340M parameters (24 layers, hidden size 1024, 16 heads). Both produce contextual embeddings: the same word gets a different representation depending on its context, so "bank" in "river bank" and "bank account" maps to different vectors.
Legacy
BERT transformed NLP by establishing the pretrain-then-fine-tune paradigm. Variants include RoBERTa (optimized training), DistilBERT (smaller), and domain-specific versions like BioBERT, SciBERT, and LegalBERT.