AI Glossary

Causal Language Model

A language model that generates text left-to-right, predicting each token based only on the tokens that came before it (never looking ahead).

Architecture

Uses a decoder-only transformer with causal (masked) self-attention. Each position can only attend to previous positions, enforcing the autoregressive property. GPT, LLaMA, Claude, and Gemini are all causal language models.
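The causal mask can be sketched in a few lines of NumPy (the function names here are illustrative, not from any particular library): positions above the diagonal are set to negative infinity before the softmax, so each token's attention weights on future tokens come out exactly zero.

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular boolean matrix: position i may attend to j <= i.
    # Entries above the diagonal become -inf, which softmax maps to 0.
    allowed = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    return np.where(allowed, 0.0, -np.inf)

def masked_attention_weights(scores):
    # scores: (seq_len, seq_len) raw attention scores (e.g. q @ k.T / sqrt(d)).
    masked = scores + causal_mask(scores.shape[0])
    # Numerically stable softmax over each row.
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)
```

With all-zero scores, row i of the result is uniform over positions 0..i and exactly zero afterward, which is the autoregressive property made concrete.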

Training Objective

Trained on next-token prediction: given a sequence of tokens, predict the probability distribution over possible next tokens. The loss is the cross-entropy between the predicted distribution at each position and the token that actually came next.
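A minimal NumPy sketch of this objective (illustrative, not any library's API): the logits at position t are scored against the token at position t + 1, so predictions and targets are shifted by one before the cross-entropy is averaged.

```python
import numpy as np

def next_token_cross_entropy(logits, token_ids):
    # logits: (seq_len, vocab_size) unnormalized scores, one row per position.
    # token_ids: (seq_len,) integer ids of the actual sequence.
    preds = logits[:-1]        # position t predicts the token at t + 1
    targets = token_ids[1:]    # the tokens that actually came next
    # Log-softmax with max subtraction for numerical stability.
    shifted = preds - preds.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Average negative log-probability of each true next token.
    return -log_probs[np.arange(len(targets)), targets].mean()
```

A uniform (all-zero) logit row gives a loss of log(vocab_size) per position, a useful sanity check for an untrained model.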

Why Causal?

The causal constraint ensures the model can generate text token by token at inference time. It also makes training efficient: a single forward pass through a sequence provides training signal at every position simultaneously.
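Inference-time generation can be sketched as a loop that repeatedly runs the model on the growing prefix and appends a token sampled (here, greedily) from the last position's distribution. The `model` callable and `toy_model` below are hypothetical stand-ins, not a real API:

```python
import numpy as np

def generate(model, prompt_ids, n_new):
    # model: any callable mapping a list of token ids to (len, vocab) logits.
    # Greedy decoding: append the argmax next token, one step at a time.
    ids = list(prompt_ids)
    for _ in range(n_new):
        logits = model(ids)                    # forward pass over the prefix
        ids.append(int(np.argmax(logits[-1]))) # only the last position matters
    return ids

def toy_model(ids, vocab_size=10):
    # Toy stand-in for a trained model: always predicts (last token + 1).
    logits = np.zeros((len(ids), vocab_size))
    for t, tok in enumerate(ids):
        logits[t, (tok + 1) % vocab_size] = 1.0
    return logits
```

Note the asymmetry the paragraph above describes: training scores every position in one pass, while generation needs one forward pass per new token (real systems cache past keys and values to avoid recomputing the prefix).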


Last updated: March 5, 2026