AI Glossary

Cross-Entropy Loss

The most common loss function for classification tasks, measuring the difference between predicted probability distributions and true labels.

How It Works

For each example, cross-entropy computes -log(p), where p is the predicted probability of the correct class. It penalizes confident wrong predictions heavily: a perfect prediction (probability 1.0 for the correct class) gives a loss of 0, while a probability near 0 drives the loss toward infinity. It is used for both binary and multi-class classification.
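The formula above can be sketched in a few lines of Python; the function name and example probabilities are illustrative, not from the source:

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one example: -log(p[true_class]).

    `probs` is a predicted probability distribution (sums to 1);
    `true_class` is the index of the correct class.
    """
    return -math.log(probs[true_class])

# A confident correct prediction gives near-zero loss...
print(round(cross_entropy([0.9, 0.05, 0.05], 0), 4))  # 0.1054
# ...while the same confident distribution is penalized heavily
# when the true class was actually class 1.
print(round(cross_entropy([0.9, 0.05, 0.05], 1), 4))  # 2.9957
```

Note that the loss depends only on the probability assigned to the correct class; the split among the wrong classes does not matter.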

In Language Models

LLMs are trained with cross-entropy loss over the vocabulary. For each token position, the model predicts a probability distribution over all possible next tokens, and loss measures how well this matches the actual next token.
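A minimal sketch of this training objective, assuming toy logits over a 4-token vocabulary (all names and numbers here are illustrative): the model's logits at each position are converted to probabilities with softmax, and the loss is the average negative log-probability of the actual next token.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lm_loss(logits_per_position, target_tokens):
    """Mean cross-entropy over token positions.

    `logits_per_position[i]` holds the vocabulary logits at position i;
    `target_tokens[i]` is the id of the actual next token.
    """
    losses = []
    for logits, target in zip(logits_per_position, target_tokens):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Two positions over a toy 4-token vocabulary:
logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 3.0, 0.2, 0.1]]
targets = [0, 1]  # the actual next tokens at each position
print(round(lm_loss(logits, targets), 4))
```

Real implementations compute this over the full vocabulary (tens of thousands of tokens) and fuse the softmax and log for numerical stability, but the quantity being minimized is the same.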

Variants

Binary cross-entropy: for two-class problems.
Categorical cross-entropy: for multi-class problems with one-hot labels.
Focal loss: down-weights easy examples to focus training on hard ones.
Label smoothing: softens targets to prevent overconfidence.
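Label smoothing can be sketched as follows; the smoothing value epsilon=0.1 and all example numbers are illustrative assumptions, not from the source:

```python
import math

def smoothed_targets(num_classes, true_class, epsilon=0.1):
    """Label smoothing: replace a one-hot target with a softened one.

    The correct class gets 1 - epsilon; the remaining epsilon mass is
    spread evenly over the other classes.
    """
    off = epsilon / (num_classes - 1)
    return [1.0 - epsilon if c == true_class else off
            for c in range(num_classes)]

def cross_entropy_soft(probs, targets):
    """Cross-entropy against soft targets: -sum(t * log(p))."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

hard = [1.0, 0.0, 0.0]                      # one-hot target
soft = smoothed_targets(3, 0)               # [0.9, 0.05, 0.05]
probs = [0.98, 0.01, 0.01]                  # overconfident prediction

# Against the smoothed target, the overconfident prediction incurs
# a larger loss, which discourages pushing probabilities to extremes.
print(round(cross_entropy_soft(probs, hard), 4))  # 0.0202
print(round(cross_entropy_soft(probs, soft), 4))  # 0.4787
```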

Last updated: March 5, 2026