AI Glossary

Cross-Entropy Loss

The most common loss function for classification tasks, measuring the difference between predicted probability distributions and true labels.

How It Works

For each example, cross-entropy computes -log(p), where p is the predicted probability of the correct class. It penalizes confident wrong predictions heavily: a perfect prediction (probability 1.0 for the correct class) gives a loss of 0, while a probability near 0 drives the loss toward infinity. It is used for both binary and multi-class classification.
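The formula above can be sketched in a few lines of Python; the function name and example probabilities are illustrative, not from the source:

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one example: -log(p[true_class]).

    `probs` is a predicted probability distribution (sums to 1);
    `true_class` is the index of the correct class.
    """
    return -math.log(probs[true_class])

# A confident correct prediction gives near-zero loss...
print(round(cross_entropy([0.9, 0.05, 0.05], 0), 4))  # 0.1054
# ...while the same confident distribution is penalized heavily
# when the true class was actually class 1.
print(round(cross_entropy([0.9, 0.05, 0.05], 1), 4))  # 2.9957
```

Note that the loss depends only on the probability assigned to the correct class; the split among the wrong classes does not matter.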

In Language Models

LLMs are trained with cross-entropy loss over the vocabulary. For each token position, the model predicts a probability distribution over all possible next tokens, and loss measures how well this matches the actual next token.
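A minimal sketch of this training objective, assuming toy logits over a 4-token vocabulary (all names and numbers here are illustrative): the model's logits at each position are converted to probabilities with softmax, and the loss is the average negative log-probability of the actual next token.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lm_loss(logits_per_position, target_tokens):
    """Mean cross-entropy over token positions.

    `logits_per_position[i]` holds the vocabulary logits at position i;
    `target_tokens[i]` is the id of the actual next token.
    """
    losses = []
    for logits, target in zip(logits_per_position, target_tokens):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Two positions over a toy 4-token vocabulary:
logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 3.0, 0.2, 0.1]]
targets = [0, 1]  # the actual next tokens at each position
print(round(lm_loss(logits, targets), 4))
```

Real implementations compute this over the full vocabulary (tens of thousands of tokens) and fuse the softmax and log for numerical stability, but the quantity being minimized is the same.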

Variants

Binary cross-entropy: for two-class problems.
Categorical cross-entropy: for multi-class problems with one-hot labels.
Focal loss: down-weights easy examples to focus training on hard ones.
Label smoothing: softens targets to prevent overconfidence.
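Label smoothing can be sketched as follows; the smoothing value epsilon=0.1 and all example numbers are illustrative assumptions, not from the source:

```python
import math

def smoothed_targets(num_classes, true_class, epsilon=0.1):
    """Label smoothing: replace a one-hot target with a softened one.

    The correct class gets 1 - epsilon; the remaining epsilon mass is
    spread evenly over the other classes.
    """
    off = epsilon / (num_classes - 1)
    return [1.0 - epsilon if c == true_class else off
            for c in range(num_classes)]

def cross_entropy_soft(probs, targets):
    """Cross-entropy against soft targets: -sum(t * log(p))."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

hard = [1.0, 0.0, 0.0]                      # one-hot target
soft = smoothed_targets(3, 0)               # [0.9, 0.05, 0.05]
probs = [0.98, 0.01, 0.01]                  # overconfident prediction

# Against the smoothed target, the overconfident prediction incurs
# a larger loss, which discourages pushing probabilities to extremes.
print(round(cross_entropy_soft(probs, hard), 4))  # 0.0202
print(round(cross_entropy_soft(probs, soft), 4))  # 0.4787
```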

Last updated: March 5, 2026