AI Glossary

Cross-Entropy Loss

The most common loss function for classification tasks, measuring the difference between predicted probability distributions and actual labels.

How It Works

Cross-entropy penalizes confident wrong predictions heavily. If the model assigns 99% probability to the wrong class, the loss is very high. If it assigns 99% to the correct class, the loss is near zero. This encourages the model to output well-calibrated probabilities.
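The asymmetry described above is easy to verify numerically. This is a minimal sketch (the function name is illustrative, not from a standard library) comparing the loss for a confidently wrong prediction against a confidently correct one:

```python
import math

def binary_cross_entropy(y, p):
    """Cross-entropy for a single binary prediction.
    y is the true label (0 or 1); p is the predicted probability of class 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Confident and wrong: true label 1, model assigns only 1% to it -> large loss
print(round(binary_cross_entropy(1, 0.01), 3))  # 4.605

# Confident and right: true label 1, model assigns 99% to it -> near-zero loss
print(round(binary_cross_entropy(1, 0.99), 3))  # 0.01
```

Because the loss is -log of the probability assigned to the true class, it grows without bound as that probability approaches zero, which is what makes confident mistakes so costly.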

Binary vs. Categorical

Binary cross-entropy: for two-class problems.
Loss = -[y*log(p) + (1-y)*log(1-p)]

Categorical cross-entropy: for multi-class problems, used with softmax output layers.
Loss = -sum(y_i * log(p_i))
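The categorical formula can be sketched directly from the definition. With a one-hot target vector, the sum collapses to -log of the probability assigned to the true class (the function and variable names here are illustrative):

```python
import math

def categorical_cross_entropy(y_true, p_pred):
    """Loss = -sum(y_i * log(p_i)); y_true is a one-hot target vector."""
    return -sum(y * math.log(p) for y, p in zip(y_true, p_pred) if y > 0)

# Three-class example: the true class is index 1
y_true = [0.0, 1.0, 0.0]
p_pred = [0.2, 0.7, 0.1]   # softmax output, sums to 1
print(round(categorical_cross_entropy(y_true, p_pred), 4))  # 0.3567, i.e. -log(0.7)
```

Note the `if y > 0` guard: terms where the target is zero contribute nothing, and skipping them avoids evaluating log on classes the model may assign probability zero.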

Connection to Language Models

Language model training is fundamentally a next-token prediction (classification) task. The cross-entropy loss between the model's predicted token distribution and the actual next token is the standard training objective for GPT-style models.
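A toy sketch of that objective, assuming a tiny vocabulary and hypothetical helper names: the model emits logits over the vocabulary, softmax turns them into a distribution, and the loss is -log of the probability given to the token that actually came next.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_id):
    """Cross-entropy with a one-hot target reduces to -log p(actual next token)."""
    probs = softmax(logits)
    return -math.log(probs[target_id])

# Toy vocabulary of 4 tokens; the actual next token has id 2
logits = [1.0, 0.5, 3.0, -1.0]
print(round(next_token_loss(logits, 2), 4))
```

In real training this per-position loss is averaged over every position in every sequence of the batch; reporting it in nats (or as perplexity, its exponential) is standard practice.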

Last updated: March 5, 2026