Cross-Entropy Loss
The most common loss function for classification tasks, measuring the difference between predicted probability distributions and actual labels.
How It Works
Cross-entropy penalizes confident wrong predictions heavily. If the model assigns 99% probability to the wrong class, the loss is very high. If it assigns 99% to the correct class, the loss is near zero. This encourages the model to output well-calibrated probabilities.
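This asymmetry is easy to see numerically. A minimal sketch (the helper name `cross_entropy` is illustrative): the loss for the true class is just the negative log of the probability the model assigned to it.

```python
import math

def cross_entropy(p_correct):
    # Loss contributed by the true class, given the probability
    # the model assigned to it: -log(p_correct).
    return -math.log(p_correct)

# Confident and correct: near-zero loss.
print(round(cross_entropy(0.99), 4))  # 0.0101
# Confident and wrong: the true class got only 1% probability.
print(round(cross_entropy(0.01), 4))  # 4.6052
```

Because the loss grows without bound as p_correct approaches zero, a single confidently wrong prediction can dominate a batch of mildly uncertain ones.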
Binary vs. Categorical
Binary cross-entropy: for two-class problems, where y ∈ {0, 1} is the true label and p is the predicted probability of the positive class. Loss = -[y*log(p) + (1-y)*log(1-p)]. Categorical cross-entropy: for multi-class problems, where y_i is the one-hot true label and p_i is the predicted probability for class i. Loss = -sum(y_i * log(p_i)). Typically used with a softmax output layer.
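Both formulas can be written directly in a few lines, which also makes their relationship clear: the binary form is the categorical form specialized to two classes. A minimal sketch (function names are illustrative):

```python
import math

def binary_cross_entropy(y, p):
    # y is 0 or 1; p is the predicted probability of class 1.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_cross_entropy(y, p):
    # y is a one-hot label vector; p is a probability vector
    # (e.g. the output of a softmax layer).
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

# The binary case is the two-class categorical case:
print(round(binary_cross_entropy(1, 0.8), 4))                   # 0.2231
print(round(categorical_cross_entropy([0, 1], [0.2, 0.8]), 4))  # 0.2231
```

In practice, libraries compute the categorical loss from logits and the target class index directly (skipping the explicit one-hot vector) for numerical stability, but the quantity is the same.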
Connection to Language Models
Language model training is fundamentally a next-token prediction (classification) task. The cross-entropy loss between the model's predicted token distribution and the actual next token is the standard training objective for GPT-style models.
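Concretely, at each position the model produces one logit per vocabulary token; softmax turns the logits into a distribution, and the loss is the negative log-probability of the token that actually came next. A minimal sketch with a toy four-word vocabulary (the logit values are made up for illustration, not taken from a real model):

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    # Cross-entropy against a one-hot target: -log p(actual next token).
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Toy vocabulary and hypothetical model scores for the next token.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 1.0, -1.0]
target = vocab.index("the")  # the token that actually came next
print(round(next_token_loss(logits, target), 4))  # 0.4952
```

Averaging this loss over every position in the training corpus gives the standard training objective; its exponential is the model's perplexity.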