Softmax
A mathematical function that converts a vector of raw scores (logits) into a probability distribution, where all values are between 0 and 1 and sum to 1.
The Formula
softmax(x_i) = exp(x_i) / Σ_j exp(x_j), where the sum runs over every component of the vector. It exponentiates each value (making them all positive) and divides by the sum (so they total 1). Because the exponential grows so fast, larger inputs receive disproportionately larger probabilities.
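The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged, since softmax is shift-invariant.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; softmax(x) == softmax(x - c)
    # for any constant c, so this does not change the result.
    shifted = x - np.max(x)
    exps = np.exp(shifted)           # all values become positive
    return exps / np.sum(exps)       # normalize so they sum to 1

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)              # roughly [0.66, 0.24, 0.10]
```

Note how the largest logit (2.0) ends up with well over half the probability mass even though it is only about twice the second-largest input.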
Where It's Used
In the output layer of classification models to produce class probabilities. In the attention mechanism of transformers to compute attention weights. Everywhere a probability distribution over discrete options is needed.
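The attention use case can be sketched with a toy scaled dot-product example. The query and key vectors below are made-up illustrative values; the point is that softmax turns raw similarity scores into attention weights that sum to 1.

```python
import numpy as np

def softmax(x):
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Hypothetical toy example: one query attending over three keys.
q = np.array([1.0, 0.0])             # query vector
K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # three key vectors
scores = K @ q / np.sqrt(q.size)     # scaled dot-product similarity scores
weights = softmax(scores)            # attention weights: a distribution over keys
```

Here keys 0 and 2 match the query equally well, so they receive equal weight, while key 1 (orthogonal to the query) receives the least.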
Temperature Scaling
Dividing logits by a temperature parameter T before applying softmax controls the distribution's sharpness: T < 1 makes it peakier (more confident), T > 1 makes it flatter (more random). This is the temperature setting that trades off determinism against creativity in LLM text generation.
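The effect is easy to see numerically. A small sketch, using the same example logits as before: the probability of the top option grows as T shrinks and evens out as T grows.

```python
import numpy as np

def softmax(x):
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])
sharp = softmax(logits / 0.5)   # T = 0.5: peakier, more confident
base  = softmax(logits)         # T = 1.0: unmodified softmax
flat  = softmax(logits / 2.0)   # T = 2.0: flatter, more random
```

At T = 0.5 the top option's probability is around 0.86, versus about 0.66 at T = 1 and about 0.50 at T = 2, even though the underlying logits never changed.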