Activation Function
A mathematical function applied to a neuron's output that introduces non-linearity, enabling neural networks to learn complex patterns.
Why Non-Linearity Matters
Without activation functions, a neural network would be a composition of linear transformations, which collapses into a single linear layer regardless of depth. Non-linear activations like ReLU, sigmoid, and tanh are what let networks approximate any continuous function (the universal approximation theorem).
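The collapse claim can be checked directly: two stacked linear layers with no activation in between are algebraically identical to one linear layer. A minimal numpy sketch (all weights here are random placeholders, not from any real model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a batch of 4 inputs with 3 features

# Two stacked linear layers with no activation in between
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)
deep = (x @ W1 + b1) @ W2 + b2

# ...collapse into a single equivalent linear layer
W = W1 @ W2
b = b1 @ W2 + b2
shallow = x @ W + b

print(np.allclose(deep, shallow))  # True: depth added no expressive power
```

Inserting any non-linearity between the two layers breaks this equivalence, which is exactly why activations restore the value of depth.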
Common Activation Functions
ReLU (Rectified Linear Unit): f(x) = max(0, x). The most widely used activation; simple, fast, and effective.
Sigmoid: squashes output to (0, 1). Used in binary classification output layers.
Tanh: squashes output to (-1, 1). Zero-centered, so often preferred over sigmoid in hidden layers.
GELU: a smoother alternative to ReLU, used in modern transformers like BERT and GPT.
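Each of these is a one-line elementwise function. A sketch in numpy, using the tanh approximation of GELU that common transformer implementations adopt (deep-learning frameworks ship tuned versions of all four):

```python
import numpy as np

def relu(x):
    # max(0, x), applied elementwise
    return np.maximum(0.0, x)

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any real input into (-1, 1), centered at zero
    return np.tanh(x)

def gelu(x):
    # tanh approximation of GELU, as used in BERT/GPT-style models
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # [0.  0.  0.  0.5 2. ]
```

Note how ReLU zeroes all negative inputs while GELU lets small negative values through with a smooth taper, one reason it behaves better in very deep networks.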
Choosing the Right One
For hidden layers, ReLU or its variants (Leaky ReLU, GELU) are standard defaults. For output layers, the choice depends on the task: sigmoid for binary classification, softmax for multi-class classification, and linear (no activation) for regression.
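The output-layer choices above can be sketched with the same numpy style; the logits here are made-up illustrative values:

```python
import numpy as np

def sigmoid(z):
    # one logit -> one probability, for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # logits -> probability distribution, for multi-class classification;
    # subtracting the max is a standard numerical-stability trick
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical model outputs

probs = softmax(logits)
print(probs.sum())       # 1.0: a valid distribution over the classes

print(sigmoid(2.0))      # a single probability, roughly 0.88

# For regression, the raw linear output is used unchanged.
```

Softmax generalizes sigmoid: applying softmax to two logits gives the same result as applying sigmoid to their difference.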