Activation Function
A mathematical function applied to a neuron's output that introduces non-linearity, enabling neural networks to learn complex patterns.
Why Non-Linearity Matters
Without activation functions, a neural network would be a composition of linear transformations, which collapses into a single linear layer regardless of depth. Non-linear activations like ReLU, sigmoid, and tanh are what let networks approximate any continuous function (the universal approximation theorem).
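The collapse claim can be checked directly: two stacked linear layers with no activation in between are algebraically identical to one linear layer. A minimal numpy sketch (all weights here are random placeholders, not from any real model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a batch of 4 inputs with 3 features

# Two stacked linear layers with no activation in between
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)
deep = (x @ W1 + b1) @ W2 + b2

# ...collapse into a single equivalent linear layer
W = W1 @ W2
b = b1 @ W2 + b2
shallow = x @ W + b

print(np.allclose(deep, shallow))  # True: depth added no expressive power
```

Inserting any non-linearity between the two layers breaks this equivalence, which is exactly why activations restore the value of depth.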
Common Activation Functions
ReLU (Rectified Linear Unit): f(x) = max(0, x). The most widely used activation; simple, fast, and effective.
Sigmoid: squashes output to (0, 1). Used in binary classification output layers.
Tanh: squashes output to (-1, 1). Zero-centered, so often preferred over sigmoid in hidden layers.
GELU: a smoother alternative to ReLU, used in modern transformers like BERT and GPT.
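Each of these is a one-line elementwise function. A sketch in numpy, using the tanh approximation of GELU that common transformer implementations adopt (deep-learning frameworks ship tuned versions of all four):

```python
import numpy as np

def relu(x):
    # max(0, x), applied elementwise
    return np.maximum(0.0, x)

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any real input into (-1, 1), centered at zero
    return np.tanh(x)

def gelu(x):
    # tanh approximation of GELU, as used in BERT/GPT-style models
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # [0.  0.  0.  0.5 2. ]
```

Note how ReLU zeroes all negative inputs while GELU lets small negative values through with a smooth taper, one reason it behaves better in very deep networks.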
Choosing the Right One
For hidden layers, ReLU or its variants (Leaky ReLU, GELU) are standard defaults. For output layers, the choice depends on the task: sigmoid for binary classification, softmax for multi-class classification, and linear (no activation) for regression.
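The output-layer choices above can be sketched with the same numpy style; the logits here are made-up illustrative values:

```python
import numpy as np

def sigmoid(z):
    # one logit -> one probability, for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # logits -> probability distribution, for multi-class classification;
    # subtracting the max is a standard numerical-stability trick
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical model outputs

probs = softmax(logits)
print(probs.sum())       # 1.0: a valid distribution over the classes

print(sigmoid(2.0))      # a single probability, roughly 0.88

# For regression, the raw linear output is used unchanged.
```

Softmax generalizes sigmoid: applying softmax to two logits gives the same result as applying sigmoid to their difference.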