Skip Connection (Residual Connection)
A shortcut that adds a layer's input directly to its output, allowing gradients to flow through deep networks and enabling the training of very deep architectures.
How It Works
Instead of learning a target mapping H(x) directly, the block learns the residual F(x) = H(x) - x; the skip connection adds the original input x back, so the block's output is F(x) + x. If the optimal mapping is close to the identity, the layer only needs to push F(x) toward zero, which is much easier to learn than reproducing the identity from scratch.
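A minimal sketch of this idea in NumPy (the layer form ReLU(W @ x) and all names here are illustrative, not from any specific architecture):

```python
import numpy as np

def layer(x, W):
    """A hypothetical single layer F(x) = ReLU(W @ x)."""
    return np.maximum(0.0, W @ x)

def residual_block(x, W):
    """Skip connection: the block outputs F(x) + x, so it only
    needs to learn the residual F, not the full mapping."""
    return layer(x, W) + x

# If the optimal mapping is close to identity, weights near zero suffice:
x = np.array([1.0, 2.0, 3.0])
W = np.zeros((3, 3))            # F(x) = 0, so the block is exactly identity
out = residual_block(x, W)      # equals x
```

With W = 0 the block is the identity map, which is why residual blocks are easy to initialize: doing nothing is the default, and training only adjusts a small correction.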
Why It's Essential
Without skip connections, deep networks suffer from vanishing gradients (gradients shrink as they propagate backward through many layers). Skip connections provide a direct gradient path, enabling networks with hundreds or thousands of layers.
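This gradient effect can be demonstrated numerically. Since y = x + F(x) has derivative 1 + F'(x), the "1" term keeps the backward path open even when F'(x) is tiny. A toy comparison with scalar layers f(x) = w * tanh(x) (purely illustrative choices):

```python
import numpy as np

def forward(x, ws, skip=True):
    """Stack of scalar layers f(x) = w * tanh(x); with skip, each step is x + f(x)."""
    for w in ws:
        fx = w * np.tanh(x)
        x = x + fx if skip else fx
    return x

def input_grad(x, ws, skip=True, eps=1e-6):
    """Numerical gradient of the network output w.r.t. the input."""
    return (forward(x + eps, ws, skip) - forward(x - eps, ws, skip)) / (2 * eps)

ws = [0.1] * 50                       # 50 layers with small weights
g_plain = input_grad(0.5, ws, skip=False)  # product of ~0.1 per layer: vanishes
g_skip = input_grad(0.5, ws, skip=True)    # product of (1 + small) per layer: survives
```

Without skips, the gradient is a product of 50 factors each around 0.1 and collapses to essentially zero; with skips, each factor is 1 + F'(x) > 1, so the signal reaches the input intact.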
Ubiquity
Skip connections appear in virtually every modern architecture: ResNet (which introduced them), transformers (around every attention and feed-forward block), U-Net (between encoder and decoder stages), and DenseNet (which concatenates earlier features rather than adding them). They are one of the most important architectural innovations in deep learning.