Dropout
Randomly turn neurons off
Dropout is a strikingly simple anti-overfitting trick: on each training step, randomly switch off a fraction of neurons (say 20–50%). The network must keep working without them.
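Here is a minimal sketch of the switching-off step in plain Python. It assumes a hypothetical hidden layer of 8 neurons, a drop probability p = 0.5, and a helper named `dropout_mask`. All three are illustrative and do not come from any library. Each neuron is dropped independently with probability p, and a fresh mask is drawn on every step.

```python
import random

def dropout_mask(n, p, rng):
    """Return n keep-flags: 0 means the neuron is switched off (probability p), 1 means kept."""
    return [0 if rng.random() < p else 1 for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.5                 # drop probability (illustrative choice)
n = 8                   # neurons in a hypothetical hidden layer

for step in range(3):
    mask = dropout_mask(n, p, rng)
    # every step draws a fresh mask, so a different subset of neurons is off
    print(f"step {step}: mask={mask} dropped={mask.count(0)}")
```

Because each neuron is dropped independently, the number switched off varies from step to step. It equals a fraction p of the layer only on average.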
Why it helps: a neuron can't count on any particular partner always being there, so it can't co-adapt, meaning it can't lean on specific other neurons to cover for it. Each neuron has to learn something useful on its own. The result is redundant, robust features and a network that generalizes better.
Watch neurons flicker
During training, a different random subset of hidden neurons drops out each step. At test time every neuron is back, and its outgoing weights are scaled by the keep probability 1 − p (where p is the drop probability) to compensate.
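The following is a minimal sketch of the train/test behavior. It assumes the original (non-inverted) scheme and uses a made-up helper, `dropout_layer`, that operates on a plain list of activations. Multiplying a neuron's activation by 1 − p affects the next layer exactly as multiplying that neuron's outgoing weights by 1 − p would, so this matches the scaling described above.

```python
import random

def dropout_layer(activations, p, training, rng):
    """Original-style dropout on one layer's activations (hypothetical helper).

    Training: each activation is zeroed with probability p (fresh mask per call).
    Test: nothing is dropped; each activation is multiplied by the keep
    probability (1 - p), which equals its expected value during training.
    """
    if training:
        return [0.0 if rng.random() < p else a for a in activations]
    return [a * (1.0 - p) for a in activations]

rng = random.Random(42)
h = [1.0, 2.0, 3.0, 4.0]
print(dropout_layer(h, 0.5, training=True, rng=rng))   # each entry is 0.0 or unchanged
print(dropout_layer(h, 0.5, training=False, rng=rng))  # [0.5, 1.0, 1.5, 2.0]
```

A common variant, usually called inverted dropout, moves the scaling to training time instead: kept activations are divided by 1 − p, so the test-time pass needs no change. PyTorch's `nn.Dropout`, for example, works this way.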
Train vs test
Training:
- Drop each neuron independently with probability p
- A different random subnetwork each step
- Forces redundancy: no single point of failure
Test:
- No dropout: all neurons active
- Scale activations by 1 − p so each neuron's output matches its expected value during training
- Approximates averaging the predictions of many subnetworks (an ensemble)
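The test-time scaling rule follows from a short expectation argument. Write m_i for neuron i's keep flag, a_i for its activation, and w_i for its weight into a downstream unit with input z. These symbols are introduced here for illustration, and a_i is treated as fixed:

```latex
m_i \sim \mathrm{Bernoulli}(1-p)
\quad\Longrightarrow\quad
\mathbb{E}[m_i a_i] = (1-p)\,a_i + p \cdot 0 = (1-p)\,a_i

z = \sum_i w_i\, m_i\, a_i
\quad\Longrightarrow\quad
\mathbb{E}[z] = \sum_i \bigl((1-p)\,w_i\bigr)\, a_i
```

So running the full network with every w_i replaced by (1 − p) w_i, or equivalently every a_i replaced by (1 − p) a_i, reproduces the next layer's expected input during training.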
Each training step trains a different thinned network; at test time, the scaled full network approximates averaging them all. You get ensemble-like robustness from a single model, almost for free.
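To make the ensemble claim concrete, the sketch below enumerates all 2^n thinned versions of the inputs to a single linear unit. It is plain Python, and the weights, activations, and helper names are all illustrative. Each thinned version is weighted by its probability, and the weighted average is compared with one pass of the full network using scaled weights. For a single linear unit the two agree exactly. Once nonlinearities sit between layers, the scaled network is only an approximation to the true ensemble average.

```python
import itertools

def unit_output(weights, activations, mask):
    """Input to one downstream linear unit when only masked-in neurons survive."""
    return sum(w * a * m for w, a, m in zip(weights, activations, mask))

def ensemble_average(weights, activations, p):
    """Probability-weighted average over all 2**n thinned networks.

    A mask that keeps k of the n neurons occurs with probability (1-p)**k * p**(n-k).
    """
    n = len(activations)
    total = 0.0
    for mask in itertools.product((0, 1), repeat=n):
        kept = sum(mask)
        prob = (1 - p) ** kept * p ** (n - kept)
        total += prob * unit_output(weights, activations, mask)
    return total

def scaled_single_pass(weights, activations, p):
    """One pass of the full network with every weight scaled by the keep probability."""
    return sum((1 - p) * w * a for w, a in zip(weights, activations))

w = [0.5, -1.0, 2.0]
a = [1.0, 3.0, 0.5]
p = 0.5
print(ensemble_average(w, a, p), scaled_single_pass(w, a, p))  # prints -0.75 -0.75
```

Enumerating every mask costs 2^n evaluations, which is why real networks rely on the single scaled pass instead.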
Using it well
Dropout rate p: higher for large, overfitting-prone layers; lower (or zero) for small ones.
Placement: most common after fully connected layers; use it more sparingly in convolutional layers and with care alongside BatchNorm.