Dropout

Deep Learning regularization overfitting robustness

Randomly turn neurons off

Dropout is a strikingly simple anti-overfitting trick: on each training step, randomly switch off a fraction of neurons (say 20–50%). The network must keep working without them.

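Why it helps: a neuron can't count on any particular partner always being present, so it can't lean on that partner to fix its mistakes (this kind of mutual dependence is called co-adaptation). Each neuron has to learn something useful on its own. The result is redundant, robust features: a network that generalizes better.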
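To make the "switch off a fraction of neurons" step concrete, here is a minimal sketch in plain Python. The helper names `dropout_mask` and `apply_mask` and the activation values are made up for illustration; real frameworks do the same thing on whole tensors.

```python
import random

def dropout_mask(n_units, p, rng):
    """Return a list of 0/1 flags; each unit is dropped (0) with probability p."""
    return [0 if rng.random() < p else 1 for _ in range(n_units)]

def apply_mask(activations, mask):
    """Zero out the activations of dropped units."""
    return [a * m for a, m in zip(activations, mask)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [0.8, 1.5, 0.3, 2.1, 0.9, 1.2]  # made-up hidden-layer outputs

# Three training steps: a fresh mask (a different thinned network) each time.
for step in range(3):
    mask = dropout_mask(len(activations), p=0.5, rng=rng)
    print(step, mask, apply_mask(activations, mask))
```

Because the mask is redrawn on every step, no downstream unit can assume a given input will be there next time.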

Watch neurons flicker

During training, a different random subset of hidden neurons drops out each step. At test time, every neuron is back, with weights scaled to compensate.

Train vs test

During training
  • Drop each neuron with probability p
  • A different random subnetwork each step
  • Forces redundancy — no single point of failure
At test time
  • No dropout — all neurons active
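  • Scale activations by the keep probability 1 − p so the expected output matches training (the "inverted dropout" variant instead scales by 1/(1 − p) during training, so test time needs no change)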
  • Effectively averages many subnetworks (an ensemble)
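The train/test split above can be sketched as two modes of one layer. This is an illustrative sketch, not any particular library's API: the function names are hypothetical, and the third function shows the inverted variant that moves the scaling into training (it assumes p < 1).

```python
import random

def dropout_train(activations, p, rng):
    """Training mode (classic dropout): zero each unit with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]

def dropout_test(activations, p):
    """Test mode (classic dropout): keep every unit, scale by the keep probability."""
    keep = 1.0 - p
    return [a * keep for a in activations]

def inverted_dropout_train(activations, p, rng):
    """Inverted variant: scale survivors by 1/(1 - p) during training,
    so test mode can return the activations unchanged. Assumes p < 1."""
    keep = 1.0 - p
    return [0.0 if rng.random() < p else a / keep for a in activations]

rng = random.Random(0)
activations = [0.8, 1.5, 0.3, 2.1]  # made-up values
p = 0.5
steps = 20000

# Average the training-mode output of the first unit over many steps ...
mean_train = sum(dropout_train(activations, p, rng)[0] for _ in range(steps)) / steps
# ... and compare it with the deterministic test-mode output.
print(mean_train, dropout_test(activations, p)[0])
```

The two printed numbers should be close: the test-time scaling is chosen precisely so that a unit's deterministic output equals its average output under random dropping.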
It's a cheap ensemble

Each training step trains a different thinned network, and all of them share the same weights; at test time the full, scaled network approximates the average of their predictions. You get ensemble-like robustness from a single model, almost for free.
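For a single linear unit the ensemble claim can be checked exactly by brute force. The sketch below (weights and inputs are made-up values) enumerates every possible thinned network and compares the average of their outputs with one full network scaled by the keep probability. With nonlinear layers the match is only approximate.

```python
from itertools import product

def linear_unit(weights, inputs, mask):
    """Output of one linear unit when the inputs marked 0 in mask are dropped."""
    return sum(w * x * m for w, x, m in zip(weights, inputs, mask))

weights = [0.5, -1.0, 2.0]  # made-up values
inputs = [1.0, 3.0, 0.5]
p = 0.5                     # with p = 0.5 every mask is equally likely

# Enumerate all 2^3 thinned networks and average their outputs.
masks = list(product([0, 1], repeat=len(inputs)))
ensemble_mean = sum(linear_unit(weights, inputs, m) for m in masks) / len(masks)

# One full network, with its output scaled by the keep probability.
scaled_full = (1.0 - p) * linear_unit(weights, inputs, [1, 1, 1])

print(ensemble_mean, scaled_full)
```

Both values come out the same, which is why a single scaled forward pass can stand in for averaging over exponentially many subnetworks.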

Using it well

Rate: 0.2 – 0.5

Higher for big, overfitting-prone layers; lower (or none) for small ones.

Where: dense layers

Common after fully-connected layers; use it more sparingly in convolutional layers, and with care alongside BatchNorm.

When: if overfitting

Add it when train accuracy ≫ validation accuracy. It's one tool among several.