Dropout · Suman Bhadra Notes

Randomly turn neurons off

Dropout is a strikingly simple anti-overfitting trick: on each training step, randomly switch off a fraction of neurons (say 20–50%). The network must keep working without them.

Why it helps: without dropout, neurons can come to depend on specific other neurons always being there (called co-adaptation). Dropout breaks that reliance, so each neuron learns to be useful on its own. The result is redundant, robust features and a network that generalizes better.

Watch neurons flicker

During training, a different random subset of hidden neurons drops out each step. At test time, every neuron is back, with weights scaled to compensate.

Train vs test

During training

Drop each neuron with probability p
A different random subnetwork each step
Forces redundancy — no single point of failure

At test time

No dropout — all neurons active
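Scale activations by the keep probability 1 − p so the expected output matches training (many implementations instead scale by 1/(1 − p) during training and leave test time untouched)
Approximately averages many subnetworks (an ensemble)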
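
The train/test split above can be sketched in a few lines of plain Python. This is a minimal illustration rather than any library's implementation: the function name dropout_forward, the sample activations, and the fixed seed are all assumptions made for the example. It follows the classic form described here, where scaling happens at test time.

```python
import random

def dropout_forward(activations, p, training, rng=None):
    """Classic dropout over a list of activations (illustrative sketch).

    training=True : zero each value independently with probability p.
    training=False: keep every value but scale by the keep probability 1 - p.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    if training:
        # A fresh random mask is drawn on every call, i.e. every training step.
        return [0.0 if rng.random() < p else a for a in activations]
    # Test time: no neuron is dropped; outputs are scaled to match the
    # expected value seen during training.
    return [a * (1.0 - p) for a in activations]

# Hypothetical hidden-layer activations, chosen only for illustration.
hidden = [0.8, -1.2, 0.3, 2.0]
train_out = dropout_forward(hidden, p=0.5, training=True, rng=random.Random(0))
test_out = dropout_forward(hidden, p=0.5, training=False)
```

Each entry of train_out is either 0.0 or the original activation, while test_out is every activation multiplied by 1 − p. The inverted variant would instead divide the kept activations by 1 − p during training and return the input unchanged at test time.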

It's a cheap ensemble

Each training step trains a different thinned network; at test time, the single scaled network approximates the average of them all. You get ensemble-like robustness from a single model, almost for free.
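
For a single linear unit the ensemble view can be checked exactly. The sketch below enumerates every dropout mask over four inputs, weights each thinned output by the probability of its mask, and compares the result with the full unit scaled by 1 − p. The inputs, weights, and helper names are assumptions chosen for illustration; with nonlinear layers the match is only approximate.

```python
from itertools import product

def linear(x, w):
    """Output of one linear unit: the dot product of inputs and weights."""
    return sum(xi * wi for xi, wi in zip(x, w))

def ensemble_average(x, w, p):
    """Probability-weighted average of the unit's output over all dropout masks."""
    total = 0.0
    for mask in product([0, 1], repeat=len(x)):
        kept = sum(mask)
        # Probability of this exact mask: each input is kept with prob 1 - p.
        prob = (1 - p) ** kept * p ** (len(x) - kept)
        thinned = [xi * m for xi, m in zip(x, mask)]
        total += prob * linear(thinned, w)
    return total

# Hypothetical inputs and weights.
x = [1.0, -2.0, 0.5, 3.0]
w = [0.4, 0.1, -0.7, 0.2]
p = 0.5

avg_over_subnetworks = ensemble_average(x, w, p)
scaled_full_network = (1 - p) * linear(x, w)
```

The two values agree up to floating-point rounding, which is why one scaled forward pass can stand in for averaging all 16 thinned units in this linear case.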

Using it well

Rate: 0.2–0.5

Higher for big, overfitting-prone layers; lower (or none) for small ones.

Where: dense layers

Common after fully-connected layers; use it with care in CNNs and alongside BatchNorm.

When: if overfitting

Add it when train accuracy ≫ validation accuracy. It's one tool among several.
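
One way to turn that rule of thumb into a check is to compare the two accuracies against a gap threshold. The function name and the 0.05 default below are hypothetical; a sensible threshold depends on the task and on how noisy the validation set is.

```python
def should_try_dropout(train_acc, val_acc, gap_threshold=0.05):
    """Flag a train/validation accuracy gap large enough to suggest overfitting.

    gap_threshold is an assumed default for illustration, not a standard value.
    """
    return (train_acc - val_acc) > gap_threshold

# Hypothetical accuracies from two training runs.
large_gap = should_try_dropout(train_acc=0.99, val_acc=0.85)
small_gap = should_try_dropout(train_acc=0.90, val_acc=0.89)
```

Here large_gap is True and small_gap is False. A True result is only a prompt to consider regularization such as dropout, since other remedies may fit better.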