Early Stopping · Suman Bhadra Notes

Know when to quit

More training isn't always better. Past a point, a network stops learning the pattern and starts memorizing the noise in the training set, and the clearest sign of that is the loss on held-out validation data.

Early stopping watches that validation loss and halts training when it stops improving, then rewinds to the weights from the best epoch. It is simple, cheap, and a widely used regularizer.

Watch the curves diverge

Training loss keeps falling for as long as you train; validation loss bottoms out and then turns back up. Early stopping treats that turning point as the epoch whose weights are worth keeping.
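As a rough illustration of the turning point, the sketch below finds the epoch with the lowest validation loss in a finished run. The function name turning_point and the loss values are made up for illustration; they are not measurements from any real model.

```python
def turning_point(val_losses):
    """Return the index of the epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("need at least one epoch of validation loss")
    # min() returns the first index when several epochs tie for the lowest loss.
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# Illustrative numbers only: training loss keeps dropping,
# validation loss bottoms out at epoch 3 and then rises.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val_losses = [1.05, 0.80, 0.65, 0.60, 0.66, 0.75]

best_epoch = turning_point(val_losses)

# The gap between the two curves widens after the turning point.
gaps = [v - t for t, v in zip(train_losses, val_losses)]
```

This works only in hindsight, with the whole curve in hand. A live training run cannot see future epochs, which is why the patience rule exists.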

How it works in practice

Monitor: validation loss

After each epoch, evaluate on a held-out validation set.

Patience: wait N epochs

Don't stop at the first uptick. Wait until N consecutive epochs (the patience) pass with no improvement, so that noise in the curve doesn't end training too soon.

Restore best: rewind weights

Save the weights at the best validation score and roll back to them when you stop.
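The three steps above can be sketched in framework-free Python. This is a minimal sketch under stated assumptions, not any library's API: EarlyStopper, train_with_early_stopping, and the min_delta threshold are hypothetical names introduced here, the weights are a plain dict, and a list of precomputed validation losses stands in for real per-epoch evaluation.

```python
import copy


class EarlyStopper:
    """Hypothetical helper: tracks the best validation loss and says when to stop."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.best_weights = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss, weights):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: remember it and snapshot the weights for a later rollback.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_weights = copy.deepcopy(weights)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def train_with_early_stopping(val_losses, patience=3):
    """Toy loop: val_losses stands in for evaluating on a held-out set."""
    stopper = EarlyStopper(patience=patience)
    weights = {"w": 0.0}
    stopped_at = len(val_losses) - 1
    for epoch, val_loss in enumerate(val_losses):
        weights["w"] += 1.0  # placeholder for one real training epoch
        if stopper.step(epoch, val_loss, weights):
            stopped_at = epoch
            break
    # Roll back: return the snapshot from the best epoch, not the final weights.
    return stopper.best_epoch, stopped_at, stopper.best_weights
```

With losses of 1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9 and a patience of 3, the loop stops at epoch 5 and returns the weights saved at epoch 2. The deep copy matters: without it the "best" snapshot would keep changing as training continues to update the same object.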

Why it's so useful

Pros

Limits overfitting at almost no extra cost
Saves compute by ending training early
Adds no hyperparameters to the model itself; patience is a training-loop setting

Keep in mind

Needs a held-out validation set, which leaves less data for training
Match patience to the noise in the curve: too low stops prematurely, too high wastes epochs
Combine with dropout / weight decay for best results