Reading Training Curves

Tags: Deep Learning, diagnostics, debugging, loss curves

A network's vital signs

Plot training and validation loss (and accuracy) against epochs and you get a network's vital signs. The shape of those curves tells you what's wrong and what to change next.

Learning to read them is one of the fastest ways to debug a deep learning model. Four shapes cover most situations.

Four diagnoses

The classic curve shapes are healthy, overfitting, underfitting, and a learning rate that's too high. Each one tells you what to do next.

The cheat sheet

Healthy: small gap, both ↓

Train and val loss both fall and stay close. Ship it (or train a bit longer).

Overfitting: val turns up

Train keeps falling, val rises. → more data, stronger regularization (dropout, weight decay), early stopping.

Underfitting: both high & flat

Both losses plateau high. → bigger model, train longer, better features, less regularization.

LR too high: spiky / diverging

Loss jumps around or blows up. → lower the learning rate.
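The cheat sheet can be turned into a rough rule-of-thumb checker. The sketch below is only an illustration: the function name `diagnose` and every threshold (`high_loss`, `gap_tol`, `rise_tol`, the 30% "spiky" cutoff) are assumptions made up for this example, and sensible values depend on your loss scale. It takes one loss value per epoch for train and val and returns one of the four labels, or "unclear".

```python
import math

def diagnose(train, val, high_loss=1.0, gap_tol=0.1, rise_tol=0.05):
    """Heuristically label a pair of per-epoch loss curves.

    All thresholds are illustrative assumptions, not standard values.
    """
    train, val = list(train), list(val)

    # NaN or inf anywhere: the loss blew up.
    if any(not math.isfinite(x) for x in train + val):
        return "lr_too_high"

    # Spiky: training loss went UP on more than 30% of epoch-to-epoch steps.
    ups = sum(1 for a, b in zip(train, train[1:]) if b > a)
    if ups > 0.3 * (len(train) - 1):
        return "lr_too_high"

    # Overfitting: val has risen off its minimum while train kept falling.
    best_val = min(val)
    best_epoch = val.index(best_val)
    if val[-1] > best_val + rise_tol and train[-1] < train[best_epoch]:
        return "overfitting"

    # Underfitting: train barely moved over the second half and both end high.
    half = len(train) // 2
    flat = abs(train[half] - train[-1]) < rise_tol
    if flat and train[-1] > high_loss and val[-1] > high_loss:
        return "underfitting"

    # Healthy: both curves fell and they finish close together.
    both_fell = train[-1] < train[0] and val[-1] < val[0]
    if both_fell and abs(val[-1] - train[-1]) <= gap_tol:
        return "healthy"

    return "unclear"


print(diagnose([2.0, 1.2, 0.8, 0.5, 0.3, 0.2],
               [2.1, 1.4, 1.0, 0.9, 1.0, 1.2]))
```

The checks run in order of severity, so a curve that is both diverging and overfitting is reported as a learning-rate problem first. Treat the label as a prompt to look at the plot, not as a replacement for it.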

Also worth a glance

Good signs
  • Smooth, steadily decreasing loss
  • Train and val tracking close together
  • Accuracy rising in step with loss falling
Warning signs
  • Val loss well above train loss (overfitting)
  • Loss flat from epoch 1 (LR too low / bug)
  • Loss = NaN (LR too high / exploding gradients)
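The three warning signs are easy to check automatically at the end of each epoch. Here is a minimal sketch, assuming positive loss values stored as plain Python lists; the helper name `warning_signs` and the `flat_tol` and `gap_ratio` defaults are hypothetical choices for illustration, not established cutoffs.

```python
import math

def warning_signs(train, val, flat_tol=1e-3, gap_ratio=1.5):
    """Return the warning signs present in per-epoch loss lists.

    flat_tol and gap_ratio are illustrative assumptions; tune them
    for your own loss scale.
    """
    train, val = list(train), list(val)
    found = []

    # NaN loss: stop here, since comparisons against NaN are meaningless.
    if any(math.isnan(x) for x in train + val):
        found.append("nan_loss")
        return found

    # Flat from epoch 1: training loss never moved by more than flat_tol.
    if max(train) - min(train) < flat_tol:
        found.append("flat_from_start")

    # Val well above train: final val loss exceeds gap_ratio x final train loss.
    if val[-1] > gap_ratio * train[-1]:
        found.append("val_well_above_train")

    return found


print(warning_signs([1.0, 0.4, 0.1], [1.1, 0.7, 0.6]))
```

An empty list means none of the three signs fired; it does not by itself mean the run is healthy.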