Classic CNN Architectures — LeNet to ResNet
A history of going deeper
The story of computer vision is the story of CNNs getting deeper — and figuring out how to train them. Four landmark architectures mark the journey.
The timeline
The networks grew from LeNet's handful of layers to ResNet's 150+, and the skip connection is what finally made that extreme depth trainable.
The landmarks
LeNet-5 (1998): Yann LeCun's digit reader for bank cheques. Proved that conv-pool-dense works, more than a decade ahead of its time.
AlexNet (2012): Won the ImageNet challenge by a wide margin, kicking off the deep-learning boom. Brought ReLU, dropout, and GPU training at scale.
VGG (2014): Showed that stacking many small 3×3 convs is a clean way to go deep. Simple and uniform, but very heavy (about 138M parameters in VGG-16).
ResNet (2015): Skip connections let gradients flow through 100+ layers. Won ImageNet 2015 and changed everything.
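The landmark descriptions above compress a little arithmetic. The sketch below computes conv/pool output sizes, assuming the classic LeNet-5 stem (32×32 input, 5×5 convolutions without padding, 2×2 pooling with stride 2), and checks VGG's argument for small kernels: three stacked 3×3 convolutions see a 7×7 region but need 27·C² weights instead of 49·C² for a single 7×7 layer. The function names are illustrative helpers, not from any library.

```python
from __future__ import annotations


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a square conv or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(size: int = 32) -> list[int]:
    """Trace the feature-map side length through a LeNet-5-style stem.

    Assumed layout: 5x5 conv -> 2x2 pool (stride 2) -> 5x5 conv -> 2x2 pool,
    no padding, starting from a 32x32 input.
    """
    sizes = [size]
    for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes


def stacked_receptive_field(n_layers: int, kernel: int = 3) -> int:
    """Receptive field of n stacked stride-1 convs that share one kernel size."""
    return n_layers * (kernel - 1) + 1


def conv_weights(kernel: int, channels: int) -> int:
    """Weights in one conv layer mapping C channels to C channels (biases ignored)."""
    return kernel * kernel * channels * channels


print(lenet5_shapes())               # [32, 28, 14, 10, 5]: then flattened into dense layers
print(stacked_receptive_field(3))    # 7: three 3x3 convs see a 7x7 patch
C = 256
print(3 * conv_weights(3, C))        # 1769472 weights for three 3x3 layers (27 * C^2)
print(conv_weights(7, C))            # 3211264 weights for one 7x7 layer (49 * C^2)
```

The stacked version also inserts two extra nonlinearities between input and output, which is the other reason small kernels let VGG go deep cheaply.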
The key innovation: skip connections
Before ResNet, stacking more layers often made networks worse, even on their training data: gradients had trouble reaching the early layers (vanishing gradients). ResNet adds a shortcut that lets the input skip past a block: output = F(x) + x. The block only needs to learn the residual F(x), the difference between the desired output and its input. Because the derivative of the added x term is exactly 1, gradients also get a direct highway backward. Suddenly networks with 150+ layers trained fine.
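A minimal sketch of why the shortcut helps, using a hypothetical scalar "layer" F(x) = tanh(w·x) in place of a real conv block. This is an assumption for illustration only: real ResNet blocks apply convolutions, batch norm and ReLU to tensors. By the chain rule, the gradient through a stack of layers is the product of each layer's local derivative. Without the shortcut every factor here is at most w = 0.5, so 50 layers shrink the gradient below 0.5^50. With output = F(x) + x each factor becomes F'(x) + 1, which never drops below 1 in this toy. The names `layer`, `layer_grad` and `forward_and_grad` are made up for the example.

```python
from __future__ import annotations

import math


def layer(x: float, w: float) -> float:
    """Toy stand-in for a block: F(x) = tanh(w * x) (hypothetical, not a real conv)."""
    return math.tanh(w * x)


def layer_grad(x: float, w: float) -> float:
    """dF/dx for the toy block: w * (1 - tanh(w * x)^2)."""
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def forward_and_grad(x: float, depth: int, w: float, residual: bool) -> tuple[float, float]:
    """Run `depth` blocks; return (output, d output / d input) via the chain rule.

    Plain:    x_{k+1} = F(x_k)        -> local derivative F'(x_k)
    Residual: x_{k+1} = F(x_k) + x_k  -> local derivative F'(x_k) + 1
    """
    grad = 1.0
    for _ in range(depth):
        local = layer_grad(x, w)  # evaluate at the block's input, before updating x
        if residual:
            local += 1.0  # the identity shortcut contributes a derivative of exactly 1
            x = layer(x, w) + x
        else:
            x = layer(x, w)
        grad *= local
    return x, grad


_, plain_grad = forward_and_grad(1.0, depth=50, w=0.5, residual=False)
_, res_grad = forward_and_grad(1.0, depth=50, w=0.5, residual=True)
print(f"plain:    {plain_grad:.2e}")  # vanishingly small (below 0.5**50)
print(f"residual: {res_grad:.2f}")    # stays at or above 1
```

The same "+1" term is why stacking residual blocks does not starve early layers of gradient: even if a block's own F'(x) is tiny, the shortcut still passes the signal through unchanged.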
Skip connections are now everywhere — including the transformers behind modern LLMs. It's one of deep learning's most important ideas.