Classic CNN Architectures — LeNet to ResNet

A history of going deeper

The story of computer vision is the story of CNNs getting deeper — and figuring out how to train them. Four landmark architectures mark the journey.

The timeline

The networks grow from LeNet's handful of layers to ResNet's 152, and the skip connection is what finally made that depth trainable.

The landmarks

LeNet-5 (1998) ~7 layers

Yann LeCun's digit reader for cheques. Proved that the conv-pool-dense pattern works, more than a decade before deep CNNs took off.

AlexNet (2012) 8 layers

Won the 2012 ImageNet challenge (ILSVRC) by a wide margin, kicking off the deep-learning boom. Combined ReLU activations, dropout, and GPU training.

VGG (2014) 16–19 layers

Showed that stacking many small 3×3 convs is a clean way to go deep. Simple and uniform, but very heavy: VGG-16 has roughly 138 million parameters.

ResNet (2015) 50–152 layers

Skip connections let gradients flow through 100+ layers. Won the 2015 ImageNet challenge and made very deep networks practical.
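The VGG entry's point about small 3×3 convs can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions, an equal number of input and output channels, and no biases. The helper names and the channel width of 64 are illustrative choices, not something the text specifies.

```python
def conv_params(kernel_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (biases ignored)."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stride-1 convs applied one after another."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # illustrative width, an assumption for this example

# Three stacked 3x3 layers cover the same 7x7 window as one 7x7 layer...
small_stack_field = stacked_receptive_field(3, 3)
large_single_field = stacked_receptive_field(7, 1)

# ...but use fewer weights: 27 * C^2 versus 49 * C^2.
small_stack_params = 3 * conv_params(3, channels)
large_single_params = conv_params(7, channels)

print(small_stack_field, large_single_field)
print(small_stack_params, large_single_params)
```

Under these assumptions the stack of small kernels sees the same window with roughly half the weights, and it gets a nonlinearity after each layer. VGG's overall size comes mostly from its wide layers and large dense head, not from the 3×3 kernels themselves.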

The key innovation: skip connections

Residual learning

Before ResNet, stacking more layers onto a plain network could make training error worse. This was a degradation problem: deeper plain nets were harder to optimize, and the cause was not simply vanishing gradients, which had largely been tamed by careful initialization and batch normalization. ResNet adds a shortcut that lets the input skip past a block: output = F(x) + x. The block only needs to learn the residual F(x), and gradients get a direct path backward through the shortcut. With that change, networks of 152 layers trained fine. Full story: residual connections.
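A minimal sketch of output = F(x) + x, in plain Python with no dependencies. It is heavily simplified: F is two small dense layers with a ReLU between them, where real ResNet blocks use convolutions, batch normalization, and a ReLU after the addition. All function names are hypothetical. Setting every weight to zero shows why the shortcut helps: a residual block with nothing learned is the identity, so 150 of them pass the input through unchanged, while the plain stack wipes it out.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(weights, v):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def plain_block(w1, w2, x):
    """Two stacked layers with no shortcut: output = F(x)."""
    return linear(w2, relu(linear(w1, x)))


def residual_block(w1, w2, x):
    """The same two layers plus the shortcut: output = F(x) + x."""
    fx = plain_block(w1, w2, x)
    return [f + a for f, a in zip(fx, x)]


def zero_matrix(n):
    """An n-by-n matrix of zeros, standing in for 'nothing learned yet'."""
    return [[0.0] * n for _ in range(n)]


x = [1.0, -2.0, 3.0]
depth = 150  # illustrative depth
w = zero_matrix(len(x))

plain_out = x
residual_out = x
for _ in range(depth):
    plain_out = plain_block(w, w, plain_out)
    residual_out = residual_block(w, w, residual_out)

print(plain_out)      # the plain stack has lost the input
print(residual_out)   # the residual stack still carries it
```

The same structure explains the gradient path: the derivative of F(x) + x with respect to x is the derivative of F plus the identity, so some signal always reaches earlier layers through the shortcut.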

Lasting impact

Skip connections are now everywhere, including the transformers behind modern LLMs. The residual shortcut is one of deep learning's most important ideas.