Transfer Learning
Stand on the shoulders of a trained model
A ResNet trained on more than a million ImageNet photos has already learned to detect edges, textures, shapes, and object parts. Why throw that away? Transfer learning reuses those learned features for your own task.
You keep the pretrained feature extractor, replace only the final classifier head with one sized for your classes, and train on your (often tiny) dataset. The result: often strong accuracy from a few hundred images instead of more than a million.
Reuse, replace, retrain
Example: a model pretrained on the 1000 ImageNet classes has its head swapped for a 2-class cats-vs-dogs classifier, with the feature layers frozen.
Two ways to do it
Feature extraction: freeze all pretrained layers and train only the new head. Fast, needs little data, and ideal when your dataset is small.
Fine-tuning: also unfreeze the last few layers (those closest to the head) and train them at a low learning rate. Usually more accurate when you have more data.
Why it works so well
- Great accuracy from small datasets
- Much faster training, less compute
- Early layers (edges, textures) transfer across most vision tasks
- Works best when the source domain is similar
- Match the pretrained model's input preprocessing
- Fine-tune gently: a large learning rate can wreck good pretrained features
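To make the preprocessing point concrete, here is a minimal sketch of the per-channel normalization commonly used with ImageNet-pretrained torchvision models. It assumes the image is already resized to 224x224 and stored as a height x width x 3 `uint8` tensor; the `preprocess` helper is hypothetical, and the right statistics for any other pretrained model should be taken from that model's documentation.

```python
import torch

# Channel statistics commonly used with ImageNet-pretrained torchvision models.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(image_uint8):
    """Convert an HxWx3 uint8 image tensor to a normalized 3xHxW float tensor."""
    x = image_uint8.permute(2, 0, 1).float() / 255.0  # HWC in [0, 255] -> CHW in [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD

# Random pixels stand in for a real photo here.
dummy = torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8)
x = preprocess(dummy)
print(x.shape, x.dtype)
```

If the inputs at fine-tuning time are scaled differently from what the pretrained layers saw, the reused features receive values outside the range they were trained on, which tends to hurt accuracy.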
Modern NLP follows the same pattern: pretrained language models are fine-tuned on your task. See LLM fine-tuning and Hugging Face.