Types of Machine Learning · Suman Bhadra Notes

One question splits them all

What separates the families of machine learning is simple: what feedback does the model get while it learns?

Does it see the right answer for every example? Does it see no answers at all and have to find structure itself? Or does it learn by trial and error, nudged by rewards? Those three answers give the three classic families — supervised, unsupervised, and reinforcement learning.

The mental model

Supervised = learning with an answer key. Unsupervised = learning with no answer key. Reinforcement = learning from rewards and penalties.

See all three

The animation steps through each family with a tiny illustration: labelled points getting a boundary, unlabelled points falling into clusters, and an agent finding its way to a reward.

Supervised learning — learn from labelled examples

Every training example comes with the correct answer attached. The model's job is to learn the mapping from input to answer so it can predict the answer for new inputs. Two flavours:

Regression predict a number

House price, temperature, exam score. The output is continuous. See Linear Regression.

Classification predict a category

Spam / not spam, cat / dog, disease / healthy. The output is a label. See Logistic Regression.

Needs

A labelled dataset — often the expensive part. Someone has to mark thousands of examples with the right answer.

Unsupervised learning — find structure with no labels

Here the data has no answers attached. The model looks for patterns, groupings, or a more compact representation on its own.

Clustering group similar items

Segment customers, group news articles by topic. See K-Means Clustering.

Dimensionality reduction compress features

Squeeze many features into a few while keeping the signal. See PCA.

Anomaly detection spot the odd one

Flag fraud or faulty sensors as points that don't fit any pattern.

Reinforcement learning — learn by reward

An agent takes actions in an environment, receives rewards or penalties, and gradually learns a policy — a strategy that maximises long-term reward. There is no fixed dataset; experience is generated by acting.

Games win / lose

Chess, Go, and video games — the reward is the score or the win.

Robotics reach the goal

Teach a robot to walk or grasp by rewarding progress.

Recommendations engagement

Learn which suggestions keep users coming back.

A worth-knowing middle ground

Semi-supervised few labels, lots of data

A small labelled set guides learning over a much larger unlabelled pile — common when labelling is costly.

Self-supervised labels from the data itself

The model invents its own task — like predicting a hidden word — which is how modern LLMs are pre-trained.

Quick comparison

Pick by your data

Have inputs and answers → supervised
Have inputs but no answers → unsupervised
Have a goal and can act + get reward → reinforcement

Watch out for

Supervised: labels are costly to collect
Unsupervised: results are harder to evaluate
Reinforcement: slow and unstable to train