Linear Regression
What it is
Linear regression is a simple way to spot a trend in your data and use it to make predictions.
Suppose you have a list of students with the hours each one studied and the exam score they got. Plot those pairs on a chart and you'll usually notice a trend: more study tends to mean a higher score. Linear regression captures that trend by drawing a single straight line through the cloud of points — and once you have the line, you can read off a predicted score for any new student just from how long they studied.
Given pairs of numbers that move together, linear regression finds the straight-line rule that best turns one number into the other.
With one input, the model is just a line — described by two numbers:
- The slope: how much the prediction changes when the input goes up by one.
- The intercept: the prediction when the input is zero, i.e. the line's height at the y-axis.
The rule is simply y = slope · x + intercept: plug a new x in, get the predicted y out.
One input is simple linear regression. Several inputs — for example, predicting rent from square footage and bedroom count — is multiple linear regression. Same idea, just more knobs to tune.
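In code, a prediction is just that arithmetic. A minimal sketch (the function and parameter names here are mine, not from this page), covering both the one-input and many-input cases:

```python
def predict_simple(x, slope, intercept):
    # one input: y = slope * x + intercept
    return slope * x + intercept

def predict_multiple(inputs, weights, intercept):
    # several inputs (e.g. square footage and bedroom count):
    # one weight ("knob") per input, plus the intercept
    return intercept + sum(w * x for w, x in zip(weights, inputs))
```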
Where it's used
Anywhere a "more of X tends to mean more (or less) of Y" relationship shows up — across science, business, and engineering.
- Predict monthly rent from apartment size.
- Forecast revenue from advertising budget.
- Estimate exam score from hours studied.
- Estimate the trend of blood pressure with age.
This page uses a 5-student dataset of hours studied vs. exam score — small enough to follow point by point, real enough to make the line meaningful.
What "best" means
For any candidate line, the vertical gap from each point to the line is called a residual — it's how wrong the line is for that observation.
The least-squares line is the one with the smallest total of squared residuals. Out of every possible line, exactly one wins.
Two common approaches, both ending at the same line (each sketched in code after this list):
- A direct formula — solves for the best line in one shot. Great for small datasets.
- Gradient descent — start with any line and keep nudging it in the direction that lowers the error. Scales to huge datasets.
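To make those two routes concrete, here is a rough Python sketch of each. The gradient-descent hyperparameters are illustrative, and the page's data isn't reproduced here, so pass in your own x/y pairs:

```python
def fit_direct(xs, ys):
    """Closed-form least squares: slope = cov(x, y) / var(x), intercept from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def fit_gradient_descent(xs, ys, learning_rate=0.01, steps=10_000):
    """Start flat (slope 0, intercept 0) and repeatedly nudge both numbers
    downhill on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of the mean squared error with respect to slope and intercept
        d_slope = -2 / n * sum(x * (y - (slope * x + intercept)) for x, y in zip(xs, ys))
        d_intercept = -2 / n * sum(y - (slope * x + intercept) for x, y in zip(xs, ys))
        slope -= learning_rate * d_slope
        intercept -= learning_rate * d_intercept
    return slope, intercept
```

Run on the page's 5-student dataset, both should land at roughly the line reported below.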
Watch the construction
The animation builds the least-squares line on a tiny dataset of study hours vs. exam scores. The line starts flat, residuals turn into literal red squares, then the line tilts to shrink them.
Why we square the gaps
Without squaring, points above and below the line cancel out. A bad line could net zero by accident.
A residual of 4 contributes 16. A residual of 1 contributes 1. So the fit works hardest to avoid large misses.
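A tiny illustration with made-up residuals shows both points at once: raw gaps can cancel, squared gaps cannot, and the big misses dominate.

```python
residuals = [4, -4, 1]                    # made-up vertical gaps for three points

print(sum(residuals))                     # 1  -> raw gaps nearly cancel, hiding two big misses
print(sum(r ** 2 for r in residuals))     # 33 -> squaring makes the big misses dominate (16 + 16 + 1)
```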
Reading the result
For our study-hours dataset, the best-fit line lands at:
y = 6.4 · x + 45.4
- Slope (6.4): each extra hour of study is worth about 6.4 more exam points.
- Intercept (45.4): the score the model predicts for zero hours of study.
To predict a new student's score, plug their hours into the equation. For 6 hours: 6.4 × 6 + 45.4 ≈ 83.8.
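The same prediction in code, using the fitted numbers above:

```python
slope, intercept = 6.4, 45.4                  # the fitted line from this page
hours = 6
predicted_score = slope * hours + intercept   # 6.4 * 6 + 45.4 = 83.8
```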
Evaluation — how good is the fit?
Once we have a line, we want a number that tells us how well it summarises the data. Two simple ones cover most situations.
- RMSE (root mean squared error): roughly how far off the line's predictions are, in the same units as y. Smaller is better.
- R² (coefficient of determination): how much of the pattern the line captures. 1 = perfect, 0 = no better than guessing the average.
R² = 0.996 — the line captures nearly all the pattern. RMSE ≈ 0.57 exam points — predictions are off by roughly half a point on average.
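Both metrics are a few lines each. A minimal sketch, assuming `actual` holds the real exam scores and `predicted` holds the line's outputs for the same students:

```python
import math

def rmse(actual, predicted):
    # typical size of a miss, in the same units as y
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    # 1 minus (squared error of the line) / (squared error of just guessing the mean)
    mean_y = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_residual / ss_total
```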
Assumptions
Linear regression works best when these three conditions are roughly true (a quick visual check is sketched after the list).
- Linearity: the trend between x and y looks roughly like a straight line, not a curve.
- Independence: each data point stands on its own; one point's value doesn't tell you about the next.
- Equal spread: points scatter around the line by about the same amount everywhere, with no fan or funnel shape.
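A residual plot is a practical way to eyeball the linearity and equal-spread conditions in particular: a healthy fit shows a flat, even band of points around zero, while a curve or funnel shape signals trouble. A minimal sketch with matplotlib, assuming `hours`, `scores`, `slope`, and `intercept` are already defined as in the earlier sketches:

```python
import matplotlib.pyplot as plt

residuals = [y - (slope * x + intercept) for x, y in zip(hours, scores)]

plt.scatter(hours, residuals)
plt.axhline(0, color="red", linestyle="--")   # residuals should scatter evenly around this line
plt.xlabel("Hours studied")
plt.ylabel("Residual (actual - predicted)")
plt.show()
```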
When it works — and when it doesn't
Works well when:
- The relationship is roughly straight
- The spread around the line is fairly even
- No single point is wildly out of line

Struggles when:
- The data curves, so no straight line fits
- Outliers exist: one extreme point can yank the whole line
- Many drivers matter: extend to multiple linear regression