Cosine Similarity
Compare by angle, not length
Once text is a vector — TF-IDF or an embedding — you need a way to ask "how similar are these two?" Cosine similarity answers with the angle between them.
cos(θ) = (A · B) / (‖A‖ ‖B‖) — the dot product divided by the two lengths. It ranges from 1 (same direction) through 0 (perpendicular) to −1 (opposite).
Why the angle and not plain distance? Because document length shouldn't matter — a short and a long article about the same topic point the same way even though one vector is much longer. Cosine ignores magnitude and looks only at direction.
Drag the vector
Move the blue vector and watch the angle to the fixed grey vector — and the cosine similarity — update live. Line them up for ~1; make them perpendicular for ~0.
Tip: drag the blue dot. Same direction → cos ≈ 1; right angle → cos ≈ 0; opposite → cos ≈ −1.
How to read the score
Vectors point the same way — the documents/words share almost the same content.
Perpendicular — no shared direction, little in common.
Rare for word counts (always ≥ 0), but possible for embeddings — opposing meaning.
Where it's used
Rank documents by cosine similarity to a query vector — the core of a vector database.
"More like this" by finding the closest item vectors.
Group near-identical texts or detect duplicates.
Put it to work in the finding similar words/documents mini-project.