Cosine Similarity


Compare by angle, not length

Once text is a vector — TF-IDF or an embedding — you need a way to ask "how similar are these two?" Cosine similarity answers with the angle between them.

The formula

cos(θ) = (A · B) / (‖A‖ ‖B‖): the dot product of the two vectors divided by the product of their lengths. It ranges from 1 (same direction) through 0 (perpendicular) to −1 (opposite), and it is undefined if either vector has zero length.
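A minimal sketch of the formula in plain Python, assuming vectors are given as equal-length lists of numbers. The function name `cosine_similarity` and the sample vectors are illustrative choices, not part of the original lesson.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))        # A · B
    norm_a = math.sqrt(sum(x * x for x in a))     # ‖A‖
    norm_b = math.sqrt(sum(y * y for y in b))     # ‖B‖
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

same = cosine_similarity([1, 2], [2, 4])            # same direction, close to 1
perpendicular = cosine_similarity([1, 0], [0, 1])   # right angle, 0
opposite = cosine_similarity([1, 2], [-1, -2])      # opposite direction, close to -1
```

Floating-point rounding means the first and last results may differ from ±1 in the final digits, so compare with a tolerance rather than with `==`.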

Why the angle and not plain distance? Because document length shouldn't matter. A short and a long article about the same topic use words in similar proportions, so their vectors point roughly the same way even though one is much longer. Cosine ignores magnitude and looks only at direction.
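To make the length argument concrete, here is a small sketch comparing cosine similarity with Euclidean distance. The two count vectors are made-up examples: the "long" document is assumed to use the same words in the same proportions, ten times over.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

short_doc = [2, 1, 0]     # hypothetical term counts for a short article
long_doc = [20, 10, 0]    # same proportions, ten times as many words

cos = cosine_similarity(short_doc, long_doc)   # close to 1: same direction
dist = math.dist(short_doc, long_doc)          # large: the lengths differ a lot
```

Under these assumptions the cosine stays near 1 while the Euclidean distance is large, which is the behaviour the paragraph above describes.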


How to read the score

cos ≈ 1: very similar

Vectors point the same way — the documents/words share almost the same content.

cos ≈ 0: unrelated

Perpendicular — no shared direction, little in common.

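cos < 0: opposite

Impossible for raw word counts or TF-IDF vectors, because every component is ≥ 0 and so the dot product can never be negative. It is possible for embeddings, where components can be negative and the vectors can point in opposing directions.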
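The score bands above can be illustrated with raw word counts. This is a rough sketch that assumes whitespace tokenisation and lowercase matching; the helper name `count_cosine` and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def count_cosine(text_a, text_b):
    """Cosine similarity of two non-empty texts using raw word counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

similar = count_cosine("the cat sat on the mat", "the cat sat on a mat")
unrelated = count_cosine("the cat sat", "stock prices fell")
```

Because counts are never negative, the result here always falls between 0 and 1. Texts with no words in common score exactly 0.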

Where it's used

Semantic search: find similar

Rank documents by cosine similarity to a query vector — the core of a vector database.
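As a sketch of the ranking step, the snippet below scores every document against a query and sorts by similarity. The document names and three-dimensional vectors are invented for illustration. Real embeddings have hundreds of dimensions, and production systems generally use an index rather than a full scan like this one.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical document vectors keyed by an illustrative name.
docs = {
    "doc_finance": [0.0, 0.2, 0.9],
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

# Highest cosine similarity first.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, docs[name]),
    reverse=True,
)
```

With these made-up vectors, `ranked` puts `doc_cats` first and `doc_finance` last.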

Recommendations: nearest neighbours

"More like this" by finding the closest item vectors.

Clustering & dedup: grouping

Group near-identical texts or detect duplicates.
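One simple way to flag duplicates is to compare every pair of vectors and keep the pairs whose cosine similarity clears a threshold. The sketch below assumes a threshold of 0.95, which is an arbitrary illustrative value, and the helper name `near_duplicates` is hypothetical. The pairwise scan is quadratic, so it only suits small collections.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """cos(theta) for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def near_duplicates(vectors, threshold=0.95):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    return [
        (i, j)
        for (i, u), (j, v) in combinations(enumerate(vectors), 2)
        if cosine_similarity(u, v) >= threshold
    ]

vectors = [
    [1.0, 2.0, 0.0],   # item 0
    [2.0, 4.1, 0.0],   # item 1: almost the same direction as item 0
    [0.0, 0.1, 3.0],   # item 2: points somewhere else entirely
]
pairs = near_duplicates(vectors)
```

Here only items 0 and 1 are reported as a near-duplicate pair.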

Next

Put it to work in the "finding similar words/documents" mini-project.