GloVe · Suman Bhadra Notes

Embeddings from global counts

GloVe (Stanford, 2014) is another way to learn word embeddings. Where Word2Vec learns from local prediction windows, GloVe learns from global co-occurrence statistics — how often every pair of words appears together across the entire corpus.

The name says it: Global Vectors. Build one big word-by-word co-occurrence matrix, then factorize it into compact vectors whose dot products (plus a bias term per word) approximate the logarithm of the co-occurrence counts.
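For reference, the factorization can be written as a weighted least-squares objective. The form below follows the GloVe paper (Pennington et al., 2014); the symbols are not defined elsewhere in these notes, so treat them as notation introduced here: X_ij is the co-occurrence count of words i and j, w_i and w̃_j are the word and context vectors, b_i and b̃_j are biases, and V is the vocabulary size.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})
    \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

The weighting function f damps rare pairs and caps very frequent ones, and because f(0) = 0, pairs that never co-occur drop out of the sum. The paper reports x_max = 100 and α = 3/4 as the values it used.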

Counts → vectors

A tiny co-occurrence matrix is built from a corpus, then factorized into word vectors, so that words with similar co-occurrence patterns land close together.
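The build-then-factorize pipeline can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not the reference implementation: the three-sentence corpus is made up, the function names are hypothetical, plain SGD stands in for the AdaGrad optimizer used in the original work, and x_max is lowered to 10 because the toy counts are so small.

```python
import math
import random
from collections import defaultdict


def build_cooccurrence(sentences, window=2):
    """Symmetric co-occurrence counts; a pair d tokens apart contributes 1/d."""
    counts = defaultdict(float)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), i):
                weight = 1.0 / (i - j)
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return dict(counts)


def weight_fn(x, x_max, alpha):
    """GloVe weighting f(x): grows as (x / x_max)^alpha, capped at 1."""
    return min(1.0, (x / x_max) ** alpha)


def glove_loss(counts, params, x_max, alpha):
    """Weighted squared error between model scores and log counts."""
    vecs, ctx, bias, ctx_bias = params
    total = 0.0
    for (a, b), x in counts.items():
        dot = sum(p * q for p, q in zip(vecs[a], ctx[b]))
        err = dot + bias[a] + ctx_bias[b] - math.log(x)
        total += weight_fn(x, x_max, alpha) * err * err
    return total


def train_glove(counts, dim=8, epochs=200, lr=0.05, x_max=10.0, alpha=0.75, seed=0):
    """Fit word and context vectors with plain SGD (assumption: not AdaGrad)."""
    rng = random.Random(seed)
    vocab = sorted({w for pair in counts for w in pair})
    vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    ctx = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    bias = {w: 0.0 for w in vocab}
    ctx_bias = {w: 0.0 for w in vocab}
    params = (vecs, ctx, bias, ctx_bias)
    pairs = list(counts.items())
    history = [glove_loss(counts, params, x_max, alpha)]  # loss before training
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (a, b), x in pairs:
            dot = sum(p * q for p, q in zip(vecs[a], ctx[b]))
            err = dot + bias[a] + ctx_bias[b] - math.log(x)
            grad = 2.0 * weight_fn(x, x_max, alpha) * err
            for k in range(dim):
                wa, cb = vecs[a][k], ctx[b][k]
                vecs[a][k] -= lr * grad * cb
                ctx[b][k] -= lr * grad * wa
            bias[a] -= lr * grad
            ctx_bias[b] -= lr * grad
        history.append(glove_loss(counts, params, x_max, alpha))
    return params, history


# Made-up toy corpus, for illustration only.
corpus = ["ice is a cold solid", "steam is a hot gas", "ice and steam are water"]
counts = build_cooccurrence(corpus)
params, history = train_glove(counts)
print(f"loss before: {history[0]:.4f}  after: {history[-1]:.4f}")
```

Only the nonzero entries of the matrix are stored and visited, which is why the dictionary of pairs, rather than a dense V × V array, is the natural data structure here.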

The key idea — ratios carry meaning

Co-occurrence ratios

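GloVe's insight is that ratios of co-occurrence probabilities, rather than the raw probabilities, encode meaning. "ice" co-occurs with "solid" far more often than "steam" does, and "steam" co-occurs with "gas" far more often than "ice" does, so P(solid | ice) / P(solid | steam) is large while P(gas | ice) / P(gas | steam) is small. For a word related to both or to neither, such as "water" or "fashion", the ratio stays close to 1. GloVe trains vectors so that their differences capture the logarithm of these ratios.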
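A short sketch makes the ratio argument concrete. The counts and totals below are invented purely for illustration (they are not measured from any corpus), and the helper names `prob` and `ratio` are hypothetical.

```python
import math

# Hypothetical co-occurrence counts, invented for illustration only.
cooccurrence = {
    "ice":   {"solid": 80, "gas": 4,  "water": 300, "fashion": 2},
    "steam": {"solid": 5,  "gas": 90, "water": 310, "fashion": 2},
}
# Hypothetical total co-occurrence count for each target word.
totals = {"ice": 10_000, "steam": 10_000}


def prob(context, target):
    """P(context | target): share of target's co-occurrences that involve context."""
    return cooccurrence[target][context] / totals[target]


def ratio(context, a="ice", b="steam"):
    """P(context | a) / P(context | b): far from 1 when context separates a from b."""
    return prob(context, a) / prob(context, b)


for k in ("solid", "gas", "water", "fashion"):
    r = ratio(k)
    print(f"{k:8s} ratio={r:8.3f}  log-ratio={math.log(r):+.2f}")
```

The log-ratio column is the quantity a vector difference is meant to reproduce: in the GloVe model, (w_ice − w_steam) · w̃_k should be roughly log of the ratio for context word k, so discriminating words get a large positive or negative value and uninformative words get a value near zero.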

GloVe vs Word2Vec

GloVe

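Uses corpus-wide statistics, aggregated up front
Weighted least-squares factorization of log co-occurrence counts
Once the counts are built, training cost scales with the number of nonzero co-occurrence pairs rather than with corpus length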

Word2Vec

Uses local sliding windows
Predictive shallow network
Streams example by example
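The difference between the two views of the data can be shown on one toy sentence. In this sketch (the sentence and the function name `skipgram_pairs` are made up for illustration), the same sliding window produces a stream of individual (center, context) examples, which is what a Word2Vec-style trainer consumes, and an aggregated table of counts, which is what GloVe trains on.

```python
from collections import Counter


def skipgram_pairs(tokens, window=1):
    """Yield (center, context) pairs one at a time, in corpus order."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


# Made-up toy sentence, for illustration only.
tokens = "ice is cold and steam is hot and ice is solid".split()

# Local view: one training example per pair occurrence.
stream = list(skipgram_pairs(tokens))

# Global view: each distinct pair appears once, with its total count.
global_counts = Counter(stream)

print(len(stream), "streamed examples")
print(len(global_counts), "distinct pairs")
print(global_counts[("ice", "is")], "occurrences of (ice, is)")
```

On a real corpus the gap is far larger than in this toy: frequent pairs occur many times in the stream but occupy a single cell in the count table.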

In practice

Both produce static embeddings of broadly similar quality, and pretrained GloVe vectors (trained on Wikipedia + Gigaword, or on Common Crawl) are a popular drop-in. Both share the same limitation, one vector per word regardless of context, which contextual transformer models later addressed.
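Pretrained GloVe vectors are distributed as plain text, one word per line followed by its space-separated vector components, so a loader needs very little code. The sketch below is a minimal example under stated assumptions: the three sample lines and their values are made up (real vectors have 50 or more dimensions), and `load_glove` and `cosine` are hypothetical helper names, not part of any library.

```python
import io
import math


def load_glove(handle):
    """Parse lines of the form 'word v1 v2 ... vd' into a dict of float lists."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional sample standing in for a real vector file.
sample = io.StringIO(
    "ice 0.9 0.1 0.3\n"
    "steam 0.1 0.9 0.3\n"
    "solid 0.8 0.2 0.1\n"
)
vectors = load_glove(sample)

print(f"ice~solid:   {cosine(vectors['ice'], vectors['solid']):.3f}")
print(f"steam~solid: {cosine(vectors['steam'], vectors['solid']):.3f}")
```

For a downloaded file, the `io.StringIO` object would be replaced by `open(path, encoding="utf-8")`, with `path` pointing at wherever the vectors were saved. The lookup returns the same vector for a word in every sentence, which is exactly the one-vector-per-word limitation described above.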