GloVe — Global Vectors
Embeddings from global counts
GloVe (Stanford, 2014) is another way to learn word embeddings. Where Word2Vec learns from local prediction windows, GloVe learns from global co-occurrence statistics — how often every pair of words appears together across the entire corpus.
The name says it: Global Vectors. Build one big word-by-word co-occurrence matrix, then factorize it into compact vectors whose dot products (plus per-word bias terms) approximate the log co-occurrence counts.
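For readers who want the factorization target written out, the following is a sketch of the objective as given in the original GloVe formulation. The symbols are notation introduced here rather than taken from the text above: X_ij is the co-occurrence count of word i with context word j, w_i and w̃_j are the word and context vectors, b_i and b̃_j are bias terms, and f is a weighting function that damps very frequent pairs (the published defaults are x_max = 100 and α = 3/4).

```latex
w_i^\top \tilde{w}_j + b_i + \tilde{b}_j \approx \log X_{ij}

J = \sum_{i,j \,:\, X_{ij} > 0} f(X_{ij})
    \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
```

Only pairs with a nonzero count contribute to the sum, which is why the log is always defined.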
Counts → vectors
In miniature: a tiny co-occurrence matrix is built from a corpus, then factorized into word vectors that place similar words close together.
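The counts-to-vectors pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the toy corpus, the function names `build_cooccurrence` and `train_glove`, and all hyperparameter values are assumptions made for this example. It also swaps in plain full-batch gradient descent and unweighted window counts, whereas the original GloVe code uses AdaGrad over sampled nonzero entries and down-weights distant context words.

```python
import numpy as np

def build_cooccurrence(sentences, window=2):
    """Count how often each ordered word pair appears within `window` tokens."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for pos, word in enumerate(s):
            lo = max(0, pos - window)
            hi = min(len(s), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    X[index[word], index[s[ctx_pos]]] += 1.0
    return vocab, X

def train_glove(X, dim=8, epochs=200, lr=0.05, x_max=10.0, alpha=0.75, seed=0):
    """Fit word/context vectors so dot products plus biases approximate log X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = rng.normal(scale=0.1, size=(n, dim))   # word vectors
    C = rng.normal(scale=0.1, size=(n, dim))   # context vectors
    bw = np.zeros(n)                           # word biases
    bc = np.zeros(n)                           # context biases

    # log X where X > 0; zero-count cells get weight 0 and are ignored.
    log_x = np.log(np.where(X > 0, X, 1.0))
    weight = np.minimum(X / x_max, 1.0) ** alpha

    losses = []
    for _ in range(epochs):
        err = W @ C.T + bw[:, None] + bc[None, :] - log_x
        losses.append(float(np.sum(weight * err ** 2)))
        g = 2.0 * weight * err                 # gradient w.r.t. each prediction
        grad_W, grad_C = g @ C, g.T @ W        # compute both before updating
        W -= lr * grad_W
        C -= lr * grad_C
        bw -= lr * g.sum(axis=1)
        bc -= lr * g.sum(axis=0)
    # Summing word and context vectors follows the GloVe paper's practice.
    return W + C, losses

# Toy corpus, invented for illustration only.
corpus = [
    "ice is a cold solid".split(),
    "steam is a hot gas".split(),
    "ice is frozen water".split(),
    "steam is hot water".split(),
]
vocab, X = build_cooccurrence(corpus, window=2)
vectors, losses = train_glove(X)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

With a corpus this small the resulting vectors are not meaningful; the point is only to show the two stages (count, then fit) and that the weighted squared error goes down as training proceeds.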
The key idea — ratios carry meaning
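GloVe's insight is that ratios of co-occurrence probabilities, rather than the raw probabilities themselves, encode meaning. Take "ice" and "steam" and a probe word k. For k = "solid", P(solid | ice) / P(solid | steam) is large; for k = "gas" it is small; for a word related to both (such as "water") or to neither (such as "fashion") it is close to 1. GloVe trains vectors so that the difference between two word vectors, dotted with the probe word's context vector, approximates the log of this ratio.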
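The ratio argument is easy to check by hand. The sketch below uses hypothetical co-occurrence counts, invented purely for illustration (they are not measurements from any corpus), and the helper names `cond_prob` and `prob_ratio` are likewise assumptions of this example.

```python
# Hypothetical co-occurrence counts, invented for illustration only.
# Each row sums to 1000 so the conditional probabilities are easy to read.
counts = {
    "ice":   {"solid": 80, "gas": 10, "water": 300, "fashion": 10, "<other>": 600},
    "steam": {"solid": 10, "gas": 80, "water": 300, "fashion": 10, "<other>": 600},
}

def cond_prob(counts, word, probe):
    """P(probe | word): share of word's co-occurrences that involve probe."""
    return counts[word][probe] / sum(counts[word].values())

def prob_ratio(counts, word_a, word_b, probe):
    """P(probe | word_a) / P(probe | word_b)."""
    return cond_prob(counts, word_a, probe) / cond_prob(counts, word_b, probe)

for probe in ("solid", "gas", "water", "fashion"):
    r = prob_ratio(counts, "ice", "steam", probe)
    print(f"P({probe}|ice) / P({probe}|steam) = {r:.3f}")
```

With these made-up counts the ratio is well above 1 for "solid", well below 1 for "gas", and about 1 for "water" and "fashion". The raw probabilities for "water" are high and those for "fashion" are low, yet both ratios look the same, which is why the ratio separates the discriminating probe words from the uninformative ones better than either probability alone.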
GloVe vs Word2Vec
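GloVe:
- Uses global corpus statistics all at once
- Matrix factorization on co-occurrence counts
- Efficient once counts are collected: training iterates over the nonzero matrix entries rather than over every token in the corpus
Word2Vec:
- Uses local sliding windows
- Predictive shallow network
- Streams through the corpus example by example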
Both produce static embeddings of similar quality, and pretrained GloVe vectors (trained on Wikipedia + Gigaword, or on Common Crawl) are a popular drop-in. Both also share the same limitation: each word gets exactly one vector regardless of context, so the different senses of a word like "bank" collapse into a single point. Contextual transformer models later addressed this.
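Using pretrained vectors as a drop-in mostly comes down to parsing a text file. The sketch below assumes the common plain-text layout of the published GloVe downloads, one token per line followed by its space-separated components; the function names `load_glove_text` and `cosine` and the three toy vectors are made up for this example. It also assumes tokens contain no spaces, which may not hold for every released file.

```python
import numpy as np

def load_glove_text(lines):
    """Parse lines of the form 'word v1 v2 ... vd' into {word: vector}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]],
                                     dtype=np.float32)
    return vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional "embeddings", invented for illustration only.
toy_lines = [
    "ice 0.9 0.1 0.0",
    "steam 0.1 0.9 0.0",
    "frost 0.8 0.2 0.1",
]
vecs = load_glove_text(toy_lines)
print(cosine(vecs["ice"], vecs["frost"]) > cosine(vecs["ice"], vecs["steam"]))
```

For a real download, the same function can be fed an open file handle, for example `with open(path, encoding="utf-8") as f: vecs = load_glove_text(f)`, where `path` points at whichever vector file you have unpacked. The lookup is a plain dictionary access, which makes the static-embedding limitation concrete: a word returns the same vector no matter which sentence it came from.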