GloVe — Global Vectors

Embeddings from global counts

GloVe (Stanford, 2014) is another way to learn word embeddings. Where Word2Vec learns from local prediction windows, GloVe learns from global co-occurrence statistics: how often each pair of words appears together, within a context window, across the entire corpus.

The name says it: Global Vectors. Build one big word-by-word co-occurrence matrix, then factorize it into compact vectors whose dot products match the (log) co-occurrence counts.
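
To make "dot products match the (log) co-occurrence counts" concrete, here is a minimal sketch of the weighted least-squares loss described in the GloVe paper. The function names and array layout are illustrative assumptions, not taken from any official GloVe code; the bias terms and the weighting function (with x_max = 100 and alpha = 0.75) follow the paper's stated defaults.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): damps rare pairs and caps frequent pairs at 1."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(X, W, W_ctx, b, b_ctx):
    """Sum over nonzero X[i, j] of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.

    X      : (V, V) co-occurrence counts
    W      : (V, d) word vectors
    W_ctx  : (V, d) context vectors
    b, b_ctx : (V,) bias terms for words and contexts
    """
    i, j = np.nonzero(X)  # zero counts are skipped entirely
    pred = np.sum(W[i] * W_ctx[j], axis=1) + b[i] + b_ctx[j]
    err = pred - np.log(X[i, j])
    return float(np.sum(glove_weight(X[i, j]) * err ** 2))
```

Training would minimize this loss with respect to `W`, `W_ctx`, `b`, and `b_ctx`; the optimizer is left out of this sketch.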

Counts → vectors

A tiny co-occurrence matrix is built from a corpus, then factorized into word vectors so that words with similar co-occurrence patterns land close together.
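
The counting and factorizing steps can be sketched on a toy corpus as follows. This is an illustration under stated assumptions, not GloVe's actual training procedure: the corpus is made up, the window counts every neighbor equally (the paper weights a pair d words apart by 1/d), and a truncated SVD of log(1 + X) stands in for GloVe's weighted least-squares optimization.

```python
import numpy as np

def cooccurrence(sentences, window=2):
    """Count how often each word pair appears within `window` positions."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: k for k, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for pos, word in enumerate(s):
            lo = max(0, pos - window)
            hi = min(len(s), pos + window + 1)
            for ctx in range(lo, hi):
                if ctx != pos:
                    X[index[word], index[s[ctx]]] += 1.0
    return vocab, X

def factorize(X, dim=2):
    """Stand-in factorization: truncated SVD of log(1 + X), not GloVe's optimizer."""
    U, S, _ = np.linalg.svd(np.log1p(X))
    return U[:, :dim] * np.sqrt(S[:dim])

# Made-up corpus for illustration only.
corpus = [
    "ice is cold and solid".split(),
    "steam is hot and gas".split(),
    "ice is solid water".split(),
    "steam is gas water".split(),
]
vocab, X = cooccurrence(corpus, window=2)
vectors = factorize(X, dim=2)  # one row per word in `vocab`
```

Each row of `vectors` is a 2-dimensional embedding for the corresponding entry of `vocab`. A corpus this small is only enough to show the mechanics, not to produce meaningful geometry.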

The key idea — ratios carry meaning

Co-occurrence ratios

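GloVe's insight: it's the ratio of co-occurrence probabilities that encodes meaning. Take a probe word k and compare P(k | ice) with P(k | steam). For k = "solid" the ratio is large, because "ice" co-occurs with "solid" far more often than "steam" does; for k = "gas" it is small; for a word related to both ("water") or to neither ("fashion") it stays near 1. GloVe trains vectors so that the difference between two word vectors, dotted with a probe word's context vector, approximates the log of this ratio.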
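
A small numeric sketch of the ratio idea follows. The counts are hypothetical, chosen only to make the pattern visible; they are not taken from any real corpus, and the helper name `cooccurrence_ratio` is an assumption for this example.

```python
import numpy as np

def cooccurrence_ratio(row_a, row_b):
    """P(k | a) / P(k | b) for every probe word k, from raw count rows."""
    p_a = row_a / row_a.sum()  # P(k | a) = X_ak / sum_k X_ak
    p_b = row_b / row_b.sum()
    return p_a / p_b

probes = ["solid", "gas", "water", "fashion", "(other)"]
# Hypothetical co-occurrence counts for the rows "ice" and "steam".
ice = np.array([80.0, 10.0, 300.0, 2.0, 608.0])
steam = np.array([10.0, 80.0, 300.0, 2.0, 608.0])

ratios = cooccurrence_ratio(ice, steam)
for word, r in zip(probes, ratios):
    print(f"P({word} | ice) / P({word} | steam) = {r:.3f}")
```

With these made-up counts the ratio is well above 1 for "solid", well below 1 for "gas", and equal to 1 for the probes that do not distinguish the two words, which is the signal the vector differences are trained to reproduce.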

GloVe vs Word2Vec

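GloVe
  • Uses global corpus statistics, aggregated once before training
  • Weighted least-squares factorization of log co-occurrence counts
  • After the one-time counting pass, training cost scales with the number of nonzero matrix entries, not with corpus length

Word2Vec
  • Uses local sliding windows
  • Predictive shallow network
  • Streams through the corpus example by example

In practice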

Both produce similar-quality static embeddings, and pretrained GloVe vectors (trained on Wikipedia + Gigaword, or Common Crawl) are a popular drop-in. Both share the same limitation, one vector per word regardless of context, which later contextual transformer models addressed.
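
Pretrained GloVe vectors are distributed as plain text, one word per line followed by its space-separated vector components. A minimal loader might look like the sketch below; the helper names and the three sample lines (with invented 3-dimensional values) are assumptions for illustration, and real files have far more rows and dimensions.

```python
import numpy as np

def load_glove(lines):
    """Parse lines of the form 'word v1 v2 ... vd' into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray([float(x) for x in parts[1:]],
                                       dtype=np.float32)
    return vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented sample rows standing in for a downloaded vectors file.
sample = [
    "ice 0.9 0.1 0.0",
    "steam 0.1 0.9 0.0",
    "bank 0.3 0.3 0.8",
]
vectors = load_glove(sample)
similarity = cosine(vectors["ice"], vectors["steam"])
```

With a real download, the same function can be fed an open file handle instead of `sample`. Note that the lookup returns the same vector for "bank" whether the surrounding text is about rivers or finance, which is the one-vector-per-word limitation in concrete form.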