One-Hot Encoding for Text
One slot per word
One-hot encoding gives every word in the vocabulary its own position in a long vector. A word is represented by a 1 in its slot and 0 everywhere else.
If the vocabulary has 10,000 words, every word is a 10,000-long vector with a single 1. Simple, unambiguous — and, as we'll see, deeply wasteful.
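To make the idea concrete, here is a minimal sketch in plain Python. The five-word vocabulary, its ordering, and the helper name one_hot are illustrative assumptions, not part of any particular library.

```python
# Toy vocabulary; the words and their order are arbitrary assumptions.
vocab = ["cat", "dog", "democracy", "runs", "the"]

# Map each word to its slot (index) in the vector.
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a list of len(vocab) zeros with a single 1 at the word's slot."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("dog"))  # [0, 1, 0, 0, 0]
```

With a 10,000-word vocabulary the same function would return a list of 10,000 entries, 9,999 of them zero.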
See it, and see its flaw
Each word becomes a one-hot vector. A document stacks those vectors into a matrix with one row per word, which is almost entirely zeros. Then comes the critical limitation: every word is equally far from every other word.
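The stacking step might look like the sketch below, which encodes a six-word toy document and counts how many cells are nonzero. The vocabulary, the document, and the helper name one_hot are assumptions made up for illustration.

```python
# Toy vocabulary and document, chosen only for illustration.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's slot."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

document = ["the", "cat", "sat", "on", "the", "mat"]

# One row per word in the document, one column per vocabulary word.
matrix = [one_hot(word) for word in document]

total_cells = len(matrix) * len(vocab)
nonzero_cells = sum(sum(row) for row in matrix)
print(nonzero_cells, "of", total_cells, "cells are nonzero")  # 6 of 30 cells are nonzero
```

Every row holds exactly one 1, so the nonzero fraction is 1 divided by the vocabulary size. Here that is 1 in 5; with 10,000 words it would be 1 in 10,000.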
The limitations
Each vector is as long as the vocabulary (often tens of thousands of entries), and all but one entry is zero. Stored as ordinary dense vectors, that wastes both memory and compute.
"cat" and "dog" are exactly as different as "cat" and "democracy". The encoding knows nothing about meaning.
Position in the vector is just an index — it carries no relationship between words.
A word that is not in the vocabulary has no slot at all, so it cannot be represented.
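Two of these limitations can be checked directly. The sketch below, using an assumed three-word vocabulary and hypothetical helper names, measures the distance between one-hot vectors and then looks up a word the vocabulary has never seen.

```python
import math

# Toy vocabulary; "zebra" is deliberately left out.
vocab = ["cat", "dog", "democracy"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's slot."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot(a, b):
    """Dot product; zero means the vectors share no nonzero slot."""
    return sum(x * y for x, y in zip(a, b))

cat, dog, democracy = one_hot("cat"), one_hot("dog"), one_hot("democracy")

# Any two different words differ in exactly two slots, so both distances are sqrt(2).
print(euclidean(cat, dog) == euclidean(cat, democracy))  # True

# The dot product of any two different words is 0: no overlap, no similarity signal.
print(dot(cat, dog), dot(cat, democracy))  # 0 0

# An out-of-vocabulary word has no index to put the 1 in.
print(word_to_index.get("zebra"))  # None
```

Calling one_hot("zebra") in this sketch would raise a KeyError. Real pipelines have to decide what to do in that case, for example by reserving one slot for unknown words, which is a design choice this example does not make.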
So why learn it?
One-hot is the conceptual foundation that later techniques build on. Bag of Words is essentially the sum of a document's one-hot vectors, which gives a count for each vocabulary word. Word embeddings were invented precisely to fix the "no similarity" flaw by giving words dense vectors in which related words sit close together.
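The link to Bag of Words can be sketched in a few lines: add up the one-hot vectors of a document slot by slot and the result is a count vector. The vocabulary, document, and function names below are illustrative assumptions.

```python
from collections import Counter

# Toy vocabulary and document, chosen only for illustration.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's slot."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

def bag_of_words(tokens):
    """Sum the one-hot vectors of all tokens, slot by slot."""
    counts = [0] * len(vocab)
    for token in tokens:
        for i, value in enumerate(one_hot(token)):
            counts[i] += value
    return counts

document = ["the", "cat", "sat", "on", "the", "mat"]
print(bag_of_words(document))  # [2, 1, 1, 1, 1]

# Cross-check against a direct word count.
direct_counts = Counter(document)
print([direct_counts[word] for word in vocab])  # [2, 1, 1, 1, 1]
```

Summing discards word order, which is why "the cat sat on the mat" and "the mat sat on the cat" would get the same vector in this sketch.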
Embeddings replace a 10,000-long sparse one-hot with, say, a 300-long dense vector — small, and arranged so that meaning lives in geometry.
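One way to see the connection is that multiplying a one-hot vector by an embedding table simply selects one row of the table. The sketch below shows this with a tiny table whose numbers are hand-picked for illustration only; they are NOT learned embeddings, and the 3-dimensional size stands in for the 300 dimensions mentioned above. All names are hypothetical.

```python
import math

vocab = ["cat", "dog", "democracy"]
word_to_index = {word: i for i, word in enumerate(vocab)}

# Hand-picked values for illustration, NOT the output of any trained model.
embedding_table = [
    [0.9, 0.8, 0.1],  # cat
    [0.8, 0.9, 0.2],  # dog
    [0.1, 0.2, 0.9],  # democracy
]

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's slot."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

def embed_by_matmul(word):
    """Multiply the one-hot row vector by the table; only one row survives."""
    hot = one_hot(word)
    dims = len(embedding_table[0])
    return [sum(hot[i] * embedding_table[i][d] for i in range(len(vocab)))
            for d in range(dims)]

def embed_by_lookup(word):
    """Equivalent shortcut: index straight into the table."""
    return embedding_table[word_to_index[word]]

def cosine(a, b):
    """Cosine similarity: near 1 for similar directions, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(embed_by_matmul("dog") == embed_by_lookup("dog"))  # True

cat, dog, democracy = (embed_by_lookup(w) for w in vocab)
# With these hand-picked values, cat is closer to dog than to democracy.
print(cosine(cat, dog) > cosine(cat, democracy))  # True
```

Because the multiplication only ever picks out a row, implementations typically skip it and index the table directly, which avoids building the sparse one-hot vector at all.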