Why Convert Text to Numbers?
Models do math, not reading
Under the hood, most ML models are arithmetic: a linear model or a neural network multiplies inputs by weights and adds them up. You can't multiply the word "great" by 0.7. So before any modeling, text must become numbers.
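As a minimal sketch of that point, assume a hypothetical three-input linear model. The weights, inputs, and the `weighted_sum` helper are made up for illustration. The weighted sum works on numbers and raises an error on raw words:

```python
# Hypothetical weights and inputs, chosen only for illustration.
weights = [0.7, -0.2, 0.5]
features = [1.0, 0.0, 2.0]


def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add the results."""
    return sum(x * w for x, w in zip(inputs, weights))


# Numeric inputs: 0.7*1.0 + (-0.2)*0.0 + 0.5*2.0
score = weighted_sum(features, weights)

# Raw words: Python cannot multiply a string by a float, so this raises TypeError.
try:
    weighted_sum(["great", "movie", "plot"], weights)
    text_works = True
except TypeError:
    text_works = False
```

Here `text_works` ends up `False`, which is the whole motivation for vectorization: the model's arithmetic has no meaning until words are replaced by numbers.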
This step is called vectorization (or feature extraction): turning each document into a vector of numbers that captures something about its content. It's the bridge from the language world into the math world.
The goal: represent each document as a fixed-length list of numbers, such that similar documents get similar vectors.
See the bridge
A sentence can't enter a model as-is. The animation shows why, then walks through the basic recipe: build a vocabulary, map each word to a position, and emit a number vector.
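The three steps of the recipe can be sketched in plain Python. This is an illustrative sketch, not the track's own implementation: the function names `build_vocabulary` and `vectorize`, the lowercase whitespace tokenization, and the choice to emit word counts are all assumptions.

```python
def build_vocabulary(documents):
    """Step 1 and 2: give each distinct word a fixed position (order of first appearance)."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def vectorize(document, vocab):
    """Step 3: emit a vector with one slot per vocabulary word, holding that word's count."""
    vector = [0] * len(vocab)
    for word in document.lower().split():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] += 1
    return vector


docs = ["the movie was great", "the plot was great great fun"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
# vocab   -> {'the': 0, 'movie': 1, 'was': 2, 'great': 3, 'plot': 4, 'fun': 5}
# vectors -> [[1, 1, 1, 1, 0, 0], [1, 0, 1, 2, 1, 1]]
```

Both documents come out as vectors of length 6, even though one has four words and the other has six. The vector length is set by the vocabulary, not by the document.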
What makes a good representation?
Every document maps to a vector of the same dimension, however long the document is.
The numbers should reflect which words appear (and ideally, what they mean).
Documents about the same topic should land near each other in vector space.
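One way to check these properties on a toy example is to build count vectors and compare them with cosine similarity. This is a sketch under assumptions: the three example sentences, the helper names, and cosine similarity as the measure of "near" are illustrative choices, not something the lesson prescribes.

```python
import math
from collections import Counter


def count_vector(text, vocab):
    """Fixed-length vector: one count per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the striker scored a late goal",   # football
    "the keeper saved the goal",        # football
    "the bank raised interest rates",   # finance
]
vocab = sorted({word for d in docs for word in d.lower().split()})
vecs = [count_vector(d, vocab) for d in docs]

sim_same_topic = cosine_similarity(vecs[0], vecs[1])
sim_diff_topic = cosine_similarity(vecs[0], vecs[2])
```

The two football sentences score higher than the football and finance pair. Note that part of every score comes from the shared word "the", which says nothing about topic; raw counts treat all words as equally informative.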
The methods in this track
Each method fixes a weakness of the last — from sparse one-hot, to counts, to weighted counts, to dense learned meaning.
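As a rough preview, the first three representations can be computed side by side for one document. This is a sketch with several assumptions: "one-hot" is shown at the document level as word presence (0 or 1), "weighted counts" is taken to mean TF-IDF in one common variant (count times log of N over document frequency), and the toy corpus and function names are hypothetical. Dense learned vectors are left out because they require a trained model.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat chased the dog"]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
# vocab -> ['cat', 'chased', 'dog', 'sat', 'the']

# Document frequency: in how many documents each word appears.
doc_freq = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}


def presence(tokens):
    """1 if the word occurs in the document, else 0."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]


def counts(tokens):
    """How many times each vocabulary word occurs."""
    c = Counter(tokens)
    return [c[w] for w in vocab]


def tfidf(tokens):
    """Counts scaled down for words that appear in many documents (one TF-IDF variant)."""
    c = Counter(tokens)
    n_docs = len(tokenized)
    return [c[w] * math.log(n_docs / doc_freq[w]) for w in vocab]


doc = tokenized[2]  # "the cat chased the dog"
presence_vec = presence(doc)  # [1, 1, 1, 0, 1]
count_vec = counts(doc)       # [1, 1, 1, 0, 2]
tfidf_vec = tfidf(doc)
```

In `count_vec` the largest entry belongs to "the", the least informative word. In `tfidf_vec` that entry drops to zero because "the" appears in every document, while "chased", which appears in only one, gets the largest weight.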