Lemmatization
The dictionary-smart way to find a root
Like stemming, lemmatization reduces a word to a base form — but it does so using grammar and a real dictionary, so the result (the lemma) is always a genuine word.
"Better" (as an adjective) lemmatizes to "good". "Mice" to "mouse". "Was", "is", "are" all to "be". A suffix-stripping stemmer cannot do this, because it only chops or rewrites letters; a lemmatizer actually looks the word up.
The lemma depends on the word's part of speech. "Meeting" as a noun stays "meeting"; as a verb it becomes "meet". So good lemmatization pairs with POS tagging.
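To make the part-of-speech dependence concrete, here is a minimal sketch that uses a hand-written lookup table keyed by (word, POS). The table, the `lemmatize` helper, and the POS labels are illustrative assumptions, not the API of any real library; a real lemmatizer would draw on a full lexical resource.

```python
# Toy lemma table keyed by (word form, part of speech).
# Entries and POS labels are illustrative, not taken from a real lexicon.
LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("better", "ADJ"): "good",
    ("mice", "NOUN"): "mouse",
    ("was", "VERB"): "be",
    ("is", "VERB"): "be",
    ("are", "VERB"): "be",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())


print(lemmatize("meeting", "NOUN"))  # meeting
print(lemmatize("meeting", "VERB"))  # meet
print(lemmatize("Was", "VERB"))      # be
```

The same surface form yields different lemmas depending on the tag passed in, which is why a wrong tag from the upstream tagger produces a wrong lemma.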
Lemmas vs stems
Watch words map to real dictionary lemmas, including the irregular forms a stemmer can't handle — then a side-by-side with stemming.
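A rough version of that side-by-side can be sketched in a few lines. The stemmer here is a deliberately crude suffix stripper (an assumption for illustration, not the Porter algorithm), and the lemma table is a tiny hand-written stand-in for a real dictionary.

```python
# Crude suffix stripper: a stand-in for a real stemmer, not Porter.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Tiny hand-written lemma table standing in for a lexical database.
LEMMA_TABLE = {
    "went": "go",
    "feet": "foot",
    "mice": "mouse",
    "studies": "study",
    "running": "run",
}


def lemma(word: str) -> str:
    """Look the word up; return it unchanged if it is not in the table."""
    return LEMMA_TABLE.get(word, word)


for w in ["went", "feet", "mice", "studies", "running"]:
    print(f"{w:10} stem={naive_stem(w):10} lemma={lemma(w)}")
```

The stripper leaves "went", "feet", and "mice" untouched and turns "studies" into "studi" and "running" into "runn", while the lookup returns a real word in every case.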
How it works
Determine the word's grammatical role — it changes the lemma.
Consult a lexical database that maps each known word form to its canonical lemma.
Including irregulars: went → go, feet → foot, better → good.
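The steps above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the POS tag is supplied by the caller (in practice it would come from a tagger), and `EXCEPTIONS`, `RULES`, and `VOCAB` are hypothetical toy tables. The overall shape, an exception list for irregulars plus suffix rules whose results are checked against a vocabulary, is similar in spirit to how WordNet-based lemmatizers work, but it is not a reimplementation of any of them.

```python
# Irregular forms per part of speech (illustrative entries only).
EXCEPTIONS = {
    "VERB": {"went": "go", "was": "be", "is": "be", "are": "be"},
    "NOUN": {"feet": "foot", "mice": "mouse"},
    "ADJ": {"better": "good"},
}

# Regular suffix rules per part of speech: (suffix to remove, replacement).
RULES = {
    "NOUN": [("ies", "y"), ("es", ""), ("s", "")],
    "VERB": [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")],
    "ADJ": [("est", ""), ("er", "")],
}

# Stand-in for the lexical database: known lemmas per part of speech.
VOCAB = {
    "NOUN": {"meeting", "cat", "study", "box"},
    "VERB": {"meet", "walk", "study", "go", "be"},
    "ADJ": {"good", "fast"},
}


def lemmatize(word: str, pos: str) -> str:
    """Lemmatize a word given its POS tag (assumed to come from a tagger)."""
    word = word.lower()
    known = VOCAB.get(pos, set())

    # Irregular forms are resolved by direct lookup.
    irregular = EXCEPTIONS.get(pos, {})
    if word in irregular:
        return irregular[word]

    # A word that is already a lemma for this POS is returned as-is.
    if word in known:
        return word

    # Regular forms: try each suffix rule, accept only real vocabulary entries.
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in known:
                return candidate

    # Unknown word: return it unchanged rather than guess.
    return word


print(lemmatize("went", "VERB"))     # go
print(lemmatize("meeting", "NOUN"))  # meeting
print(lemmatize("meeting", "VERB"))  # meet
print(lemmatize("studies", "NOUN"))  # study
```

Checking each candidate against the vocabulary is what keeps the output a real word: a rule that would produce a non-word is simply skipped.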
Trade-offs
Pros:
- Output is a real, readable word
- Handles irregular forms correctly
- More accurate normalization than stemming
Cons:
- Slower than stemming, since it needs dictionary lookups and POS tagging
- Depends on a language resource (such as WordNet)
- POS tagging errors lead to the wrong lemma
See Stemming vs Lemmatization for the head-to-head and a decision rule.