Lemmatization · Suman Bhadra Notes

The dictionary-smart way to find a root

Like stemming, lemmatization reduces a word to a base form, but it does so using grammar and a real dictionary, so the result (the lemma) is a genuine dictionary word rather than a truncated fragment.

"Better" lemmatizes to "good", "mice" to "mouse", and "was", "is", and "are" all to "be". A stemmer cannot do this because it only strips letters by rule; a lemmatizer actually looks the word up.

It needs context

The lemma depends on the word's part of speech. "Meeting" as a noun stays "meeting"; as a verb it becomes "meet". So good lemmatization pairs with POS tagging.
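To make the part-of-speech dependence concrete, here is a minimal sketch in Python. The `LEMMAS` table and the `lemmatize` function are hypothetical illustrations, not a real library's API; a real lemmatizer would consult a full lexical database instead of a five-entry dictionary.

```python
# Toy lexicon keyed by (word, part of speech).
# These entries are illustrative assumptions, not a real dictionary file.
LEMMAS = {
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
    ("saw", "noun"): "saw",
    ("saw", "verb"): "see",
    ("better", "adj"): "good",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())


print(lemmatize("meeting", "noun"))  # meeting
print(lemmatize("meeting", "verb"))  # meet
print(lemmatize("saw", "verb"))      # see
```

The same surface form yields different lemmas depending only on the tag passed in, which is why a wrong tag leads directly to a wrong lemma.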

Lemmas vs stems

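A lemmatizer maps words to real dictionary lemmas, including the irregular forms a stemmer can't handle. Put side by side with stemming on the same words, the stemmer often leaves irregular forms untouched or cuts regular ones down to non-words.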
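A side-by-side comparison can be sketched in a few lines of Python. Both functions below are assumptions made for illustration: `crude_stem` is a deliberately naive suffix stripper (not the Porter algorithm or any library stemmer), and `toy_lemma` reads from a tiny hand-written table standing in for a real dictionary.

```python
def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters.

    A naive stand-in for a stemmer: it never looks anything up.
    """
    for suffix in ("ing", "ies", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hand-written table standing in for a dictionary lookup (illustrative only).
TOY_LEMMAS = {
    "mice": "mouse",
    "feet": "foot",
    "went": "go",
    "was": "be",
    "studies": "study",
    "running": "run",
}


def toy_lemma(word: str) -> str:
    """Look the word up; return it unchanged if it is not in the table."""
    return TOY_LEMMAS.get(word, word)


for w in ["mice", "feet", "went", "was", "studies", "running"]:
    print(f"{w:<10} stem={crude_stem(w):<10} lemma={toy_lemma(w)}")
```

In this sketch the irregular forms ("mice", "feet", "went", "was") pass through the stemmer unchanged, while "studies" and "running" are cut to "stud" and "runn", neither of which is a word. The lookup returns a real word in every case, but only because each input happens to be in the table.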

How it works

1. Tag the POS (noun? verb?)

Determine the word's grammatical role, since it changes the lemma.

2. Look it up (WordNet)

Consult a lexical database that maps known word forms to their canonical lemmas.

3. Return the lemma (a real word)

Including irregulars: went → go, feet → foot, better → good.
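The three steps above can be sketched as a small Python pipeline. Everything here is a simplified assumption: `TOY_TAGS` stands in for a POS tagger (real taggers use the surrounding sentence, not a fixed table), `TOY_LEXICON` stands in for a lexical database such as WordNet, and the function names are hypothetical.

```python
from typing import Optional

# Step 1 stand-in: a fixed word -> tag table instead of a context-aware tagger.
TOY_TAGS = {"went": "verb", "feet": "noun", "better": "adj", "meeting": "noun"}

# Step 2 stand-in: a tiny (word, tag) -> lemma table instead of WordNet.
TOY_LEXICON = {
    ("went", "verb"): "go",
    ("feet", "noun"): "foot",
    ("better", "adj"): "good",
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
}


def tag_pos(word: str, default: str = "noun") -> str:
    """Step 1: determine the grammatical role (here, by table lookup)."""
    return TOY_TAGS.get(word, default)


def look_up(word: str, pos: str) -> Optional[str]:
    """Step 2: consult the lexicon; None means the form is not listed."""
    return TOY_LEXICON.get((word, pos))


def lemmatize_word(word: str) -> str:
    """Step 3: return the lemma, or the word itself if nothing was found."""
    word = word.lower()
    lemma = look_up(word, tag_pos(word))
    return lemma if lemma is not None else word


for w in ["went", "feet", "better", "meeting"]:
    print(w, "->", lemmatize_word(w))
```

Returning the input unchanged when the lookup fails is a design choice in this sketch; it keeps the output readable but means words missing from the lexicon are not normalized at all.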

Trade-offs

Upside

Output is a real, readable word (words missing from the dictionary are typically returned unchanged)
Handles irregular forms correctly
More accurate normalization

Downside

Slower — needs lookups and POS tagging
Depends on a language resource (such as WordNet)
POS errors → wrong lemma

Which to use?

See Stemming vs Lemmatization for the head-to-head and a decision rule.