Mini Project: Sentiment Analysis
Positive or negative?
Sentiment analysis reads a review and decides whether the opinion is positive or negative. It's the second classic text-classification project — and a perfect fit for Naive Bayes.
The intuition is delightfully simple: from labelled reviews, learn how strongly each word leans positive or negative. To score a new review, multiply together the leanings of its words. Lots of "great", "love", "excellent" → positive; lots of "terrible", "boring", "waste" → negative.
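A minimal sketch of that multiplication, with made-up numbers. The `LEANING` table and the helpers `score` and `judge` are hypothetical, chosen only for illustration: each value stands for how many times more likely a word is in a positive review than in a negative one, and words missing from the table are treated as neutral (1.0).

```python
# Hypothetical per-word leanings: P(word | positive) / P(word | negative).
# Values above 1 lean positive, values below 1 lean negative.
LEANING = {
    "great": 4.0,
    "love": 3.0,
    "excellent": 5.0,
    "terrible": 0.2,
    "boring": 0.25,
    "waste": 0.1,
}


def score(review: str) -> float:
    """Multiply the leanings of the review's words; unknown words count as 1.0."""
    product = 1.0
    for word in review.lower().split():
        product *= LEANING.get(word, 1.0)
    return product


def judge(review: str) -> str:
    """Call the review positive when the combined leaning exceeds 1."""
    return "positive" if score(review) > 1.0 else "negative"


print(judge("great cast and excellent script"))   # positive
print(judge("boring plot and a waste of time"))   # negative
```

A real model would learn these leanings from labelled reviews rather than have them typed in by hand.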
Watch it learn and judge
From labelled reviews, the model learns per-word sentiment odds, then scores a fresh review by multiplying them together.
The pipeline
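Start from labelled reviews, e.g. the IMDB movie-review dataset, in which each review is tagged with its sentiment.
Count how often each word appears in positive and in negative reviews, then estimate P(word | positive) and P(word | negative) from the counts.
For a new review, multiply the class prior by the word likelihoods for each class; the bigger product wins.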
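The pipeline above can be sketched from scratch in a few lines. This is an illustration under stated assumptions, not a reference implementation: the four toy reviews stand in for a real dataset such as IMDB, tokenization is a plain whitespace split, add-one (Laplace) smoothing is assumed so that an unseen word never zeroes out a class, and log-probabilities are summed instead of multiplying raw probabilities, which picks the same winner while avoiding underflow on long reviews. The names `train`, `log_score`, and `classify` are hypothetical.

```python
import math
from collections import Counter

LABELS = ("positive", "negative")


def train(reviews):
    """Count words per class from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in reviews:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    return word_counts, doc_counts, vocab


def log_score(text, label, word_counts, doc_counts, vocab):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / total_docs)  # class prior
    for word in text.lower().split():
        if word not in vocab:
            continue  # assumption: skip words never seen in training
        count = word_counts[label][word]
        score += math.log((count + 1) / (total_words + len(vocab)))
    return score


def classify(text, model):
    """Return the label with the larger log-score."""
    return max(LABELS, key=lambda label: log_score(text, label, *model))


# Toy stand-in for a labelled review dataset.
reviews = [
    ("great film loved it", "positive"),
    ("excellent acting great story", "positive"),
    ("terrible boring film", "negative"),
    ("waste of time terrible plot", "negative"),
]
model = train(reviews)

print(classify("great story", model))           # positive
print(classify("boring waste of time", model))  # negative
```

On a real dataset the same three steps apply unchanged; only the review list and the tokenizer would need to grow up.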
The gotchas of sentiment
Naive Bayes works well for:
- Clear-cut opinionated text
- Large labelled review datasets
- A fast, strong baseline
It struggles with:
- Negation: "not good" flips the meaning, so keep bigrams and don't strip "not" as a stop word
- Sarcasm: "oh great, it broke again"
- Mixed sentiment in one review
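For the negation case, one way to keep bigrams is to emit each adjacent word pair as an extra feature alongside the single words, so that "not good" gets its own count instead of being read as "not" plus "good". The sketch below assumes plain whitespace tokenization; the function name `unigrams_and_bigrams` and the underscore joiner are hypothetical choices.

```python
def unigrams_and_bigrams(text: str) -> list[str]:
    """Return the words of `text` followed by its adjacent word pairs."""
    words = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


print(unigrams_and_bigrams("not good"))
# ['not', 'good', 'not_good']
```

Feeding these features into the word counts in place of a bare `split()` lets the model learn that `not_good` leans negative even though `good` on its own leans positive. It does nothing for sarcasm or mixed sentiment, which need more context than a pair of words.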
For tougher cases, move past bag-of-words to embeddings or a fine-tuned transformer like BERT. But Naive Bayes remains a remarkably strong, cheap first cut.