Mini Project: Sentiment Analysis · Suman Bhadra Notes

Positive or negative?

Sentiment analysis reads a review and decides whether the opinion is positive or negative. It's the second classic text-classification project — and a perfect fit for Naive Bayes.

The intuition is delightfully simple: from labelled reviews, learn how strongly each word leans positive or negative. To score a new review, add up the leanings of its words. Lots of "great", "love", "excellent" → positive; lots of "terrible", "boring", "waste" → negative.

Watch it learn and judge

From labelled reviews, the model learns per-word sentiment log-odds, then scores a fresh review by adding them together.
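As a rough illustration of "learn the leanings, then add them up", here is a minimal sketch in plain Python. The four toy reviews, the function names `log_odds` and `score`, and the choice of add-one smoothing are all assumptions made for this example, not part of any particular library or dataset.

```python
import math
from collections import Counter

# Toy labelled reviews, invented for illustration only.
reviews = [
    ("great film love it", "pos"),
    ("excellent acting great story", "pos"),
    ("terrible plot boring waste", "neg"),
    ("boring and terrible", "neg"),
]

# Count how often each word appears in each class.
counts = {"pos": Counter(), "neg": Counter()}
for text, label in reviews:
    counts[label].update(text.split())

vocab = set(counts["pos"]) | set(counts["neg"])
totals = {label: sum(c.values()) for label, c in counts.items()}

def log_odds(word):
    """log P(word | pos) - log P(word | neg), with add-one smoothing (an assumption)."""
    p_pos = (counts["pos"][word] + 1) / (totals["pos"] + len(vocab))
    p_neg = (counts["neg"][word] + 1) / (totals["neg"] + len(vocab))
    return math.log(p_pos / p_neg)

def score(review):
    """Sum the leanings of known words; words never seen in training are skipped."""
    return sum(log_odds(w) for w in review.split() if w in vocab)

print(score("great story"))   # above zero: leans positive
print(score("boring waste"))  # below zero: leans negative
```

A score above zero leans positive and a score below zero leans negative. The sketch leaves out the class prior, which makes no difference here only because the toy data has equally many positive and negative reviews.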

The pipeline

1. Labelled reviews (pos / neg)

e.g. the IMDB movie-review dataset, each review tagged with its sentiment.

2. Preprocess + vectorize (counts)

Clean, tokenize, and turn into word counts (Bag of Words).

3. Train Naive Bayes (word likelihoods)

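Estimate P(word | positive) and P(word | negative) from the counts, usually with add-one (Laplace) smoothing so that a word never seen in one class does not get probability zero.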

4. Predict (multiply & compare)

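Multiply the class prior by the word likelihoods for each class; the bigger product wins. In practice, add logarithms instead of multiplying, because a product of many small probabilities underflows to zero. That is why the score can be described as a sum of per-word log-odds.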
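The four steps can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions, not a production implementation: the toy reviews, the helper names `tokenize`, `train` and `predict`, the regex tokenizer, and the add-one smoothing are all choices made for the example. It compares sums of logs rather than raw products, which picks the same winner while avoiding numeric underflow.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Step 2: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled_reviews):
    """Step 3: estimate log priors and add-one-smoothed log likelihoods."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labelled_reviews:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    n_docs = sum(doc_counts.values())
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        model["log_prior"][label] = math.log(doc_counts[label] / n_docs)
        model["log_lik"][label] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return model

def predict(model, text):
    """Step 4: add log prior and log likelihoods per class; the largest total wins."""
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    scores = {
        label: model["log_prior"][label]
        + sum(model["log_lik"][label][t] for t in tokens)
        for label in model["log_prior"]
    }
    return max(scores, key=scores.get)

# Step 1: toy labelled reviews, invented for illustration only.
train_data = [
    ("I love this great movie", "pos"),
    ("Excellent story and great acting", "pos"),
    ("Terrible, boring waste of time", "neg"),
    ("A boring plot and terrible acting", "neg"),
]

model = train(train_data)
print(predict(model, "An excellent, great film!"))  # pos
print(predict(model, "What a boring waste"))        # neg
```

On a real dataset, the same pipeline is more commonly assembled from a library, for example scikit-learn's `CountVectorizer` followed by `MultinomialNB`.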

The gotchas of sentiment

Works well on

Clear-cut opinionated text
Large labelled review datasets
As a fast, strong baseline

Trips on

Negation: "not good" looks positive when words are counted one at a time. Keep bigrams, and don't strip "not" as a stop word.
Sarcasm: "oh great, it broke again"
Mixed sentiment in one review
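For the negation case in the list above, one common mitigation is to add bigram features next to the single words, so that "not good" becomes a feature of its own. The sketch below is a minimal illustration; the function name `unigrams_and_bigrams` and the underscore joining convention are assumptions made for this example.

```python
def unigrams_and_bigrams(text):
    """Return single-word tokens followed by adjacent word pairs."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

print(unigrams_and_bigrams("not good"))
# ['not', 'good', 'not_good']
```

With features like these, the model can learn that `not_good` leans negative even though `good` on its own leans positive. Sarcasm and mixed sentiment are not helped by this trick.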

Level up

For tougher cases, move past bag-of-words to embeddings or a fine-tuned transformer like BERT. But Naive Bayes remains a remarkably strong, cheap first cut.