Why the Naive Bayes Method Is Naive

2023-09-10

Bayes' theorem is a fundamental building block of probability theory. In simple terms, it lets us express and update our beliefs given new information.

The formula:

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
  • P(A|B) is the probability of event A given that event B is true.
  • P(B|A) is the probability of event B given that event A is true.
  • P(A) and P(B) are the probabilities of events A and B respectively.
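To make the formula concrete, here is a tiny numeric check with made-up numbers (the spam scenario and all probabilities below are hypothetical, just for illustration):

```python
# A = "email is spam", B = "email contains the word 'free'".
p_a = 0.2            # P(A): prior probability that an email is spam
p_b_given_a = 0.6    # P(B|A): 'free' appears in 60% of spam emails
p_b = 0.25           # P(B): 'free' appears in 25% of all emails

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.48
```

Seeing the word "free" raised our belief that the email is spam from 20% to 48%.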

One interesting application of this theorem is sentiment analysis. That specific use case of Bayes' theorem is called the Naive Bayes method. Let's find out why.

For this problem, I need a dataset: for example, the IMDB 50K dataset I mentioned in my previous post, or you can create your own from text samples with sentiment labels such as "positive," "negative," or "neutral".

The algorithm:

  1. Calculate the prior probabilities based on the distribution of sentiments (P(positive), P(negative), P(neutral)).
  2. Then, for each word, calculate the sentiment-conditional probability based on its occurrence in the text snippets (P(love|positive), P(terrible|negative), etc.).
  3. Based on that information, we can now define a posterior classifier that updates the sentiment probability:
    P(positive|text) = \frac{P(love|positive) \cdot P(terrible|positive) \cdot \ldots \cdot P(positive)}{P(text)}
    Then the same for neutral and negative.
  4. To calculate P(text), use the law of total probability:
P(text) = P(text|positive) \cdot P(positive) + P(text|negative) \cdot P(negative) + P(text|neutral) \cdot P(neutral)

To calculate P(text|positive), P(text|negative), and P(text|neutral), we use a simplification called the bag of words, where we assume that all words in the sentence are independent and their only feature is frequency:

P(text|positive) = P(love|positive) \cdot P(this|positive) \cdot P(weather|positive)

And the same for the rest of the sentiment labels.
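Putting all four steps together, the whole classifier fits in a short self-contained sketch. The toy dataset is made up, and I add Laplace (add-one) smoothing, a standard trick not covered above, so that a single unseen word doesn't zero out the whole product:

```python
from collections import Counter

# Hypothetical toy dataset, for illustration only.
samples = [
    ("i love this movie", "positive"),
    ("terrible movie", "negative"),
    ("it was a movie", "neutral"),
]

sentiments = {label for _, label in samples}
# Step 1: priors from the label distribution.
priors = {s: sum(l == s for _, l in samples) / len(samples) for s in sentiments}
# Step 2: word counts per sentiment.
words = {s: Counter() for s in sentiments}
for text, label in samples:
    words[label].update(text.split())
vocab = {w for counter in words.values() for w in counter}

def p_text_given(text, s, alpha=1):
    # Bag-of-words assumption: multiply per-word likelihoods,
    # with add-one (Laplace) smoothing for unseen words.
    total = sum(words[s].values()) + alpha * len(vocab)
    p = 1.0
    for w in text.split():
        p *= (words[s][w] + alpha) / total
    return p

def classify(text):
    # Step 4: P(text) via the law of total probability.
    p_text = sum(p_text_given(text, s) * priors[s] for s in sentiments)
    # Step 3: posterior P(sentiment|text) for each label.
    return {s: p_text_given(text, s) * priors[s] / p_text for s in sentiments}

posterior = classify("love this movie")
print(max(posterior, key=posterior.get))  # positive
```

Real implementations usually sum log-probabilities instead of multiplying, since products of many small numbers underflow quickly.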

The bag of words simplification is exactly why this method is called "naive." It might seem like a shallow assumption, but it turns out to be surprisingly effective in practice.

Subscribe for daily updates on software development, productivity, and more.