Bayes' theorem is a fundamental building block of probability theory; in simple terms, it lets us express and update our beliefs given new information.
The formula:
P(A∣B) = P(B∣A)⋅P(A) / P(B)
- P(A∣B) is the [[probability]] of event A given event B is true.
- P(B∣A) is the probability of event B given event A is true.
- P(A) and P(B) are the probabilities of events A and B respectively.
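To make the formula concrete, here is a minimal Python sketch with made-up events and numbers (purely illustrative, not taken from any real dataset):

```python
# Hypothetical example: A = "review is positive", B = "review contains the word 'love'".
p_a = 0.5          # P(A): prior probability that a review is positive
p_b_given_a = 0.2  # P(B|A): probability that a positive review contains "love"
p_b = 0.11         # P(B): probability that any review contains "love"

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(positive | 'love') = {p_a_given_b:.2f}")  # -> 0.91
```

Seeing the word "love" shifts our belief that the review is positive from 0.5 up to about 0.91.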
One interesting application of this theorem is sentiment analysis. That specific use case of Bayes' theorem is called the Naive Bayes method. Let's find out why.
For this problem, we need a dataset: for example, IMDB 50K, which I mentioned in my previous post, or one you build yourself from text samples labeled with sentiments such as "positive," "negative," or "neutral".
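Whatever the source, the data boils down to (text, label) pairs. A toy example of the expected shape (the samples below are made up):

```python
# Hypothetical labeled samples, just to show the expected structure;
# in practice you would load a real dataset such as IMDB 50K.
dataset = [
    ("I love this weather", "positive"),
    ("This movie was terrible", "negative"),
    ("The package arrived on Tuesday", "neutral"),
    ("What a wonderful performance", "positive"),
    ("I hate waiting in line", "negative"),
]
```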
The algorithm:
- Calculate the prior probabilities based on the sentiment distribution (P(positive), P(negative), P(neutral)).
- Then, for each word, calculate its probability given each sentiment based on how often it occurs in text snippets with that label (P(love∣positive), P(terrible∣negative), etc.).
- Based on that information, we can now define a posterior classifier that updates the sentiment probability (a full code sketch follows this list):
P(positive∣text) = P(love∣positive)⋅P(terrible∣positive)⋅…⋅P(positive) / P(text)
Then the same for neutral and negative.
- To calculate P(text), we use the law of total probability:
P(text)=P(text∣positive)⋅P(positive)+P(text∣negative)⋅P(negative)+P(text∣neutral)⋅P(neutral)
To calculate P(text∣positive), P(text∣negative), and P(text∣neutral), we use a simplification called bag of words, where we basically assume that all words in the sentence are independent and their only feature is frequency:
P(text∣positive)=P(love∣positive)⋅P(this∣positive)⋅P(weather∣positive)
And the same for the rest of the sentiment labels.
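Putting the steps together, here is a minimal sketch of the whole classifier. The toy dataset, the whitespace tokenizer, and the add-one (Laplace) smoothing are my own simplifying assumptions; smoothing is not part of the description above, but without it a single unseen word would zero out the entire product.

```python
from collections import Counter, defaultdict

# Hypothetical toy data; swap in a real dataset such as IMDB 50K.
dataset = [
    ("I love this weather", "positive"),
    ("This movie was terrible", "negative"),
    ("The package arrived on Tuesday", "neutral"),
]

def tokenize(text):
    # Naive whitespace tokenizer (an assumption, fine for a sketch).
    return text.lower().split()

# Step 1: priors P(positive), P(negative), P(neutral) from the label distribution.
label_counts = Counter(label for _, label in dataset)
total_docs = sum(label_counts.values())
priors = {label: count / total_docs for label, count in label_counts.items()}

# Step 2: word frequencies per label, for P(word | label).
word_counts = defaultdict(Counter)   # label -> Counter of word occurrences
total_words = Counter()              # label -> total number of words
for text, label in dataset:
    for word in tokenize(text):
        word_counts[label][word] += 1
        total_words[label] += 1
vocab = {word for counter in word_counts.values() for word in counter}

def likelihood(word, label):
    # P(word | label) with add-one (Laplace) smoothing -- my own addition,
    # so unseen words do not collapse the product to zero.
    return (word_counts[label][word] + 1) / (total_words[label] + len(vocab))

# Step 3: posterior P(label | text) via bag of words + law of total probability.
def classify(text):
    words = tokenize(text)
    joint = {}
    for label, prior in priors.items():
        p = prior
        for word in words:
            p *= likelihood(word, label)   # bag-of-words independence assumption
        joint[label] = p
    evidence = sum(joint.values())         # P(text), law of total probability
    return {label: p / evidence for label, p in joint.items()}

print(classify("I love this movie"))
```

On this toy data the query leans positive, since "love" appears only in the positive sample; the priors and likelihoods do the rest.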
The bag of words simplification is exactly why this method is called "naive." It might seem like a shallow assumption, but it turns out to be extremely effective in practice.