Naïve Bayes: updating probabilities with additional information
Google’s Gmail, like other mail systems,
automatically classifies junk mail
and puts it in the bin. What is the principle behind it?
Today, we are talking about Naïve Bayes, a machine learning algorithm
frequently used to classify junk mail and documents.
David Heckerman majored in Bioinformatics at Stanford,
and has been working at Microsoft since 1992, studying A.I.
based on probability theory.
One day in 1997, he received a piece of junk mail.
At the time, this problem was generally handled as text classification.
However, Heckerman looked at it from a different angle.
Then in 1998, he gave an assignment to Mehran Sahami,
a Stanford graduate student who was a summer intern at the time.
Sahami came up with some excellent results.
Bill Gates was very glad to hear the news.
Some of Heckerman’s own insights had led him to give Sahami the task.
Heckerman’s team successfully came up with a more precise algorithm,
and applied it on the servers of Outlook, Hotmail, and Exchange.
It is used in most junk filters today, including Microsoft’s.
The Naïve Bayes method Heckerman used was described in a book
on pattern recognition in the 1970s.
In the 1990s, the algorithm rapidly gained popularity.
Its basis, Bayes’ theorem, however, has a definite source.
It comes from Thomas Bayes.
He was a British Presbyterian minister and mathematician in the 1700s.
As a pastor, he proposed a method of updating a probability
by accumulating new empirical evidence, in order to prove the existence of God.
After Thomas Bayes’s death, his friend Richard Price found this idea
of probability while sorting through his belongings.
He then sent it to the Royal Society in England,
and introduced Thomas Bayes to the world.
Because Bayes’ theorem is a statement about conditional probability,
it is expressed in the following way.
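For reference, this is the theorem in its standard textbook form, writing A for the hypothesis (for example, "this mail is junk") and B for the observed evidence. The symbols A and B are chosen here for illustration.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

P(A) is the probability before the evidence is seen, and P(A | B) is the probability updated by that evidence.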
The Naïve Bayes formula David Heckerman used is as follows.
It is derived from the Bayes’ theorem shown so far.
In other words, we assume that each condition is independent of the others to make the calculation easier.
This assumption seems unrealistic, since such independence is extremely rare in reality.
Even so, Naïve Bayes remains useful.
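One common way to write the Naïve Bayes formula, assuming a mail is made up of the words w_1, ..., w_n and the class of interest is "spam" (this notation is an illustrative assumption, not necessarily the exact form Heckerman’s team used):

```latex
P(\text{spam} \mid w_1, \dots, w_n)
  = \frac{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})}{P(w_1, \dots, w_n)}
```

The product in the numerator is where the independence assumption enters: the joint probability P(w_1, ..., w_n | spam) is replaced by the product of the per-word probabilities.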
It may seem difficult because it is a formula,
but it can be put into words.
The more of the words above a mail contains, the more likely it is to be spam.
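The same idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the filter Heckerman’s team built: the training mails, the function names word_counts and spam_probability, and the add-one smoothing are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented for this sketch.
SPAM = ["win free money now", "free prize click now", "cheap money offer"]
HAM = ["meeting schedule for monday", "project report attached", "lunch on monday"]


def word_counts(messages):
    """Count how often each word appears across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts


def spam_probability(message, spam=SPAM, ham=HAM):
    """Return P(spam | words), treating words as independent given the class."""
    spam_counts, ham_counts = word_counts(spam), word_counts(ham)
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())

    # Priors: the share of spam and non-spam mails in the training data.
    log_spam = math.log(len(spam) / (len(spam) + len(ham)))
    log_ham = math.log(len(ham) / (len(spam) + len(ham)))

    for word in message.split():
        if word not in vocab:
            continue  # skip words never seen in training
        # Add-one (Laplace) smoothing keeps every probability above zero.
        log_spam += math.log((spam_counts[word] + 1) / (spam_total + len(vocab)))
        log_ham += math.log((ham_counts[word] + 1) / (ham_total + len(vocab)))

    # Normalise the two log scores back into a single probability.
    return 1 / (1 + math.exp(log_ham - log_spam))
```

With this toy data, spam_probability("free money") is higher than spam_probability("free"), which is itself above 0.5: each additional spam-associated word pushes the probability up. Working in logarithms is a design choice that avoids multiplying many small numbers together.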