Description: When applied to spam, this probability theory kicks out real junk mail and is less likely to create false positives.
The first generation of spam filters was pretty crude. It not only missed a lot of spam, but it also misidentified real messages as spam. The new generation of spam filters is much more flexible, much more likely to catch true spam and much less likely to produce a false positive, which means flagging a real message as spam.
Most spam filters actually use what's called Bayes filtering, which relies on Bayes' Theorem. Thomas Bayes was an 18th century mathematician. In his theorem, well, it goes something like this. The probability... just kidding, let's not do the math. Think about it this way. Bayes' Theorem basically says that if you want to know the likelihood of an event, you can update your estimate of its probability using the evidence you observe, for example by looking at a randomly selected subset of the larger group.
Think about an election. Let's say you've got a group of voters and they're going to vote on a referendum. They have to vote either yes or no, and you want to know the likelihood that the result is going to be a yes vote. If you take a randomly selected subgroup and you learn the distribution of yes to no in that subgroup, let's say in this case 7 yes and 3 no, you can use Bayes' Theorem to predict the likelihood of a yes vote from the larger group. I think the math works out, for that 7 to 3 distribution, to about an 89% probability that the vote is going to be yes. So that's how Bayes' Theorem works in an election.
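One way to arrive at a figure like 89% is sketched here in Python. It rests on two assumptions the talk does not state: that you start with no opinion about the true share of yes voters (a uniform prior), and that the electorate is much larger than the sample. The function name `prob_yes_majority` is made up for illustration.

```python
from math import comb


def prob_yes_majority(yes: int, no: int) -> float:
    """Posterior probability that the true yes share is above 50%.

    Assumption (not from the talk): a uniform Beta(1, 1) prior, so the
    posterior after the sample is Beta(yes + 1, no + 1).
    """
    a, b = yes + 1, no + 1
    n = a + b - 1
    # For integer parameters, the Beta(a, b) CDF at x equals
    # P(Binomial(n, x) >= a). At x = 0.5 every outcome has weight 1 / 2**n.
    p_share_at_most_half = sum(comb(n, k) for k in range(a, n + 1)) / 2 ** n
    return 1.0 - p_share_at_most_half


print(round(prob_yes_majority(7, 3), 3))  # 0.887
```

Under those assumptions, a sample of 7 yes and 3 no gives roughly an 89% chance that yes holds the majority in the larger group. A different prior would give a different number.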
Now with spam filtering, it's basically the same kind of problem. You want to know yes or no for spam within this whole universe of your e-mail inbox. So Bayes filters take a subset of that and they test for different conditions. They look at an individual word and ask, from looking at the subset, what is the probability that a message containing that word is spam? They look at combinations of words and assign a probability to the combination. They look at colors in an HTML e-mail: what's the likelihood that a particular color indicates spam? They look at what kinds of URLs and links are in there, and where individual words are placed. There are all kinds of conditions, and the filter can assign a probability using an individual condition or any combination of them.
That allows you to build a filter that is very flexible. You can set it to be as restrictive or nonrestrictive as you want. This allows you not only to catch a lot more spam, but also to get far fewer false positives. In the old days, you'd say any e-mail that has Viagra in it is automatically spam. Now you can say any e-mail that has the word Viagra by itself is 90% likely to be spam, but if it says Viagra plus pills, 98% likely, and Viagra plus 'buy now', 100% likely.
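The idea of scoring individual words and then combining them can be sketched in Python as follows. This is a minimal naive Bayes illustration, not how any particular product works: it assumes spam and real mail are equally likely beforehand, treats words as independent, and uses add-one smoothing. The function names, the tiny training set, and the 0.85 score for "pills" are all hypothetical.

```python
from collections import Counter


def train(messages):
    """Count how many spam and real messages contain each word.

    `messages` is a list of (text, is_spam) pairs: the hand-labeled subset.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, spam in messages:
        words = set(text.lower().split())  # count each word once per message
        if spam:
            spam_counts.update(words)
            n_spam += 1
        else:
            ham_counts.update(words)
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham


def word_spam_prob(word, model):
    """Estimate P(spam | word), assuming equal priors and add-one smoothing."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return p_word_given_spam / (p_word_given_spam + p_word_given_ham)


def combine(probs):
    """Combine per-condition spam probabilities, treating them as independent."""
    spam_score = ham_score = 1.0
    for p in probs:
        spam_score *= p
        ham_score *= 1.0 - p
    return spam_score / (spam_score + ham_score)


def is_spam(probs, threshold=0.95):
    """Raise the threshold for a less restrictive filter, lower it for a stricter one."""
    return combine(probs) >= threshold


# Hypothetical labeled subset of an inbox.
model = train([
    ("buy viagra now", True),
    ("viagra pills", True),
    ("lunch now", False),
    ("meeting notes", False),
])
print(word_spam_prob("viagra", model))  # 0.75

# Hypothetical per-word scores: "viagra" at 0.90, "pills" at 0.85.
print(round(combine([0.90, 0.85]), 3))  # 0.981
print(is_spam([0.90]), is_spam([0.90, 0.85]))  # False True
```

With these made-up scores, one suspicious word alone stays under the 0.95 threshold, while two together push the message over it, which mirrors the Viagra and Viagra plus pills example.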
So by using Bayes filtering you're able to reduce the number of false positives and have a much more efficient spam filter.