I am trying to find out exactly how Bayesian filtering works, but I am not clear on it.
My question is how Bayesian filtering actually works, i.e. does it collect words from the whole message (headers, subject and body), from just the subject and body, or from the body only?
In my scenario, users forward me ham messages that were incorrectly marked as spam, and spam messages that were incorrectly let through as ham.
My problem with this approach is that the original messages are now modified, with extra headers and signatures added to them, and I am worried that this will train the filter incorrectly.
Bayesian filtering question.
Azfar Hashmi
Email : azfarhashmi@hotmail.com
Quote: I am trying to find out exactly how Bayesian filtering works, but I am not clear on it.
read:
http://www.paulgraham.com/spam.html
http://www.paulgraham.com/better.html
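For anyone who wants the gist of the first article in code, here is a minimal Python sketch of the scoring rule described in "A Plan for Spam". The function names and the `good`/`bad` count dictionaries are made up for illustration, and the constants (doubled ham counts, a threshold of 5 occurrences, clamping to 0.01/0.99, 0.4 for unknown tokens, the 15 most interesting tokens) are as I recall them from the essay, so check them against the article before relying on them.

```python
from math import prod

def token_probability(token, good, bad, ngood, nbad):
    """Spam probability of one token, from per-corpus occurrence counts.

    good/bad map token -> count in the ham/spam corpus;
    ngood/nbad are the numbers of ham/spam messages trained on.
    """
    g = 2 * good.get(token, 0)    # ham counts are doubled to reduce false positives
    b = bad.get(token, 0)
    if g + b < 5:
        return 0.4                # rare or unseen token: mildly innocent default
    ham_freq = min(1.0, g / ngood)
    spam_freq = min(1.0, b / nbad)
    p = spam_freq / (ham_freq + spam_freq)
    return max(0.01, min(0.99, p))  # never let one token be absolutely certain

def message_probability(tokens, good, bad, ngood, nbad, top_n=15):
    """Combine the top_n tokens whose probability is farthest from 0.5."""
    probs = [token_probability(t, good, bad, ngood, nbad) for t in set(tokens)]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:top_n]
    spam = prod(top)
    ham = prod(1 - p for p in top)
    return spam / (spam + ham)
```

Because only the most extreme tokens are combined, a handful of neutral words (such as those added by a forwarding client) has little influence on the final score once the corpora are large.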
Quote: In my scenario, users forward me ham messages that were incorrectly marked as spam, and spam messages that were incorrectly let through as ham.
a better way to solve this problem, if your users use IMAP, is to create two folders for each user called "for-spam" and "for-ham", and have them copy the incorrectly marked messages into the appropriate folder. then, every night, run a script that feeds the messages in those folders to your filter for training.
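A rough sketch of what such a nightly script could look like, assuming the server stores mail in Maildir format and the folders are named exactly "for-spam" and "for-ham". The `train` argument is a hypothetical callable standing in for whatever learning command your filter provides; nothing here is specific to a particular filter. Emptying the folders after training is also my own assumption, so that the same message is not learned twice the next night.

```python
import mailbox

def train_from_folders(maildir_path, train):
    """Feed a user's corrected messages to the filter, then empty the folders.

    train(raw_bytes, is_spam) is a placeholder for the filter's learn step.
    Returns how many messages were processed per folder.
    """
    inbox = mailbox.Maildir(maildir_path, create=False)
    existing = inbox.list_folders()
    counts = {}
    for folder_name, is_spam in (("for-spam", True), ("for-ham", False)):
        counts[folder_name] = 0
        if folder_name not in existing:
            continue                      # user has not created this folder
        folder = inbox.get_folder(folder_name)
        for key in list(folder.keys()):
            train(folder.get_bytes(key), is_spam)
            folder.remove(key)            # avoid re-training on the same message
            counts[folder_name] += 1
    return counts
```

Since the users copy the messages instead of forwarding them, the filter sees the original headers and body untouched.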
Quote: My problem with this approach is that the original messages are now modified, with extra headers and signatures added to them, and I am worried that this will train the filter incorrectly.
it's not something you need to worry about if you have lots of spam/ham messages.
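If you still want to train on the untouched original, one option is to ask users to forward "as attachment", which many mail clients wrap as a `message/rfc822` part. The sketch below, using Python's standard `email` package, pulls those embedded messages back out. It only helps under that assumption; an inline forward has already lost the original headers. The function name is made up for this example.

```python
import email
from email import policy

def original_messages(raw_bytes):
    """Return the messages embedded as message/rfc822 attachments in a forward."""
    outer = email.message_from_bytes(raw_bytes, policy=policy.default)
    return [
        part.get_payload(0)               # the wrapped message is the part's only payload
        for part in outer.walk()
        if part.get_content_type() == "message/rfc822"
    ]
```

Each returned object is a full message with its own headers, so it can be passed to the filter in place of the forwarding wrapper.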
Watch out for the Manners Taliban!
Isn't it amazing how so many people can type "linuxpakistan.net" into their browsers but not "google.com"?
thanks for the information.
Here is what I found, and it's very simple.
What Do Bayesian Spam Filters Look At?
When doing their scans and evaluations, Bayesian spam filters look at many parts of an email. Here is what they examine:
Words in the body of the message
Headers of the message (including the senders and message paths)
Aspects of the HTML code (such as the colors, for example)
Word pairs and phrases (ones that are commonly used by spammers are searched for)
Meta information (where a specific phrase appears, for instance)
When an email arrives, it is scanned by the Bayesian spam filter. All of these characteristics are looked at, and the probability of the message being spam is calculated.
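To make the first two items of that list concrete, here is a small Python sketch of a tokenizer that collects words from a few headers as well as the body, using the standard `email` package. The header list, the token pattern, and the `header*word` prefix are choices made for this example (the prefix follows the idea in the "better.html" article of marking where a token occurred), not what any particular filter does.

```python
import email
import re
from email import policy

TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")
HEADERS = ("From", "To", "Subject", "Return-Path")   # illustrative selection

def tokenize(raw_bytes):
    """Return lowercase tokens from selected headers and the text body."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    tokens = []
    for name in HEADERS:
        for value in msg.get_all(name, []):
            # Prefix header tokens so "subject*free" differs from body "free".
            tokens += [f"{name.lower()}*{t.lower()}"
                       for t in TOKEN_RE.findall(str(value))]
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        tokens += [t.lower() for t in TOKEN_RE.findall(body.get_content())]
    return tokens
```

The resulting token list is what gets counted during training and scored when a new message arrives.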
Azfar Hashmi
Email : azfarhashmi@hotmail.com