Bayesian filtering question.

Post by azfar » Sat Nov 29, 2008 3:01 pm

I am trying to find out exactly how Bayesian filtering works, but I am not clear on it.

My question is how Bayesian filtering actually works, i.e. does it collect words from the whole message (headers, subject, and body), from just the subject and body, or from the body alone?

In my setup, users forward me the ham messages that were incorrectly marked as spam, and the spam messages that were incorrectly passed as ham.

My problem with this approach is that the original messages are modified by forwarding: extra headers and signatures are added to them, and I am worried that this will train the filter incorrectly.
Azfar Hashmi
Email : azfarhashmi@hotmail.com

Post by lambda » Sat Nov 29, 2008 4:48 pm

Quote:
I am trying to find out exactly how Bayesian filtering works, but I am not clear on it.

Read:

http://www.paulgraham.com/spam.html
http://www.paulgraham.com/better.html

Quote:
In my setup, users forward me the ham messages that were incorrectly marked as spam, and the spam messages that were incorrectly passed as ham.

A better way to solve this problem, if your users use IMAP, is to create two folders for each user called "for-spam" and "for-ham"; have them copy the incorrectly marked messages into the appropriate folder. Then, every night, run a script that trains your filter on the messages in those folders.
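
A rough sketch of what such a nightly script could look like in Python, using the standard imaplib module. The folder names come from the suggestion above; the host, the credentials, and the learn_spam / learn_ham callables are placeholders for whatever your own filter provides, so treat this as an outline under those assumptions and not as a finished tool.

```python
import imaplib
from typing import Callable


def train_from_folder(conn, folder: str, train: Callable[[bytes], None]) -> int:
    """Feed every message in `folder` to `train`; return how many were fed."""
    # Open read-only so fetching does not change any message flags.
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        return 0  # folder missing or not selectable
    status, data = conn.search(None, "ALL")
    if status != "OK" or not data or not data[0]:
        return 0  # search failed or folder is empty
    count = 0
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        for part in parts:
            # The raw message arrives as an (envelope, raw_bytes) tuple.
            if isinstance(part, tuple):
                train(part[1])
                count += 1
    return count


def run_nightly(host, user, password, learn_spam, learn_ham):
    """Hypothetical driver: learn_spam/learn_ham are your filter's trainers."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        spam = train_from_folder(conn, "for-spam", learn_spam)
        ham = train_from_folder(conn, "for-ham", learn_ham)
    finally:
        conn.logout()
    return spam, ham
```

Because the messages are copied rather than forwarded, the filter sees them unmodified. The sketch does not delete or move messages after training, so a real script would need some way to avoid learning the same message again the next night.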
Quote:
My problem with this approach is that the original messages are modified by forwarding: extra headers and signatures are added to them, and I am worried that this will train the filter incorrectly.

It's not something you need to worry about if you have lots of spam/ham messages.
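
To illustrate why, here is a small sketch of the per-token probability as described in the first essay linked above ("A Plan for Spam"). It assumes your filter works roughly the same way, which it may not. The counts in the example are made up for illustration.

```python
def token_spam_probability(good_count, bad_count, ngood, nbad):
    """Per-token spam probability in the style of "A Plan for Spam".

    good_count / bad_count: occurrences of the token in ham / spam.
    ngood / nbad: number of ham / spam messages in the training corpora.
    Returns None for tokens seen too rarely to be trusted.
    """
    g = 2 * good_count  # ham counts are doubled to bias against false positives
    b = bad_count
    if g + b < 5:
        return None
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    # Clamp so that no single token is ever treated as absolute proof.
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


def most_interesting(probabilities, n=15):
    """Keep the n probabilities farthest from the neutral value 0.5."""
    return sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)[:n]


# Illustrative counts: a token added by forwarding shows up in both corpora,
# while a typical spam word shows up almost only in spam.
forward_token = token_spam_probability(900, 900, ngood=1000, nbad=1000)
spam_token = token_spam_probability(2, 600, ngood=1000, nbad=1000)
```

A token that forwarding adds to both the spam and the ham samples ends up close to 0.5, so it is rarely among the most interesting tokens that decide the verdict. Tokens that appear almost only in one corpus dominate instead.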

Post by azfar » Sat Nov 29, 2008 9:32 pm

Thanks for the information.

Here is what I found, and it's very simple.

Code:

What Bayesian Spam Filters Look At
When scanning and evaluating a message, Bayesian spam filters look at many parts of an email. Here is what they examine:

Words in the body of the message
Headers of the message (including the senders and message paths)
Aspects of the HTML code (such as colors)
Word pairs and phrases (the filter looks for ones commonly used by spammers)
Meta information (for instance, where a specific phrase appears)

When an email arrives, it is scanned by the Bayesian spam filter. All of these characteristics are examined, and the probability of the message being spam is calculated.
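
As a rough illustration of the above, the sketch below collects tokens from both the headers and the body of a message and then combines per-token probabilities into one score. The header tagging ("subject*free") and the combining formula follow the Paul Graham essays mentioned earlier in the thread. The choice of headers, the token pattern, and the lowercasing are simplifying assumptions made here, and a real filter will differ in the details.

```python
import email
import re
from math import prod

# Assumption: tokens are runs of letters, digits, $, ', ! and -.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")
# Assumption: only these headers are tokenized, each tagged with its name.
TAGGED_HEADERS = ("from", "to", "subject", "return-path")


def tokenize(raw_message: str) -> list[str]:
    """Collect tokens from selected headers and from all text parts."""
    msg = email.message_from_string(raw_message)
    tokens = []
    for name in TAGGED_HEADERS:
        for value in msg.get_all(name, []):
            # "subject*free" and a body "free" are counted as different tokens.
            tokens += [f"{name}*{t.lower()}" for t in TOKEN_RE.findall(value)]
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            text = payload.decode(charset, errors="replace")
            tokens += [t.lower() for t in TOKEN_RE.findall(text)]
    return tokens


def combined_spam_probability(probabilities) -> float:
    """Combine per-token probabilities (each strictly between 0 and 1)."""
    probabilities = list(probabilities)
    spam = prod(probabilities)
    ham = prod(1 - p for p in probabilities)
    return spam / (spam + ham)
```

In this sketch the answer to the original question is "headers, subject, and body", with header words kept apart from body words so the filter can learn, for example, that a word means something different in the Subject line than in the text.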