Message Board Spam Filtering?

Forum: Programmers
Total Replies: 2
Author Content
coderfrommaine

May 19, 2004
4:50 PM
Hi!

As I might have told you, I am working on building a message board. One thing that has concerned me is moderation. I know that this board is moderated so that every message posted is looked over by a human being before being “published”. From what I’ve seen it gives awesome results, but I know it must take a lot of time, considering the volume of messages you have on all of your forums.

Jumping a little, I was looking over the father’s forum this afternoon, and I noticed a post about spam. Mr. Maxwell suggested a program called “SpamBayes” ([HYPERLINK@spambayes.sourceforge.net]), so I checked it out. I was amazed! I had never seen such a powerful and effective spam filter. It works by you training it (similar to voice recognition software), telling it what is good mail (ham) and what is bad mail (spam). If it’s not certain about a message, it puts it in a suspected folder. You can find out more about how it works here: [HYPERLINK@spambayes.sourceforge.net]
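To make the train-then-sort idea concrete, here is a minimal sketch of a three-way (ham / unsure / spam) text classifier. It is plain naive Bayes with add-one smoothing, not SpamBayes's actual scoring method, and the class name `TinyBayes`, the tokenizer, and the cutoff values are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class TinyBayes:
    """Hypothetical toy classifier: naive Bayes over token presence."""

    def __init__(self):
        # Number of trained messages per label that contain each token.
        self.counts = {"ham": Counter(), "spam": Counter()}
        # Number of trained messages per label.
        self.docs = {"ham": 0, "spam": 0}

    def train(self, text, label):
        """Record one message as 'ham' or 'spam'."""
        self.docs[label] += 1
        self.counts[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        """Return an estimate in [0, 1] that the text is spam."""
        if self.docs["ham"] + self.docs["spam"] == 0:
            return 0.5  # nothing learned yet
        # Work in log-odds to avoid underflow on long messages.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)

    def classify(self, text, ham_cutoff=0.2, spam_cutoff=0.9):
        """Sort a message into 'ham', 'unsure', or 'spam' (cutoffs are illustrative)."""
        p = self.spam_probability(text)
        if p >= spam_cutoff:
            return "spam"
        if p <= ham_cutoff:
            return "ham"
        return "unsure"


clf = TinyBayes()
clf.train("thanks for the helpful reply about python", "ham")
clf.train("great question about the build script", "ham")
clf.train("buy cheap pills now", "spam")
clf.train("cheap pills buy now online", "spam")
print(clf.classify("buy cheap pills"))
print(clf.classify("thanks for the reply"))
print(clf.classify("hello"))
```

A message made only of words the filter has never seen scores 0.5 and lands in "unsure", which is the behavior described above for mail it is not certain about.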

Now I’ve been thinking: would it be possible to adapt this open-source project to a message board application? The management panel could be divided into folders, with “ham” messages published without review, or with less review. Suspect messages could be reviewed carefully before publishing, and bad (flame, controversial, etc.) posts could be put in a “spam” folder. Users could be looked at as well: those who post a high number of “bad” messages could be flagged as suspect sooner, while users with a “clean record”, so to speak, could be given a little bit of slack.
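Here is one way the folder routing and the per-user "slack" could fit together, as a rough sketch only. It assumes the filter hands back a spam score between 0 and 1; the names `UserRecord` and `route_post`, the folder labels, and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical moderation history for one user."""
    approved: int = 0
    rejected: int = 0

    def bad_ratio(self):
        """Fraction of this user's moderated posts that were rejected."""
        total = self.approved + self.rejected
        return self.rejected / total if total else 0.0


def route_post(spam_score, user, ham_cutoff=0.2, spam_cutoff=0.9, min_history=10):
    """Return 'publish', 'review', or 'spam' for a post.

    spam_score is assumed to come from the filter (0 = clean, 1 = bad).
    The cutoffs shift with the user's record once there is enough history.
    """
    history = user.approved + user.rejected
    if history >= min_history:
        ratio = user.bad_ratio()
        if ratio == 0.0:
            # Clean record: allow a slightly higher score to auto-publish.
            ham_cutoff += 0.1
        elif ratio > 0.25:
            # Many bad posts: never auto-publish, and flag as spam sooner.
            ham_cutoff = 0.0
            spam_cutoff -= 0.2

    if spam_score >= spam_cutoff:
        return "spam"
    if spam_score < ham_cutoff:
        return "publish"
    return "review"


newcomer = UserRecord()
regular = UserRecord(approved=50)
troublemaker = UserRecord(approved=5, rejected=5)

print(route_post(0.25, newcomer))      # borderline score, no history
print(route_post(0.25, regular))       # same score, clean record
print(route_post(0.05, troublemaker))  # low score, poor record
```

Keeping the user adjustment separate from the text score means a moderator can see why a post was held: the content looked suspect, the author's record is poor, or both.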

The method that “SpamBayes” uses seems like it would adapt quite well. Bad words would be learned quickly without having to mess with “blacklists”. It would also pick up the names of controversial people. Who knows how much it might learn?

What do you think?

Alex King
Tex

May 20, 2004
9:21 PM
Hello Alex,

I'm using a very similar package for spam filtering on my server ([HYPERLINK@spamprobe.sourceforge.net]). In theory, there shouldn't be any reason it wouldn't work. However, there are a couple of drawbacks that could make it hard.

1) Web postings don't have all the extra headers that e-mails do. Most spam filters look at these in addition to the body of the message, so they catch anything that might help them decide whether the e-mail is good or bad.
2) The content can be different enough between e-mail and forum posts that building a Bayesian database of what is good content and what is not would be pretty hard, unless you have a very busy forum. (Even then, you could have quite a bit of work training it on each message until it has enough information to work reliably.)

If you have the time and/or resources to work around those limitations though, I'm sure it could make the moderators' job quite a bit easier.

HTH,
Jacob
Tex

May 22, 2004
8:10 AM
One thought I just had for training the spam filter: have it learn from the archives of your forum (assuming any bad posts have been deleted) for the good content. Then, if you had a large collection of spam e-mail (say, 1 or 2 thousand messages), you could probably train it on those as a starting point for teaching it bad content.
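A rough sketch of that bootstrap step, assuming the archive and the spam collection are already loaded as lists of strings. The function name `bootstrap_token_table` and the smoothing are assumptions for illustration, not how any particular filter stores its data. It compares per-corpus rates rather than raw counts, so a large archive does not drown out a smaller spam collection.

```python
import re
from collections import Counter


def tokens(text):
    """Return the set of lowercase word tokens in a message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def bootstrap_token_table(archive_posts, spam_emails):
    """Build a token -> spam-leaning score table from two starter corpora.

    archive_posts: existing forum posts, assumed to be good content.
    spam_emails:   a separate collection of spam messages.
    Scores near 1 lean spam, near 0 lean ham, 0.5 is uninformative.
    """
    ham_n, spam_n = len(archive_posts), len(spam_emails)
    ham_counts, spam_counts = Counter(), Counter()
    for post in archive_posts:
        ham_counts.update(tokens(post))
    for mail in spam_emails:
        spam_counts.update(tokens(mail))

    table = {}
    for tok in ham_counts.keys() | spam_counts.keys():
        # Smoothed fraction of messages in each corpus containing the token.
        ham_rate = (ham_counts[tok] + 1) / (ham_n + 2)
        spam_rate = (spam_counts[tok] + 1) / (spam_n + 2)
        table[tok] = spam_rate / (spam_rate + ham_rate)
    return table


archive = [
    "how do I parse a file in perl",
    "the parse step fails on my file",
]
spam = [
    "free pills click here",
    "click here for free offer",
]
table = bootstrap_token_table(archive, spam)
for tok in ("free", "file"):
    print(tok, round(table[tok], 2))
```

Since e-mail spam and bad forum posts use different vocabulary, a table built this way would only be a starting point and would still need correcting as real posts come in.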

Then it would just be a matter of tuning it as you get more and more posts to the site.

HTH,
Jacob

Posting in this forum is limited to members of the group: SITEADMINS, SUBSCRIBERS, MEMBERS.


[ Copyright © the Open Forums! | All times are recorded in ET ]
