i love spam

Ian Landsman • June 9, 2006

The next version of HelpSpot will include layered spam protection for the portal, covering both the forums and the request submission page. I've been testing it out this week on the UserScape support portal, which was getting loads of generic form spam. I'm happy to report that it works super fantastico! The spam protection has 3 layers.

  1. Link Counts
    Any submission/post that has more than X links in it is automatically classified as spam. X defaults to 4, but it can be adjusted as needed. This instantly filters out the huge link spams, even if they're brand new and never used before.

  2. Timestamp Forms
    Each HTML form now includes 3 hidden fields: one with a timestamp, one with the IP the form was created for, and one with a secret hash of the two. When the form is submitted, the hash is used to check that the timestamp and IP are the original values; if either has been changed, the submission is marked as spam. If they are original, the form still can't be more than 2 hours old, or it's marked as spam.

This works well because most form spam is done by crawling the site once for forms and then just submitting the same form over and over. Now those stored forms will be invalid after two hours.
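Here's a rough sketch of the timestamp form idea in Python. It is not the HelpSpot code, just an illustration: the field names (`ts`, `ip`, `hash`), the `SECRET_KEY` value, and the choice of HMAC-SHA256 for the "secret hash" are all my assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"    # hypothetical server-side secret, never sent to the browser
MAX_FORM_AGE = 2 * 60 * 60   # forms older than 2 hours are rejected (in seconds)


def sign(timestamp: int, ip: str) -> str:
    """Keyed hash that ties the timestamp and the IP together."""
    message = f"{timestamp}|{ip}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_form_fields(ip: str, now: Optional[int] = None) -> dict:
    """Values for the three hidden fields, generated when the form is rendered."""
    ts = int(time.time()) if now is None else now
    return {"ts": str(ts), "ip": ip, "hash": sign(ts, ip)}


def is_spam(fields: dict, now: Optional[int] = None) -> bool:
    """Return True if the hidden fields are missing, altered, or too old."""
    now = int(time.time()) if now is None else now
    try:
        ts = int(fields["ts"])
        ip = str(fields["ip"])
        submitted_hash = str(fields["hash"])
    except (KeyError, TypeError, ValueError):
        return True  # a hidden field is missing or malformed

    # Recompute the hash: if the timestamp or IP was changed, it won't match.
    expected = sign(ts, ip)
    if not hmac.compare_digest(expected.encode(), submitted_hash.encode()):
        return True

    # The values are original, so the only question left is the form's age.
    return now - ts > MAX_FORM_AGE
```

A form rendered with `make_form_fields("203.0.113.7")` passes `is_spam` for two hours. A crawler replaying a stored copy after that gets flagged, and it can't just bump the timestamp because it doesn't know the secret.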

  3. Bayesian Filtering
    Finally, each post is run through a new set of Bayesian filters. These filters learn from manual deletions and also when spam caught by the above two methods is deleted.
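To give a feel for the Bayesian layer, here's a tiny naive Bayes sketch in Python. It's a textbook version with add-one smoothing, not the filter that ships in HelpSpot. The `BayesFilter` class, its tokenizer, and the idea of training on "ham" (legitimate posts) are assumptions for the sake of the example.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Minimal naive Bayes text classifier with two labels: 'spam' and 'ham'."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercased words and numbers; a real filter would be smarter here.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Learn from one post, e.g. train(post, 'spam') when it gets deleted."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        total_docs = sum(self.doc_counts.values())
        if total_docs == 0:
            return 0.5  # nothing learned yet, so no opinion either way

        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        tokens = self.tokenize(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this label has been so far.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size + 1))
            log_score[label] = score

        # Turn the two log scores into P(spam); clamp to avoid overflow in exp().
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

In this sketch, deleting a spam post would call `train(post, "spam")`, which covers both manual deletions and deletions of posts caught by the first two layers.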

Really, the Bayesian filter alone would be enough, but the first two layers help keep spam from ever showing up at all. If the spammers move over to a new set of words, some spam would normally get through until the Bayesian filter learned it; the first two layers help catch it before it's ever displayed.
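The layering could look something like this in Python, with the cheap checks running before the Bayesian one. Again, this is only a sketch: counting `http://` and `https://` occurrences as "links", the 0.9 threshold, and the function names are my assumptions, and the form check and Bayesian score are passed in so the example stands on its own.

```python
import re
from typing import Callable

MAX_LINKS = 4  # the default limit from layer 1; adjustable


def count_links(text: str) -> int:
    """Count URL-looking strings. What counts as a link is an assumption here."""
    return len(re.findall(r"https?://", text, flags=re.IGNORECASE))


def classify(
    text: str,
    form_is_valid: bool,
    spam_probability: Callable[[str], float],
    max_links: int = MAX_LINKS,
    threshold: float = 0.9,
) -> str:
    """Run the three layers in order and report which one caught the post."""
    # Layer 1: more than max_links links is spam, no matter what the words are.
    if count_links(text) > max_links:
        return "spam: link count"
    # Layer 2: the hidden timestamp/IP/hash fields failed their check.
    if not form_is_valid:
        return "spam: form token"
    # Layer 3: the learned filter gets the final say on everything else.
    if spam_probability(text) >= threshold:
        return "spam: bayesian"
    return "ok"
```

Because the first two checks don't depend on the words in the post, they keep working even when the Bayesian filter hasn't seen a new batch of spam vocabulary yet.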

I'm back to enjoying checking the forums. It's great to see the little spam icon with the number of spams captured.

Now I just want them to keep spamming so the filters can get trained up well and I can check this feature off as real world tested.

Oh, and if any customers are having trouble with spam, drop me an email and I can send you a link to the beta build of 1.3.5 with the spam protection.

Update: I meant to mention, but forgot, that #2 was found on Keith Devens' blog. I had been tinkering with something along the same lines, but his solution was simpler and had the added plus that he'd already verified it worked.