John Battelle’s SearchBlog, which is hosted by Birdhouse, has been undergoing a constant (and brutal) deluge of weblog comment spam over the past few days. It’s always been bad for him, but I’ve never seen anything like this. Akismet is still the bomb, but even the mighty Akismet couldn’t stay out in front of this wave. Since Akismet only knows about spam that’s been submitted to it by the hive mind, the first blogs to receive a new wave of spam are unprotected by it.
The script I wrote a while ago to query blog databases for spammy behavior and shunt IP addresses into the firewall works wonderfully when IP addresses are legitimate, but it seems that most spammers know how to fake their IPs these days, rendering it ineffective.
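The post doesn't include that script, so here is a minimal Python sketch of the general idea. It assumes one simple rule: flag any IP that posts at least `threshold` comments within a time window. The function names, the `(ip, timestamp)` input shape and the thresholds are all hypothetical. The sketch only builds `iptables` commands and does not run them. Like the original approach, it does nothing against spammers who fake or rotate their addresses.

```python
import ipaddress
from collections import Counter
from datetime import datetime, timedelta


def _is_valid_ip(ip):
    # Reject anything that isn't a real IPv4/IPv6 address so junk never reaches a firewall rule.
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        return False
    return True


def spammy_ips(comments, window, threshold, now):
    """Return sorted IPs with >= threshold comments inside [now - window, now].

    comments: iterable of (ip, datetime) pairs (hypothetical input shape).
    """
    cutoff = now - window
    counts = Counter(ip for ip, ts in comments if ts >= cutoff and _is_valid_ip(ip))
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def firewall_rules(ips):
    # Standard iptables syntax for dropping inbound traffic from one source address.
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in ips]
```

For example, with a one-hour window and a threshold of 5, an address that posted five comments in the last few minutes would be flagged, while one that posted ten comments three hours ago would not.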
Ever wondered what a comment spam blitzkrieg looks like from a server load perspective? Take a look at the load average graph from today (snapshot every 6 minutes):
Those spikes, some lasting for fairly long stretches, represent thousands of bogus comments being submitted to battellemedia.com simultaneously. For reference, load averages shouldn’t spike above 1.0 very often, or things get uncomfortable. This is why spammers, especially weblog comment spammers, make me insane.
Decided Battelle needed a second line of defense. We were reluctant to use a captcha for the usual accessibility reasons, so I went looking for a good Turing test system and found TinyTuring by Kevin Shay. As human detectors go, it doesn’t get much simpler than this: it requires commenters to enter just a single randomly selected letter. A hidden salt prevents algorithmic detection. Required modification of three MT templates. So far, 100% effective. Yes, armies of underpaid Malaysian human spambots can still jam crap into the system manually, but those comments will still have Akismet to deal with.
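TinyTuring's own code isn't shown here, so what follows is a minimal Python sketch of how a one-letter test with a hidden salt *could* work. It is not Kevin Shay's implementation. It assumes the salt is a server-side secret (`SECRET_SALT`, a hypothetical name) used to HMAC the chosen letter together with a timestamp. The page displays the letter, and the hidden form field carries only the timestamp and signature, so a script reading the form can't extract the answer from it. The one-hour expiry (`MAX_AGE_SECONDS`) is also an assumption. It is there so an old token can't be replayed forever.

```python
import hashlib
import hmac
import secrets
import string
import time
from typing import Optional, Tuple

# Hypothetical secret salt: kept on the server, never written into the page.
SECRET_SALT = b"replace-with-a-long-random-secret"
# Hypothetical expiry so a captured token can't be reused indefinitely.
MAX_AGE_SECONDS = 3600


def _sign(letter: str, issued_at: int) -> str:
    # HMAC the letter plus timestamp under the secret salt.
    msg = f"{letter}:{issued_at}".encode()
    return hmac.new(SECRET_SALT, msg, hashlib.sha256).hexdigest()


def make_challenge(now: Optional[int] = None) -> Tuple[str, str]:
    """Return (letter to show the human, token for a hidden form field)."""
    issued_at = int(time.time()) if now is None else now
    letter = secrets.choice(string.ascii_lowercase)
    return letter, f"{issued_at}:{_sign(letter, issued_at)}"


def check_answer(answer: str, token: str, now: Optional[int] = None) -> bool:
    """True if the typed letter matches the signed token and the token is fresh."""
    try:
        issued_str, sig = token.split(":", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False  # malformed token
    current = int(time.time()) if now is None else now
    if not 0 <= current - issued_at <= MAX_AGE_SECONDS:
        return False  # expired, or timestamp from the future
    guess = answer.strip().lower()
    if len(guess) != 1 or guess not in string.ascii_lowercase:
        return False
    # Constant-time comparison of the submitted and recomputed signatures.
    return hmac.compare_digest(sig.encode(), _sign(guess, issued_at).encode())
```

The design choice here is that the server keeps no per-visitor state. Everything needed to verify the answer travels in the token, and without the salt a bot can't compute the correct signature for any letter.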
The cat and mouse game continues.
3 Replies to “TinyTuring”
totally fascinating. -Joe
Bleah. Well… glad at least to hear it’s not just me.
What’s odd to me, and perhaps you can shed some light on this, is that most of the spam seems to hit one single entry from a few months ago. Nothing different about this article from all the others. No URLs, no email addresses. I’d link to it, but I’m afraid that’s what started all the attention to Fresh Boiled Hand.
Thanks for all your hard work keeping this problem under control.
Arikasi: A lot of those are “probes”. They’re testing to see whether they can get a comment through into the system, and if so, how long it lasts. If you look at your referrer logs (or use a WP plugin like SiteMeter), you’ll find a surprising number of searches for “viagra” etc. on your blog. Those searches are run by the spammers, who are returning to measure their efficacy.