John Battelle’s SearchBlog, which is hosted by Birdhouse, has been undergoing a constant (and brutal) deluge of weblog comment spam over the past few days. It’s always been bad for him, but I’ve never seen anything like this. Akismet is still the bomb, but even the mighty Akismet couldn’t stay out in front of this wave. Since Akismet only knows about spam that’s been submitted to it by the hive mind, the first blogs to receive a new wave of spam are unprotected by it.
The script I wrote a while ago to query blog databases for spammy behavior and shunt IP addresses into the firewall works wonderfully when IP addresses are legitimate, but it seems that most spammers know how to fake their IPs these days, rendering it ineffective.
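For the curious, the general idea behind that kind of script can be sketched in a few lines of Python. This is not the actual script, just a minimal illustration under assumptions: the comment rows are assumed to arrive as (ip, is_junk) pairs already pulled from the blog database, the names `spammy_ips` and `firewall_rules` and the threshold of 20 are made up for the example, and the rules are only generated as text rather than applied.

```python
from collections import Counter


def spammy_ips(comments, threshold=20):
    """Return IPs with at least `threshold` junk comments.

    `comments` is an iterable of (ip, is_junk) pairs; in a real setup these
    would come from a query against the blog's comment table (assumption).
    """
    counts = Counter(ip for ip, is_junk in comments if is_junk)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def firewall_rules(ips):
    """Build (but do not run) one iptables DROP rule per offending IP."""
    return [f"iptables -I INPUT -s {ip} -j DROP" for ip in ips]


if __name__ == "__main__":
    # Made-up sample data using documentation-only address ranges.
    sample = (
        [("203.0.113.5", True)] * 25      # heavy junk poster
        + [("198.51.100.7", True)] * 3    # a few junk comments, under threshold
        + [("192.0.2.1", False)] * 50     # busy but legitimate commenter
    )
    for rule in firewall_rules(spammy_ips(sample)):
        print(rule)
```

The weakness is visible right in the sketch: everything keys off the IP address, so once the addresses are forged or rotated, no single IP ever crosses the threshold.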
Ever wondered what a comment spam blitzkrieg looks like from a server load perspective? Take a look at the load average graph from today (snapshot every 6 minutes):
Those spikes, some of them lasting for fairly long blocks of time, represent thousands of bogus comments being submitted to battellemedia.com simultaneously. For reference, load averages shouldn’t spike above 1.0 too often, or things get uncomfortable. This is why spammers, especially weblog comment spammers, make me insane.
Decided Battelle needed a second line of defense. We were reluctant about using a captcha for the usual accessibility reasons, so I went looking for a good Turing test system and found TinyTuring by Kevin Shay. As human detectors go, it doesn’t get much simpler than this: it requires commenters to enter just a single randomly selected letter. A hidden salt prevents algorithmic detection. Required modification of three MT templates. So far, 100% effective. Yes, armies of underpaid Malaysian human spambots can still jam crap into the system manually, but those comments will still have Akismet to deal with.
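To make the single-letter-plus-salt idea concrete, here is a rough Python sketch of how such a check could work. It is not TinyTuring's code (TinyTuring is a Movable Type plugin, and I'm not reproducing its internals here). It assumes a server-side secret and two hidden form fields, and `SECRET_KEY`, `make_challenge` and `check_answer` are names invented for the illustration.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical server-side secret; never sent to the browser.
SECRET_KEY = b"replace-with-a-long-random-value"


def _sign(salt, letter):
    """HMAC over salt and letter, so the hidden token doesn't reveal the letter."""
    message = f"{salt}:{letter}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_challenge():
    """Pick a random letter and return (letter, salt, token).

    The letter is shown to the human ("please type the letter x");
    salt and token go into hidden form fields.
    """
    letter = secrets.choice(string.ascii_lowercase)
    salt = secrets.token_hex(8)
    return letter, salt, _sign(salt, letter)


def check_answer(answer, salt, token):
    """Accept the comment only if the typed letter matches the signed token."""
    expected = _sign(salt, answer.strip().lower())
    return hmac.compare_digest(expected, token)


if __name__ == "__main__":
    letter, salt, token = make_challenge()
    print(f"Please type the letter '{letter}'")
    print("correct answer accepted:", check_answer(letter, salt, token))
```

Because the token is keyed with a secret the spammer never sees, a bot can't work out the right letter by hashing all 26 candidates itself. A sketch this small does nothing about replaying a previously solved form, so a real deployment would also want the salt to expire.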
The cat and mouse game continues.