To respond to Birdhouse customers who ask, “Why are you enforcing comment registration on Movable Type weblogs? Have you really exhausted all other options?”, I’ve put together this Brief History of Our Battle With Comment Spammers to summarize what we’ve done in the past, why it hasn’t worked, and why we think comment registration is our only remaining recourse.
From the perspective of a web host with a dozen customers running MT weblogs, I can confirm what many hosts have reported before: At the server level, massive comment spam blitzkriegs are effectively denial-of-service attacks. Every comment submission is a CGI request, and each CGI request needs to look through a large database (MT-Blacklist) of known spammy strings. Multiply this process by a few hundred simultaneous spam submission attempts and the situation isn’t pretty. Watching our available CPU drop to 0 during comment floods has become an all-too-regular occurrence. While the flood is in progress, mail services stop responding, we can barely type commands into a shell, and even static web requests grind to a standstill. These attacks sometimes last 30-60 minutes or more, and deny even simple services to paying customers on the same shared host, whether they themselves run MT or not. Everyone on the box pays the price.
Not wanting to shut down MT users (because we love MT!), we’ve tried a lot of techniques to mitigate the problem. Here are some of the tricks we’ve tried over the past 18 months, and the reasons why we’ve finally decided to go with forced comment registration for our MT users.
Naturally we tried the old “rename the comment script” trick, but this is only effective for a couple of weeks at best, until the spammers find the new comment script. Later, to try and resuscitate the server when under heavy attack, I wrote a script that renames the comment submission scripts out from under the spambots, for multiple blogs at once. Comment submission attempts then return Apache 404s, rather than triggering CPU-intensive database and filesystem requests (of course, legitimate comments also fail during this period). When the onslaught is over, we run the script in reverse to restore the comment and trackback script filenames. This of course is not a solution, only a tool to rescue a choking server. There is discussion of this in the MT forums.
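The rename-and-restore idea can be sketched in a few lines of Python. This is a minimal illustration, not the actual script described above: the script filenames (mt-comments.cgi and mt-tb.cgi), the ".disabled" suffix, and the function name toggle_scripts are all assumptions, so adjust them to match each install.

```python
import os

# Assumed script names; a renamed install will use something else.
SCRIPT_NAMES = ("mt-comments.cgi", "mt-tb.cgi")
SUFFIX = ".disabled"  # hypothetical suffix for the parked copies


def toggle_scripts(blog_dirs, disable=True):
    """Park the comment/trackback scripts (disable) or put them back (restore).

    Returns the list of destination paths that were actually renamed.
    """
    renamed = []
    for blog_dir in blog_dirs:
        for name in SCRIPT_NAMES:
            live = os.path.join(blog_dir, name)
            parked = live + SUFFIX
            src, dst = (live, parked) if disable else (parked, live)
            # Skip anything already in the desired state so the call is
            # safe to repeat in the middle of an attack.
            if os.path.exists(src) and not os.path.exists(dst):
                os.rename(src, dst)
                renamed.append(dst)
    return renamed
```

Calling toggle_scripts(dirs, disable=True) during a flood makes the web server answer with a cheap 404, and toggle_scripts(dirs, disable=False) puts everything back afterwards. Legitimate comments fail in between, which is why this is only a rescue tool.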
We have of course installed MT-Blacklist on all MT blogs. While MT-Blacklist is very effective at keeping spam off of web pages, it hasn’t helped keep the server responsive while under comment spam attacks. It also doesn’t protect against strings that aren’t already blacklisted (moderation mode does keep comments from appearing on pages, but still allows incoming comments to consume server resources). And the blacklist, 99% of which is irrelevant after the first week or so that a spam string has been spotted in the wild, only grows with time, making the database unwieldy and difficult to manage. At least MT-Blacklist prevents pages from being rebuilt as submissions roll in.
A few weeks ago, we installed the MT-DSBL plugin in our most heavily-trafficked MT weblogs. This plugin checks incoming comments to see whether they’ve been sent via an open proxy; if so, the comments are rejected. According to SixApart, this technique has been very effective on the TypePad servers. But MT-DSBL is only helpful in countering spams sent via proxies, and even then, only proxies that are already blacklisted. MT-DSBL may be effective in those conditions, but our server got severely hammered just days after installing it on our most affected blogs. MT-DSBL is a cork in the levee, not a fix.
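For readers wondering how an open-proxy check of this kind generally works, here is a rough Python sketch of a DNS blacklist lookup. It is not taken from MT-DSBL, whose internals are not described here. The general DNS blacklist convention is to reverse the octets of the submitter's IP, append the list's zone, and treat a successful DNS answer as "listed". The function names and the zone "dnsbl.example" are placeholders, not real services.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the hostname to look up for an IPv4 address in a DNS blacklist."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    # 192.0.2.10 checked against dnsbl.example becomes 10.2.0.192.dnsbl.example
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has a record for this IP.

    Any lookup failure (including a network outage) is treated as
    'not listed', so this check fails open.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True
```

The sketch makes the limitation plain: the check can only say yes for an address that somebody has already reported to the list.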
Naturally, we’ve installed SixApart’s upgrades and patches as they were released. Some of them, such as 3.14, promised to fix bugs that caused unnecessary delays in comment form processing. While we’ve never run metrics to determine actual post response times, we have noticed no improvement to the recurring “out of server resources” situation with any patch released by SixApart.
There are of course many other techniques hosts can use to protect their resources (see SixApart’s guide to dealing with comment spam), all with pluses and minuses. For example, a CAPTCHA system seems like a great solution on the surface, but has an unacceptable accessibility downside. MT’s built-in comment throttling seems promising at first, but is actually nearly useless, since it only tracks comment floods from individual IPs, while spammers use fake/rotating IPs. The third-party Real Comment Throttling plugin improves on the situation by aggregating comments from all IPs per time-slice, but has a fatal flaw: If it blocks 100 spams in an hour and trips the configured threshold, then all legit comments will be blocked until the time-slice ends.
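The difference between the two throttling strategies is easier to see in a toy model. The Python sketch below is not the code of MT or of the Real Comment Throttling plugin. The class names, limits, and window sizes are made up for illustration, and both throttles simply count every submission in a fixed time-slice.

```python
from collections import defaultdict


class PerIPThrottle:
    """Allow at most `limit` submissions per IP in each time-slice."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window  # slice length in seconds
        self.counts = defaultdict(int)  # (ip, slice index) -> submissions

    def allow(self, ip, now):
        key = (ip, int(now // self.window))
        self.counts[key] += 1
        return self.counts[key] <= self.limit


class GlobalThrottle:
    """Allow at most `limit` submissions from all IPs in each time-slice."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # slice index -> submissions

    def allow(self, ip, now):
        key = int(now // self.window)
        self.counts[key] += 1
        return self.counts[key] <= self.limit


# A flood of 200 spams, each from a different (rotating) address.
spam_ips = ["10.0.%d.%d" % (i // 256, i % 256) for i in range(200)]

per_ip = PerIPThrottle(limit=5, window=3600)
passed_per_ip = sum(per_ip.allow(ip, now=i) for i, ip in enumerate(spam_ips))

shared = GlobalThrottle(limit=100, window=3600)
passed_shared = sum(shared.allow(ip, now=i) for i, ip in enumerate(spam_ips))
legit_during_flood = shared.allow("192.0.2.7", now=500)
legit_next_slice = shared.allow("192.0.2.7", now=3600)
```

In this model every one of the 200 spams gets past the per-IP throttle, because no single address ever reaches its limit. The shared throttle stops the flood after 100 submissions, but the legitimate commenter at 192.0.2.7 is then refused too, and only gets through once the next time-slice begins.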
Another suggestion has been to use the Apache mod_security module in conjunction with the MT-Blacklist database, so that blocking is done at the Apache level and crippling floods of CGI requests are never made. Nice idea (and would work for non-MT blogs on the same host without additional work), but it requires placing hefty blacklists into Apache’s memory space, and, like MT-Blacklist itself, won’t catch anything that isn’t already blacklisted. Keeping it up to date would require parsing data from each of our customers’ blacklist databases, writing a script to prevent all the duplicates that would occur in such an aggregation, risking the accidental blocking of legit strings (one customer’s junk might be another’s desirable strings), etc.
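To give a sense of the aggregation chore, here is a small Python sketch of merging per-customer blacklists into one de-duplicated list. It assumes each customer's entries have already been exported as plain strings, one per entry, with "#" marking comment lines. The function name merge_blacklists and the "protected" parameter (strings that some customer wants left alone) are hypothetical, and the sketch says nothing about how the result would be loaded into mod_security.

```python
def merge_blacklists(per_customer, protected=()):
    """Merge per-customer blacklist entries into one sorted list.

    per_customer: dict mapping a customer name to a list of entries.
    protected: entries that must never end up in the shared list,
               because at least one customer considers them legitimate.
    """
    protected_set = {entry.strip() for entry in protected}
    merged = set()
    for entries in per_customer.values():
        for entry in entries:
            cleaned = entry.strip()
            # Drop blanks, comment lines, and anything explicitly protected.
            if not cleaned or cleaned.startswith("#"):
                continue
            if cleaned in protected_set:
                continue
            merged.add(cleaned)  # the set removes exact duplicates
    return sorted(merged)
```

Even this simple version needs somebody to maintain the protected list by hand, which is the kind of ongoing upkeep that makes the approach unattractive on a shared host.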
The paradox of such a solution is this: If you’re under severe attack, it’s most likely from spams that include strings that aren’t yet blacklisted. Unless you’re at the helm of the server at all times, you don’t find out until the attack is well underway. And if you then try to add new strings to the mod_security database, you’ll pull your hair out trying to even type into a shell (because server resources are totally consumed by the attack), let alone do the kind of administration you’d need to adequately respond to the situation. Blacklisting by content is ultimately a fool’s errand.
There are dozens of approaches to the comment spam problem, and we could spend days installing and testing them all. And in the end, we’d still get hammered and the rest of our customers would still suffer the consequences. Meanwhile, the TypeKey service is at the top of SixApart’s list of recommendations for dealing with the problem and is trivially easy to implement, so that’s what we’re doing. In retrospect, we’re already wondering why we waited so long to do it.
But can’t a spammer abuse the TypeKey system? Can’t spammers still submit spams manually? Yes on both counts, but they wouldn’t last long, since SixApart has the ability to close down abusive accounts, thus protecting all TypeKey users everywhere.
Why exactly is forced comment registration seen as an unacceptable stumbling block by many weblog owners? There’s hardly a bulletin board or forum left on the internet that doesn’t require users to register before posting. People are accustomed to it at this point, and the reasons for doing so are well known. So why do we tend to see blogs as having different rules of engagement than bulletin boards? Personally, I think it comes down to ego. Comments on one’s blog are validation that one’s thoughts are interesting to others, i.e. that the blogging effort is worthwhile. So any change that discourages casual commenting means some of that validation is going to go away.
But forced comment registration needn’t be a barrier to interaction. The trick is in helping readers understand how easy and non-invasive the process really is. If anything, the centralized TypeKey system makes things easier for weblog commenters than for bulletin board commenters, because they get to register once and use that login on every MT weblog that uses TypeKey. This is in stark contrast to bulletin boards and forums where users have to create a new login at each site. The barrier to participation is actually quite low, and the perception to the contrary is puzzling. TypeKey even allows anonymous commenting for those who require it.
But even if the reality is that comment registration is a barrier to casual commenting, we as web hosts are out of options. Protecting server resources for all users in a shared hosting environment is our paramount concern. Drastic circumstances require drastic responses, and comment spammers have a drastic impact on resource availability. And put simply, we’re sick of dealing with it. Spammers consume not only server resources, but untold amounts of time for both sysadmins and bloggers. We’re not willing to deal with it anymore. We need a solution that doesn’t require constant maintenance, that is close to fool-proof, and that doesn’t have nasty side-effects. Comment registration is it.