Field Notes on Comment Registration

To answer Birdhouse customers who ask, “Why are you enforcing comment registration on Movable Type weblogs? Have you really exhausted all other options?” I’ve put together this Brief History of Our Battle With Comment Spammers. It summarizes what we’ve done in the past, why it hasn’t worked, and why we think comment registration is our only remaining recourse.

From the perspective of a web host with a dozen customers running MT weblogs, I can confirm what many hosts have reported before: At the server level, massive comment spam blitzkriegs are effectively denial-of-service attacks. Every comment submission is a CGI request, and each CGI request needs to look through a large database (MT-Blacklist) of known spammy strings. Multiply this process by a few hundred simultaneous spam submission attempts and the situation isn’t pretty. Watching our available CPU drop to 0 during comment floods has become a too-regular occurrence. While the flood is in progress, mail services stop responding, we can barely type commands into a shell, and even static web requests grind to a standstill. These attacks sometimes last 30-60 minutes or more, and deny even simple services to paying customers on the same shared host, whether they themselves run MT or not. Everyone on the box pays the price.

Not wanting to shut down MT users (because we love MT!), we’ve tried many techniques to mitigate the problem. Here are some of the tricks we’ve tried over the past 18 months, and the reasons we’ve finally decided to go with forced comment registration for our MT users.

Naturally we tried the old “rename the comment script” trick, but this is only effective for a couple of weeks at best, until the spammers find the new comment script. Later, to resuscitate the server under heavy attack, I wrote a script that renames the comment submission scripts out from under the spambots, for multiple blogs at once. Comment submission attempts then return Apache 404s rather than triggering CPU-intensive database and filesystem requests (of course, legitimate comments also fail during this period). When the onslaught is over, we run the script in reverse to restore the comment and trackback script filenames. This of course is not a solution, only a tool to rescue a choking server. Discussion in the MT forums on this.
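For the curious, here is a rough sketch of that rename-and-restore idea. It is not the script we actually run; it is a minimal Python illustration that assumes each blog keeps the default script names (mt-comments.cgi and mt-tb.cgi) in its own directory, and the ".disabled" suffix and the function name are made up for the example.

```python
from pathlib import Path

# Assumptions for illustration: default MT script names, arbitrary suffix.
SCRIPTS = ("mt-comments.cgi", "mt-tb.cgi")
SUFFIX = ".disabled"


def toggle_scripts(blog_dirs, disable=True):
    """Rename comment/trackback scripts out of the way, or put them back.

    Returns the list of new paths for every file that was renamed.
    """
    renamed = []
    for blog_dir in blog_dirs:
        for name in SCRIPTS:
            active = Path(blog_dir) / name
            parked = Path(blog_dir) / (name + SUFFIX)
            # Disabling moves active -> parked; restoring moves parked -> active.
            src, dst = (active, parked) if disable else (parked, active)
            # Skip blogs that lack the script or are already in the target state.
            if src.exists() and not dst.exists():
                src.rename(dst)
                renamed.append(dst)
    return renamed
```

Calling toggle_scripts(dirs) during a flood makes requests for the scripts 404; calling toggle_scripts(dirs, disable=False) afterwards restores them. Legitimate comments fail in between, exactly as described above.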

We of course have installed MT-Blacklist on all MT blogs. While MT-Blacklist is very effective at keeping spam off of web pages, it hasn’t helped keep the server responsive while under comment spam attacks. It also doesn’t protect against strings that aren’t already blacklisted (moderation mode does keep comments from appearing on pages, but still allows incoming comments to consume server resources). And the blacklist — 99% of which is irrelevant after the first week or so that a spam string has been spotted in the wild — only grows with time, making the database unwieldy and difficult to manage. At least MT-Blacklist prevents pages from being rebuilt as submissions roll in.

A few weeks ago, we installed the MT-DSBL plugin in our most heavily-trafficked MT weblogs. This plugin checks incoming comments to see whether they’ve been sent via an open proxy; if so, the comments are rejected. According to SixApart, this technique has been very effective on the TypePad servers. But MT-DSBL is only helpful in countering spams sent via proxies, and even then, only proxies that are already blacklisted. MT-DSBL may be effective in those conditions, but our server got severely hammered just days after installing it on our most affected blogs. MT-DSBL is a cork in the levee, not a fix.

Naturally, we’ve installed upgrades and patches from SixApart as they came out. Some of them, such as 3.14, promised to fix bugs that caused unnecessary delays in comment form processing. While we’ve never run metrics to determine actual post response times, we have noticed no improvement to the recurring “out of server resources” situation with any patch released by SixApart.

There are of course many other techniques hosts can use to protect their resources (see SixApart’s guide to dealing with comment spam), all with pluses and minuses. For example, a CAPTCHA system seems like a great solution on the surface, but has an unacceptable accessibility downside. MT’s built-in comment throttling seems promising at first, but is actually nearly useless, since it only tracks comment floods from individual IPs, while spammers use fake/rotating IPs. The 3rd-party Real Comment Throttling plugin improves on the situation by aggregating comments from all IPs per time-slice, but has a fatal flaw: If it blocks 100 spams in an hour and trips the configured threshold, then all legit comments will be blocked until the time-slice ends.
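To make the two throttling failure modes concrete, here is a toy Python model. It is a deliberate simplification, not the code of MT or of the Real Comment Throttling plugin: both functions look at a single time-slice, the function names and limits are invented, and the IP addresses are placeholders.

```python
from collections import defaultdict


def per_ip_throttle(events, limit):
    """Accept a comment unless its own IP has already had `limit` accepted."""
    accepted_per_ip = defaultdict(int)
    accepted = []
    for ip, text in events:
        if accepted_per_ip[ip] < limit:
            accepted_per_ip[ip] += 1
            accepted.append((ip, text))
    return accepted


def global_throttle(events, limit):
    """Count submissions from all IPs together; reject everything past `limit`."""
    accepted = []
    for position, (ip, text) in enumerate(events):
        if position < limit:
            accepted.append((ip, text))
    return accepted


# One time-slice: 100 spams from 100 different IPs, then one real comment.
spam = [(f"10.0.0.{i}", "spam") for i in range(100)]
legit = [("192.0.2.1", "nice post")]
events = spam + legit

# Per-IP throttling never trips, because no single IP repeats.
per_ip_result = per_ip_throttle(events, limit=1)

# Global throttling trips early and then shuts out the real commenter too.
global_result = global_throttle(events, limit=20)
```

In this model the per-IP version lets every submission through, while the global version stops the flood but also rejects the one legitimate comment that arrives after the threshold trips.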

Another suggestion has been to use the Apache mod_security module in conjunction with the MT-Blacklist database, so that blocking is done at the Apache level and crippling floods of CGI requests are never made. Nice idea (and it would work for non-MT blogs on the same host without additional work), but it requires placing hefty blacklists into Apache’s memory space and, like MT-Blacklist itself, won’t catch anything that isn’t already blacklisted. Keeping it up to date would require parsing data from each of our customers’ blacklist databases, writing a script to weed out all the duplicates that would occur in such an aggregation, risking accidental blocks of legit strings (one customer’s junk might be another’s desirable strings), etc.
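The de-duplication step, at least, is the easy part. A minimal Python sketch, assuming (and this is only an assumption about the export format) that each customer’s blacklist can be dumped as one pattern per line with "#" marking comment lines; the function name and the example domains are hypothetical:

```python
def merge_blacklists(blacklists):
    """Merge several pattern lists into one, keeping first-seen order.

    Blank lines, comment lines and case-insensitive duplicates are dropped.
    """
    merged = []
    seen = set()
    for patterns in blacklists:
        for line in patterns:
            pattern = line.strip()
            if not pattern or pattern.startswith("#"):
                continue
            key = pattern.lower()
            if key not in seen:
                seen.add(key)
                merged.append(pattern)
    return merged


customer_a = ["cheap-pills.example", "# added last week", ""]
customer_b = ["Cheap-Pills.example", "casino.example"]
combined = merge_blacklists([customer_a, customer_b])
```

What a script like this cannot do is decide whether a string one customer blacklisted is something another customer wants to allow, which is the harder half of the problem.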

The paradox of such a solution is this: If you’re under severe attack, it’s most likely from spams that include strings that aren’t yet blacklisted. Unless you’re at the helm of the server at all times, you don’t find out until the attack is well underway. And if you then try to add new strings to the mod_security database, you’ll pull your hair out trying to even type into a shell (because server resources are totally consumed by the attack), let alone do the kind of administration you’d need to adequately respond to the situation. Blacklisting by content is ultimately a fool’s errand.

There are dozens of approaches to the comment spam problem, and we could spend days installing and testing them all. And in the end, we’d still get hammered and the rest of our customers would still suffer the consequences. Meanwhile, the TypeKey service is at the top of SixApart’s list of recommendations for dealing with the problem and is trivially easy to implement, so that’s what we’re doing. In retrospect, we’re already wondering why we waited so long to do it.

But can’t a spammer abuse the TypeKey system? Can’t spammers still submit spams manually? Yes on both counts, but they wouldn’t last long, since SixApart has the ability to close down abusive accounts, thus protecting all TypeKey users everywhere.

Why exactly is forced comment registration seen as an unacceptable stumbling block by many weblog owners? There’s hardly a bulletin board or forum left on the internet that doesn’t require users to register before posting. People are accustomed to it at this point, and the reasons for doing so are well known. So why do we tend to see blogs as having different rules of engagement than bulletin boards? Personally, I think it comes down to ego. Comments on one’s blog are validation that one’s thoughts are interesting to others, i.e. that the blogging effort is worthwhile. So any change that discourages casual commenting means some of that validation is going to go away.

But forced comment registration needn’t be a barrier to interaction. The trick is in helping readers understand how easy and non-invasive the process really is. If anything, the centralized TypeKey system makes things easier for weblog commenters than for bulletin board commenters, because they get to register once and use that login on every MT weblog that uses TypeKey. This is in stark contrast to bulletin boards and forums where users have to create a new login at each site. The barrier to participation is actually quite low, and the perception to the contrary is puzzling. TypeKey even allows anonymous commenting for those who require it.

But even if the reality is that comment registration is a barrier to casual commenting, we as web hosts are out of options. Protecting server resources for all users in a shared hosting environment is our paramount concern. Drastic circumstances require drastic responses, and comment spammers have a drastic impact on resource availability. And put simply, we’re sick of dealing with it. Spammers consume not only server resources, but untold amounts of time for both sysadmins and bloggers. We’re not willing to deal with it anymore. We need a solution that doesn’t require constant maintenance, that is close to fool-proof, and that doesn’t have nasty side-effects. Comment registration is it.

20 Replies to “Field Notes on Comment Registration”

  1. Thanks for the exhaustive download, Scot. I wish we didn’t have to deal with this, but here we are. I don’t think it’s just ego that drives (at least my) desire to not force registration. It’s basically intent – I hate throwing any friction into a conversation. Fundamentally, I don’t like registration; it gets in the way of a conversation between strangers which can then blossom into a conversation between colleagues or friends.

  2. Just an idea: If the MT captcha module doesn’t support audio captchas (for accessibility), perhaps you could enforce captchas, but if the user can’t use the captcha, then they have to fall back to TypePad…

    Again, it’s just a free-floating idea I had, I have no problem with forced registration (nor am I one of your customers… :P)

  3. John,

    Having to register does not create any more friction than exists in the real world.

    Doubt me?

    Publish your cell phone number on your blog and invite anyone that reads it to call you at any hour. Promise to talk to them for as long as they’d like.

    We choose our conversations in meatspace based on our relationship with the person with whom we are conversing. “Do I know you?” is probably something you’ve said in your head a number of times as you had a conversation you didn’t want to have.

    Now Searchblog is saying the same thing. Is that friction? Yes. Is it undue friction? I think no.

  4. John, I hear you and agree (re: adding friction to conversations). Fortunately, the friction (where friction is the necessity of creating the initial TypeKey login) is a one-time deal. After that, authenticating with a blog takes just one click. After a blog owner approves a commenter once, future comments will be accepted automatically. The more sites that use TypeKey, the less friction there will be, since everyone will essentially always be logged in everywhere.

  5. Pingback: cygweb
  6. There is a solution you haven’t tried.

    Place this ban in your .htaccess
    RewriteCond %{HTTP:VIA} ^.+pinappleproxy
    RewriteRule .* - [F]

    It takes care of the most prolific spammer. They’re responsible for maybe 80 percent or more of comment spam. Pop that rule in, and he gets 403 for every request except those through a high anonymity proxy. And those can be taken out on a per IP basis, they’re that uncommon. The other spammers usually only post up to 3 comments per blog, but this one goes far beyond that.

  7. I feel your pain! I’m a web host and host around 120 installations of MT plus a few each of b2, WP, pMachine and ExpressionEngine. In recent weeks I’ve found trackback spam an even bigger nuisance and even harder to deal with. If I’m lucky I see the attack as it starts and I usually run a find command and chmod all instances of mt-tb.cgi to 000 which stops it in its tracks; if I’m unlucky I end up with a hung server that has to be rebooted :( These are happening once or twice a week now. I have managed to log in and do this when the CPU load was in the 20s (1=100%), which gives you some idea of the massive strain the server is under during these attacks.

    I tried the mod_security fix but that introduced memory problems and once the server was rebooted (again!) I had to disable it. I haven’t given up on it yet though, it seems the best of a bad job.

    .htaccess solutions bring their own problems – many of my customers aren’t comfortable implementing solutions like that, which means I have to do it for them. I’m not a massive host by any means but I’m hosting around 200 accounts so updating everyone’s .htaccess file is a big job. I’m not an Apache expert either and it’s never clear whether these fixes can be applied server-wide via the Apache config file, which would be the most useful solution for a web host.

    As a host I would like to see trackback moderation and trackback throttling which would go some way towards a solution. Trackback authorisation via TypeKey would be nice too. After the most recent attack yesterday I’ve left all the trackback scripts at 000 and asked people not to change them back unless they’re desperate for trackbacks – most of them are happy to do this as they never get a legitimate trackback from one month’s end to the next.

    I’m working on a script at the moment which I hope to be able to run in the background to monitor CPU usage and check for comment/trackback processes if it starts to increase with a view to disabling trackbacks and comments if a spam attack starts. If/when I get it working I’ll pass it on.

  8. Spamhuntress, very interesting tip, thanks. I’ve got it in place now — will be interesting to see how/whether it helps with TrackBack spam (since comment spam is down to virtually zero now :).

    Shelagh, holy mackerel, it sounds like you’ve got even worse problems than I did. I agree that SixApart needs to add some means to moderate TrackBacks, since authenticating them is not possible. So, out of curiosity, are you not considering forcing comment moderation? Any terms of service you have in place that prohibit user software from consuming excessive resources would justify you in doing so…

  9. I’m not considering enforcing comment moderation – yet. Mainly because, at the moment, trackbacks seem to be a bigger problem than comments. I run MT on a couple of my personal sites and my hosting site so I tend to get hit too when everyone else does and I think I get a fair idea how much spam we’re getting hit with across the server as a whole. I also run a b2 blog and I know that when MT blogs get hit with trackback spam my b2 blog gets it too. As a server administrator it’s much easier to spot the MT processes and get a better idea of their impact but I’m sure that the PHP driven blogs are also getting hit and impacting performance, it’s just more difficult to quantify, eg, someone updating their WP blog, commenting on one or adding a trackback to one – it all just shows up as a php process whereas with MT you can identify which script is running.

    I’ve considered running scripts to kill processes using a lot of resource but am reluctant to go down that road. Most of them are extremely heavy-handed and react by freezing all the scripts on a person’s account. MT is known for being very resource hungry, but this isn’t usually a problem (and has improved a lot anyway over the last few releases): while MT does grab a lot of resource, it only uses it for a very brief period of time. However, something that kills a process as soon as it reaches a certain threshold isn’t much use, as it will probably result in all MT blogs becoming unusable. What’s needed is something that can monitor the number of scripts running and kill them under a more restricted set of circumstances.

  10. I’ve installed ModSecurity and used the SpamHuntress snippet in my mod_sec config. In 2 days it’s blocked 28,000+ hits just from this one spammer! If you have this Apache module installed, add this to the config:
    SecFilterSelective "HTTP_Via|HTTP_via" "pinappleproxy"

  11. Thanks Shelagh. Are you finding that this works for TrackBack spams as well, or just comments? We’re still seeing a fair bit of TrackBack spam, so it wasn’t easy to tell whether this measure had been effective or not.

  12. There’s another little thing you can block, and it seems to block other spammers as well. Not sure exactly how, but I saw it happen after I introduced that X-AAAAAAAAAAAA line. I’ve linked it on my page about the pinappleproxy domains, as a trackback block.

  13. Have you guys tried MT-Keystrokes? It seems too good to be true: link.

    It’s a script that detects if keys are actually being pressed on a keyboard. I installed it on http://www.treehugger.com and it seems to be doing wonders.

    Hopefully it’s just a matter of adoption time, but TypeKey doesn’t just reduce comments, it has practically killed them off. People just hate that extra step of registering.

  14. Yeah – it seems to be working great on Treehugger, but I installed it on http://www.triplepundit.com and all sorts of hell broke loose… it’s so messed up I don’t know what’s causing it (long story short, no comments work now). Anyway, I’m curious to know if you have any problems with KeyStrokes because I really want to use it! TypeKey is just horrible.

  15. Hey Nick – No issues with KeyStrokes so far; working perfectly. Did it ever work on triple pundit, or did it work at first and then break? You might need to restore all of your comment code from MT templates and then re-do it… So far, the effort seems more than worth it, and I can already see that comment rates are back up again.
