Comment Registration Required

I’ve had it with Movable Type comment spam blitzkriegs dropping available server CPU to 0 and broadsiding web and mail services. Last night we endured a comment spam attack so severe it knocked out the mail server overnight. If you’ve followed this space for a while, you know I’ve tried virtually every trick and upgrade at my disposal to deal with the problem. But it just keeps getting worse.

A few minutes ago, I switched this weblog to a comment-registration-required system. I know this will discourage a percentage (probably a good percentage) of casual comments, and that’s a bummer. But TypeKey registration is trivially easy, and your registration will work at any TypeKey-enabled blog on the internet.

I’ve also just announced the new comment registration policy on status.birdhouse and to our four most intensive MT users.

My hatred of spammers is boundless and bottomless.

Music: The Roches :: Hammond Song

nofollow

If a link’s anchor tag includes the rel="nofollow" attribute, well-behaved search engines won’t follow that link when spidering. So if there were a way to automatically modify the links that comment spammers leave in comments, their chief goal (raising their standings in the search engines) would be deflated.

SixApart has just released the nofollow plugin, which scans incoming comments and adds rel="nofollow" to each embedded link automatically. Normal users are not affected; they can still click the links. But the simple presence of links to spammers’ sites will do nothing whatsoever for their Google PageRank.
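
For the curious, here is a rough sketch of the rewriting step in Python. It is not the plugin's own code, just an illustration of the idea; the function name add_nofollow and the regex approach are my assumptions, and a real implementation would be better off with a proper HTML parser.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
ANCHOR_RE = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches an existing rel="..." attribute (but not e.g. data-rel="...").
REL_RE = re.compile(r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" present on every <a> tag."""

    def fix_anchor(match: re.Match) -> str:
        attrs = match.group(1)
        rel = REL_RE.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return f'<a{attrs} rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in (v.lower() for v in values):
            return match.group(0)  # already marked, leave untouched
        # Keep existing rel values and add nofollow to them.
        values.append("nofollow")
        new_rel = f'rel="{" ".join(values)}"'
        return "<a" + attrs[: rel.start()] + new_rel + attrs[rel.end():] + ">"

    return ANCHOR_RE.sub(fix_anchor, comment_html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com">my site</a>'))
```

The link still works for a human reader; only the rel attribute changes.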

The downside, as I see it, is that for this to be effective, it must be installed in the majority of weblogs. Spammers need to understand that their campaigns are flaccid, and that won’t be true until most of the world is using a solution like this.

Just installed nofollow at birdhouse and at the J-School.

Music: African Head Charge :: Far Away Chant

Comment Spam Nihilism

Applying the MovableType 3.14 upgrade made a huge difference in server CPU usage when undergoing comment spam blitzkriegs, which now amount to barely a blip on the resource usage radar. Peace at last. Until…

A few days later we face a new anomaly: Someone out there has created a script that submits fake comments containing randomly generated URLs (all non-active and non-registered), randomly generated fake IPs, and randomly generated fake email addresses — they’re coming in locust clouds of one or two hundred at a time.

Because there are no recurring strings in these comment spams, blacklisting them is pointless and would only fill a blacklist database with garbage. Because the domains advertised are non-existent, I can’t correctly classify them as spam – they don’t advertise anything. Their purpose is purely vandalistic: to annoy blog owners and admins.
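
To make the point concrete, here is a toy string blacklist in Python. This is only a sketch of the general technique, not how MT-Blacklist is actually implemented; the blacklist entries, the function names, and the .example domains are all made up for illustration.

```python
import random
import string

# Hypothetical entries; a real list holds strings seen in earlier spam.
BLACKLIST = ["cheap-pills.example", "casino-bonus.example"]


def is_blacklisted(comment: str, blacklist=BLACKLIST) -> bool:
    """True if any blacklisted string occurs in the comment text."""
    text = comment.lower()
    return any(entry in text for entry in blacklist)


def random_spam_url(rng: random.Random, length: int = 12) -> str:
    """Mimic the attack: a throwaway URL built from random letters."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"http://{name}.example/"


if __name__ == "__main__":
    rng = random.Random(0)
    # A repeat offender is caught because its string recurs.
    print(is_blacklisted("Visit http://cheap-pills.example/ today"))
    # Random URLs never recur, so there is nothing to match against.
    print(any(is_blacklisted(random_spam_url(rng)) for _ in range(200)))
```

Adding each random URL to the list after the fact would never catch the next one, which is why the blacklist just fills with garbage.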

Even though Blacklist doesn’t catch them, they’re still held for moderation (so resource usage is nil), but you do have to take the time to batch-delete the suckers.

Posted a query to see if anyone had advice on battling this form of nihilism, but nothing useful so far. I’m quickly coming closer to the last resort: Forced registration for untrusted commenters.

eWeek on Comment Spam

Heard from a reporter at eWeek yesterday who wanted to interview me about Movable Type comment spam overloads and how they affect web hosts. Unfortunately I got the email too late and wasn’t interviewed for the story, which was published today.

Six Apart has released MT 3.14 to address a bug which was triggering rebuild behavior even in settings where it shouldn’t be necessary, such as when moderated comments are added (99% of comment spam is held as moderated by various mechanisms). We’ll be applying the patch to birdhouse blogs throughout the day.

Comment Spam – Up Against the Wall

The weblog comment spam problem has implications beyond crowded inboxes for users. Even with tools such as the incredible MT-Blacklist (which has blocked or moderated tens of thousands of comment spams on birdhouse-hosted blogs in the past few months), each request still requires a CGI process and a database request. When the spambots launch their massive onslaughts, shared hosting environments reel from the resource requirements. The problem has reached a critical threshold, and the muckety mucks at SixApart are coming out of the woods to address it head-on:

Jay Allen (author of MT-Blacklist and Product Manager at Six Apart) and Anil Dash (big cheese at SixApart) have both posted “official” positions on MT comment spam in the past few days.

So it looks like patches will be released in the next few days to address the biggest issues for web hosts. I like the fact that they’re approaching this not just as an MT problem but as an issue that affects all online discussion forums. The key to satisfying frustrated web hosts will be in creating a solution that can somehow block comment spam blitzkriegs without having to make a CGI and/or database call for every incoming request. It’s a hard problem to solve.

Update: Very good read on the many aspects and dimensions of comment spam load issues over at photodude. Throwing more hardware at the problem doesn’t make it go away (drooling over the server described there). Long comment section, also worth reading. One comment on the question of whether dynamic or statically generated sites fare better under this kind of load:

Also, last month, my husband and I shut down WordPress on the colo server we share with 3 other people, because … hits from comment spammers were making everything so slow. So we installed prerendering, which, if I’m reading this correctly, takes away the advantage of WP being dynamic(?) [right – this would make a dynamic site behave like a static site; you can’t win. -SFH].

Music: Mildred Bailey :: Squeeze Me

SURBLs

Just completed a transition of birdhouse hosting to a new machine (in the same data center) with a greatly increased monthly bandwidth ceiling, and have been able to raise bandwidth caps for all customer account levels.

Also took the opportunity to upgrade SpamAssassin to version 3, which, among other enhancements, supports SURBLs — Spam URI Realtime Blocklists. SURBLs essentially use the same logic as Movable Type’s blacklisting system – rather than trying to analyze content or block sender addresses or IPs (which are moving targets), SURBLs hit spammers where it hurts by blocking messages that include blacklisted URLs.

The downside of using SURBLs alone is that messages containing URLs that are not yet blacklisted slip through the net. But by combining SURBL scanning with content analysis, and by using distributed/collaborative blacklisting systems, you end up way ahead of the game.
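
Roughly, a SURBL check pulls the hosts out of the URLs in a message and asks DNS whether each one is listed. The Python below is a minimal sketch under a few assumptions of mine: it queries the combined multi.surbl.org zone, it looks up each hostname as-is (real clients first reduce it to the registered domain), and the function names are hypothetical. SpamAssassin does all of this internally; this is not its code.

```python
import re
import socket
from typing import Callable, Set

URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)
SURBL_ZONE = "multi.surbl.org"  # assumption: the combined SURBL zone


def extract_hosts(message: str) -> Set[str]:
    """Collect the lowercased hostnames of all http(s) URLs in the message."""
    return {h.lower().rstrip(".") for h in URL_HOST_RE.findall(message)}


def dns_listed(query_name: str) -> bool:
    """A listed name resolves to an address; an unlisted one fails to resolve."""
    try:
        socket.gethostbyname(query_name)
        return True
    except OSError:
        return False


def blacklisted_hosts(
    message: str, is_listed: Callable[[str], bool] = dns_listed
) -> Set[str]:
    """Return the hosts in the message that the blocklist reports as listed."""
    return {h for h in extract_hosts(message) if is_listed(f"{h}.{SURBL_ZONE}")}


if __name__ == "__main__":
    # Stand-in resolver so the example runs without touching the network.
    def fake_lookup(name: str) -> bool:
        return name == "spam.example.multi.surbl.org"

    body = "Buy now at http://Spam.example/deal or read https://ok.example/post"
    print(blacklisted_hosts(body, is_listed=fake_lookup))
```

A URL that nobody has reported yet simply fails the lookup and passes, which is the gap that content analysis has to cover.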

Had to modify some of my customers’ SpamAssassin rulesets to work with the new syntax in SA3, but now that we’re dialed in, spam blocking seems to be more effective than ever – we’re catching about 98-99% of unwanted mail prior to delivery. w00t!

Music: Air :: Radian

Comment Spam as Data Corruption

At OJR, a fairly exhaustive piece on the history and status of weblog comment spam, including this zinger from Irish blogger Antoin O Lachtnain, throwing down the gauntlet (though I doubt it had any effect):

Relevant comments are very welcome, whether you agree or disagree with what I have to say. However, advertising of goods or services is not permitted on this forum without payment of a fee. The fee per advertisement is 500 Euros, which is payable immediately by bank draft. If you post an ad but do not pay the charge immediately you have corrupted data on this Web site without my permission. As such, you are guilty of criminal damage under the Criminal Damage Act, 1991 and subject to a prison sentence of up to 10 years and a fine of up to 12,700 Euros…Please note that posting on this forum will have no effect whatsoever on the PageRank of any links that you post.

Music: Minutemen :: One Reporter’s Opinion

User Spam Preferences

Over the past few evenings, set up WebUserPrefs on birdhouse hosting to allow users to configure their own SpamAssassin sensitivity thresholds as well as whitelists/blacklists. In the process, ended up contributing my bug fixes and new features back to both WebUserPrefs and the Communigate Pro plugin for it.

Birdhouse has deleted 84,982 spams for 40 mail users in the past 7 days alone. Now up to 98-99% spam blockage for users with filters on stun. One of our power users was receiving 13,000 spams per week at a single account. The vast majority were addressed to made-up names on the domain; rejecting mail to unknown names brought that down to around 1,000. Satisfying progress.
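
For a sense of scale, here is the plain arithmetic on those figures in Python. The only assumption is that "around 1,000" is treated as exactly 1,000; the variable names are just for illustration.

```python
# Figures quoted above.
deleted_spams = 84_982
mail_users = 40
days = 7

# Average spam deleted per user per day (roughly 300).
per_user_per_day = deleted_spams / mail_users / days

# Effect of rejecting mail to unknown names for the power user.
weekly_before = 13_000
weekly_after = 1_000  # "around 1,000", taken as exact here
reduction = (weekly_before - weekly_after) / weekly_before

print(f"{per_user_per_day:.1f} spams per user per day")
print(f"{reduction:.1%} fewer spams reaching that account's filters")
```

So rejecting unknown recipients alone removes a bit over nine-tenths of that account's load before SpamAssassin ever sees it.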

Music: Yo La Tengo :: Flying Lesson (Hot Chicken #1)

MovableType 3 Sting

Completely sick of the comment spammers. MT-Blacklist is great at what it does, but only works after a string has been blacklisted, so every morning brings a heap of new garbage, “flies buzzing around my eyes, blood on my saddle.” Is the only viable long-term solution comment registration? To get that, you have to move to MovableType 3.0.

The J-School has been looking forward to MT3 for a long time, hoping for new features that would make it easier to manage the 17 blogs and 260 authors (with ~50 new authors added per semester) we currently support. What we didn’t anticipate was the new licensing scheme, which could become not only prohibitively expensive but also a logistical nightmare as we try to track and pay for licenses for each new author, semester to semester. diveintomark has an excellent piece on why the “free enough” approach MT takes isn’t enough. Even if there’s a free version, tie yourself to a corporation and you’re subject to all their whims, pratfalls, and unfortunate licensing decisions. Unless SixApart responds soon to my query on custom licensing, we’ll be moving on to WordPress or a homebrew PHP/MySQL solution, or integrating all of our blogs into whatever CMS I choose for the rest of the J-School site this summer.

The licensing issue doesn’t apply to birdhouse — SixApart still offers a free version for non-commercial purposes. Disappointingly, MT3 offers almost no new features beyond comment registration. That’s okay – I’ve seen software revved to major numbers for minor changes plenty of times, and I wanted some real solution to the comment spam problem. So I ran the upgrade tonight. A few technical misfires (apparently not uncommon) — finally succeeded by installing the full version rather than the upgrade and then running the database upgrade script. Signed up for TypeKey, and received a token to drop into the new and improved back-end. Added the new DynamicComments directive to mt.cfg. But wait — after integrating the new comment registration tag into my templates, things fell apart. Not only were existing comments hidden from view, but when users clicked on the “Sign up to comment” link, they were told that birdhouse was not registered with TypeKey (it was). Screw it. Backpedal. Restore the original MT directory; fortunately it still worked, even though the upgrade script had modified some database structures.

MT3 is almost certainly a no-go for the J-School, and I’m increasingly skeptical about using it for birdhouse. There seems to be an MT-to-WordPress exodus afoot, and I’ll probably join it. Lots of content out there on migration strategies.

Music: The Mekons :: Funeral