ALIPR Captchas

Captchas are so 2007. There are enough good captcha-breaking bots in the wild now that they’re pushing 10-15% success rates at decoding images, and can generate a new attempt every six seconds. Mail systems at Yahoo!, GMail and Hotmail have all been cracked in the past year. And Google’s Blogger service is under siege from spambots creating hundreds of thousands of splogs without human interaction, and they’re doing it through automated captcha cracking.

A new visual authentication system called IMAGINATION, from Penn State’s ALIPR (Automatic Linguistic Indexing of Pictures) program, takes a very different approach. Working with random images rather than characters means the pool of possibilities is not finite (image recognition is far more difficult than character recognition). And the two-part process refines the human requirement further: Find a center, then describe.


But while traditional captchas have had problems with accessibility, ALIPR is going to be completely off-limits to the blind. Oh, and it takes up a whole screen, rather than a few hundred pixels. That sounds like a deal-breaker right there. Or at least a deal-breaker until we get so fed up with being cracked that interaction designers are willing to give up an entire page to make it stop.

Once you solve the captcha, the site invites you to throw your best bot at it. I’m thinking maybe five years before the bots crack this one.

Music: David Byrne :: (The Gift Of Sound) Where The Sun Never Goes Down

Spam J-Curve

Weblog comment spam rates continue to surge. This chart is from one installation of anti-blog-spam tool Defensio, showing an insane uptick through the last part of March, 2008:

(Thanks ViperBond). Akismet’s charts show more than 5 billion blog spams identified in the past two years. I’ve personally noticed a dramatic increase in hand-written blog spam recently. Knowing that tools like Defensio and Akismet are going to get spammers banned from blogs net-wide within minutes, the method is now one of social engineering – getting bloggers to consciously allow spammy comments to go live by making them highly relevant to the post they’re attached to, and plausibly written. All that distinguishes this latest form are the author URLs — which no longer point to cialis and poker sites, but to tile shops, beauty parlors, commercial art galleries, pool-cleaning supply houses, etc. Human blog spammers have been around almost as long as bots (to defeat captchas, etc.) but this latest form amazes me because it’s written so carefully. I really have to puzzle over some of these recent ones to decide whether to push them through or not.

Because Akismet is less likely to have identified these as spammy, the moderation burden falls back onto blog authors. It’s no longer possible to identify spam at a glance – we now have to study each message carefully to ascertain sincerity.

When Good Mail Goes Bad

Great way to wrap up a holiday: Agreed to take on a new Birdhouse client – a mid-size company that’s had a horrible email experience with their previous “top tier” provider. They had a dozen or so addresses; could we take them on? No problem. The old host had been storing a couple of weeks’ worth of their mail, but there was no way to get it through to the mail exchanger for delivery. The old host agreed to relay it all to Birdhouse for processing.

That’s when things turned ugly.

Turns out the previous host didn’t have the basic common sense to discard mail to unknown addresses on the domain (it hasn’t been feasible to accept mail for unknown names for, like, years). But they were not only accepting it all; they relayed it ALL to Birdhouse.

300,000 messages’ worth, 95% of which was theoretically discardable.

Unfortunately, discarding crap mail isn’t trivial when parsing a queue that large. Needless to say, things came to a grinding halt. Complicating matters was the fact that Birdhouse actually utilizes two mail queues: One for MailScanner, which pre-processes spam, and another for Exim, which is the actual mail transfer agent. The MailScanner queue was so large we couldn’t even get things out into the Exim queue. Exim documentation assumes a single queue, and MailScanner doesn’t offer the same range of queue management options that Exim does.

Which meant I got to script a solution, examining each message in the pre-queue to determine whether it was destined for a valid or invalid address, and dropping it if invalid.
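For the curious, the general shape of a purge like that might look something like the sketch below. This is not the actual Birdhouse script. It assumes one file per queued message and takes a caller-supplied `read_recipients` function (hypothetical) to pull the envelope recipients out of each file, because the real spool layout is more involved than this. The names `purge_queue` and `has_valid_recipient` are likewise made up for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable


def has_valid_recipient(recipients: Iterable[str], valid: set[str]) -> bool:
    """True if at least one recipient is a known mailbox.

    `valid` is expected to hold lowercased addresses.
    """
    return any(r.strip().lower() in valid for r in recipients)


def purge_queue(
    queue_dir: Path,
    valid: set[str],
    read_recipients: Callable[[Path], list[str]],  # hypothetical parser, supplied by caller
    dry_run: bool = True,
) -> list[Path]:
    """Find queued messages with no valid recipient; delete them unless dry_run."""
    doomed = []
    for msg in sorted(queue_dir.iterdir()):
        if not msg.is_file():
            continue
        if not has_valid_recipient(read_recipients(msg), valid):
            doomed.append(msg)
            if not dry_run:
                msg.unlink()
    return doomed
```

Defaulting to `dry_run=True` is deliberate: with a queue this size, it is worth eyeballing the list of doomed messages before anything is actually unlinked.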

The script is running now, but will take a while. All spectacularly unpleasant. Once again, wanting to skewer a spammer or two and painfully aware of how much of my time is consumed by fighting bad guys.

Progress updates on the Birdhouse System Status page.

Music: Andy Bey :: I Let a Song Go Out of My Heart

Spam Poster Art

Thumbtack Press hosts a gazillion great pieces of indie art (whatever that means), available as posters. A friend tracked down this excellent collection of art posters made from common spam taglines. Also loved “Realize your dreams with our help for a short time.” How promising! Also: “This secret weapon will give more power to you little soldier.” Little soldier thanks you. See also: Spam Plants and Spamland.

Music: Leaders Of The New School :: Mt. Airy Groove


The creativity of spammers never ceases to amaze me. Received this overnight, smack in the middle of a dozen spammy comments that made it through Akismet (but not through the moderation layer):

Hello , my name is Richard and I know you get a lot of spammy comments , I can help you with this problem . I know a lot of spammers and I will ask them not to post on your site. It will reduce the volume of spam by 30-50%. In return Id like to ask you to put a link to my site on the index page of your site. The link will be small and your visitors will hardly notice it, its just done for higher rankings in search engines.

I feel so vulnerable, so helpless. I don’t know who to turn to for help. OK Richard, you’ve got a deal!

Music: Steely Dan :: Black Cow

Phishing Quiz

How good are you at identifying phishing scams? Interesting quiz showing screenshots of 10 real sites and their phished counterparts side by side. I consider myself pretty well versed at picking out the tell-tale signs, but only got 8/10. What’s really scary is the fact that the quiz called me a “guru” for getting that score – which means that 20% of phishing sites are good enough to fool pretty much everyone (although the screenshots from the two I missed didn’t show the URLs, which is probably the most critical clue, though even those can be made to look convincing, or wholly spoofed in various ways).

How’d you score, and what threw you off?

Music: The Meters :: Chug Chug Chug-A-Lug-(Push N’ Shove)-Part II-(w Meters)


The spam I (secretly) most appreciate is the sort that uses randomly generated text cut-ups to bust spam filters, some of them fully worthy of the cut-up experiments Burroughs and Gysin were doing in the late 50s / early 60s.

In a gravitation without warning the face of rubbing grew sullen Black angry mouths, the clouds swallowed up the horsehair The air was religion with suppressed excitement

The Brothers McLeod are doing wonderful things with cut-up spam, having developed a series of animated characters to read aloud and act out the impossible, often mythical scenarios.

nodded. The door was closed and sealed again. Quietly forward. Hands extended, fingers lightly bowed. Iron John was Thats why there is no record of them

My own spam filters seem to have wised up to this form of spam in the past year, but every now and then one will seep through the multiple gatekeepers that mostly protect me from scarybig Spamland, discreetly dropping special treats at my door in the middle of the night, causing me to feel the tiniest bit hopeful.

Accidental art committed for all the wrong reasons can still be beautiful, right?

Music: Will Oldham :: Ode #1b

Volume, Volume, Volume

At IT Conversations, an interesting discussion (podcast) with Mikko Hypponen, director of anti-virus research for F-Secure. Hypponen threw out two sets of numbers that seem to collide, but don’t.

1) Spammers consider a response rate of 0.001% to be a “good” email spam campaign.

2) 40% of Americans (and 60% of Brazilians) report having made a purchase as a result of a spam sales pitch at least once.

How to square the difference? Volume, volume, volume.
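To put rough numbers on that, here is a back-of-the-envelope sketch. The 0.001% rate is the figure quoted above; the campaign volumes are hypothetical, picked only to show how the arithmetic scales.

```python
from fractions import Fraction

# 0.001% expressed exactly: 1 response per 100,000 messages (rate quoted above).
RESPONSE_RATE = Fraction(1, 100_000)


def expected_responses(messages_sent: int) -> Fraction:
    """Expected number of responses for a campaign of the given size."""
    return messages_sent * RESPONSE_RATE


# Hypothetical campaign sizes, for illustration only.
for volume in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{volume:>14,} messages -> about {int(expected_responses(volume)):,} responses")
```

A million messages gets you about ten takers; ten billion gets you about a hundred thousand. A tiny rate multiplied by an enormous volume is how both statistics can be true at once.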

I confess to having bought something from a spam once (and only once): A targeted pitch for a T-shirt bearing a big retro “Shacker” logo. It appeared that the spammer in that case had blasted their message to No matter that “shacker” in the marketer’s context referred to college students who sleep in a different dorm room every night — I had to have it.

Music: Derek Bailey :: Gone With the Wind

Botnets on the Rampage

“There has been a 67 percent increase in overall spam volume and a 500 percent increase in image spam since Aug. 2006.”

Illuminating (but seriously depressing) series of articles at eWEEK on botnets: arrays of 0wnz0r3d Windows computers assembled under the control of sophisticated “bot herders,” silently pumping every orifice of the interweb full of spam in all its forms. The virus that makes a machine part of a botnet does not cause harm to its host – like all successful viruses, it wants to assure its own survival. Amazingly, the latest generation of botnet software even installs antivirus software (a pirated copy of Kaspersky Anti-Virus, to be specific) to eradicate competing malware, so it can have the full resources of the infected host to itself.

For a while, it looked like botnet activity was shrinking, but lately it’s seen a huge uptick. vnunet reports that a million-bot botnet is quietly being assembled around the world, and that we’ll soon see an even more massive onslaught of phishing and spam attacks.

The sophistication of these systems is amazing — the botnets even come with their own self-contained DNS system. “This allows a bot herder to dynamically change IP addresses without changing a DNS record or the hosting—and constant moving around—of phishing Web sites on bot computers.”

So can’t botnet hunters just focus on nailing the central command and control machines? Nope – that’s the “beauty” of using a peer-to-peer model:

Control is still maintained by a central server, but in case the control server is shut down, the spammer can update the rest of the peers with the location of a new control server, as long as he/she controls at least one peer.

One of the many factors that makes fighting back so hard is that infected bots expect incoming commands to be digitally signed. Commands from the bot herders to members of the botnet are securely encrypted, and virtually impossible to decipher or reverse-engineer.

The sophistication of modern spammers is impressive on so many levels. Image spam (e.g. Viagra ads that appear as graphics rather than text) has been especially vexing lately, as it seems to elude all filters. Almost all anti-spam mechanisms, even collaborative ones like Akismet, rely to some extent on the ability to deduce unique “signatures” from a message, so every single image sent by machines on a botnet is given slightly different dimensions and characteristics, making it nearly impossible to nail down. I’ve even noticed random graphical noise splattered in the background of image spam lately – which prevents any two images from producing identical signatures.
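A toy illustration of why a little noise is enough to beat exact-match signatures. This is only a sketch: it uses a plain SHA-256 hash of the raw bytes as the "signature" (an assumption, since real filters use their own fingerprinting schemes), and a stand-in byte string rather than a real image.

```python
import hashlib
import random


def signature(image_bytes: bytes) -> str:
    """Exact-match fingerprint: SHA-256 over the raw bytes (assumed scheme)."""
    return hashlib.sha256(image_bytes).hexdigest()


def add_noise(image_bytes: bytes) -> bytes:
    """Flip one bit at a random position, standing in for a speck of visual noise."""
    noisy = bytearray(image_bytes)
    position = random.randrange(len(noisy))
    noisy[position] ^= 1  # XOR with 1 always changes the byte
    return bytes(noisy)


original = bytes(range(256)) * 4   # stand-in for an image file
known_spam = {signature(original)}  # the filter has seen this one before

variant = add_noise(original)
print(signature(variant) in known_spam)  # False: one flipped bit, brand-new signature
```

A human sees the same ad; the filter sees a message it has never met. Catching the variants takes some kind of fuzzy or perceptual matching, which is a much harder problem than comparing hashes.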

I think I was wrong when I said recently that my IP firewalling script was becoming less effective because spammers had learned to spoof IPs. I believe now that the problem is that the botnets are so widely distributed that the same IPs don’t come up with enough repetition to be useful. Rather than spam spewing from a volcano somewhere in the Ukraine for a few days, it’s now more like a steady mist that suffuses the atmosphere – an endless acid rain emanating from everywhere at once.

What amazes me is that articles like this never seem to point out the obvious: The botnets are comprised entirely of Windows machines. There are currently approximately 5.7 million infected Windows computers out there, ready and able to join a botnet at any time. If I were the sysadmin of a Windows network, this would be significant information to me. It’s not that OS X or Linux are theoretically incapable of this kind of takeover, but the plain reality is that it doesn’t happen. And yet, articles like this never make a recommendation that admins consider a platform shift. Why?

Sadly, experts are starting to feel hopeless about their prospects of staying in front of the game.

“We’ve known about [the threat from] botnets for a few years, but we’re only now figuring out how they really work, and I’m afraid we might be two to three years behind in terms of response mechanisms,” said Marcus Sachs, a deputy director in the Computer Science Laboratory of SRI International, in Arlington, Va.

Amazon is having serious issues with spam, as is Of course one would expect large services to be constantly hammered with spam, but if the largest and best-funded commercial entities on the web can’t keep spam off their public doorsteps, you know things are getting serious out there.

It’s becoming increasingly popular for admins to block entire nations, either at the Apache or at the firewall level. I’ve been tempted to do the same myself, but haven’t. Yet.
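Mechanically, a block like that boils down to checking each client address against a list of network ranges. A minimal sketch, assuming a hand-built list: the ranges below are reserved documentation networks standing in for a country's real allocations, which would have to come from a registry or GeoIP feed, and `is_blocked` is a made-up name.

```python
import ipaddress

# Hypothetical blocklist. These are RFC 5737 documentation ranges, used here
# as placeholders for whatever allocations a real country block would list.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(addr: str) -> bool:
    """True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_NETWORKS)


print(is_blocked("192.0.2.77"))   # True: inside the first range
print(is_blocked("203.0.113.5"))  # False: not in any listed range
```

The bluntness is the point, and also the problem: every legitimate reader inside those ranges gets turned away along with the bots.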

All of this applies to the interactive aspect of the web as much as it does to email. I deal with it on wikis, discussion boards, blogs, and Apache logs (referrer spam). In recent months, I’ve seen them stuffing personal contact forms, and even the public jobs database at the j-school (which is absurd, since no job ever gets published without human review, but that doesn’t stop them from trying). Amidst all the Web 2.0 talk of participatory journalism, the wisdom of crowds, the read/write web, and two-way communication, it’s those very features that are being exploited by spammers and the massive botnets.

I worry that the openness that made the internet possible will ultimately become the sword upon which it impales itself. I see a future where everything is so locked down that all of the fun participatory stuff becomes impossibly difficult. I worry that someday email will only be feasible with whitelisting, that registration with identity verification will be required for all participatory web features, and that the concept of anonymity will ultimately become untenable.

Compare the atmosphere of the internet to the ecology of the earth. It took us millions of years to get to industrial civilization, then just a few decades to pollute our environment to the brink of sustainability. I worry that the internet is following a similar course – 30 years to become mainstream and five years to become so polluted it’s unusable.

Thanks Mal



John Battelle’s SearchBlog, which is hosted by Birdhouse, has been undergoing a constant (and brutal) deluge of weblog comment spam over the past few days. It’s always been bad for him, but I’ve never seen anything like this. Akismet is still the bomb, but even the mighty Akismet couldn’t stay out in front of this wave. Since Akismet only knows about spam that’s been submitted to it by the hive mind, the first blogs to receive a new wave of spam are unprotected by it.

The script I wrote a while ago to query blog databases for spammy behavior and shunt IP addresses into the firewall works wonderfully when IP addresses are legitimate, but it seems that most spammers know how to fake their IPs these days, rendering it ineffective.
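The core idea of a script like that can be sketched in a few lines. This is a simplified illustration, not the script itself: it assumes the source addresses of flagged comments have already been pulled from the blog database into a list, and the threshold of five is arbitrary.

```python
from collections import Counter


def ips_to_block(spam_ips: list[str], threshold: int = 5) -> set[str]:
    """Return addresses that submitted at least `threshold` flagged comments."""
    counts = Counter(spam_ips)
    return {ip for ip, n in counts.items() if n >= threshold}


# One noisy source stands out immediately...
concentrated = ["203.0.113.9"] * 8 + ["198.51.100.2"]
print(ips_to_block(concentrated))  # {'203.0.113.9'}

# ...but nine comments from nine different addresses trip nothing.
spread = [f"192.0.2.{i}" for i in range(1, 10)]
print(ips_to_block(spread))  # set()
```

The second case is the weak spot: the approach only pays off when the same addresses keep showing up in the logs.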

Ever wondered what a comment spam blitzkrieg looks like from a server load perspective? Take a look at the load average graph from today (snapshot every 6 minutes):

[Graph: load average during the comment spam spike]

Those spikes, some representing fairly long blocks of time, represent thousands of bogus comments being submitted simultaneously. For reference, load averages shouldn’t spike above 1.0 too often, or things get uncomfortable. This is why spammers, especially weblog comment spammers, make me insane.

Decided Battelle needed a second line of defense. We were reluctant about using a captcha for the usual accessibility reasons, so I went looking for a good Turing test system and found TinyTuring by Kevin Shay. As human detectors go, it doesn’t get much simpler than this – requires commenters to enter just a single randomly selected letter. A hidden salt prevents algorithmic detection. Required modification of three MT templates. So far, 100% effective. Yes, armies of underpaid Malaysian human spambots can still jam crap into the system manually, but those comments will still have Akismet to deal with.
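For anyone wondering how a one-letter test with a hidden salt can work, here is a rough sketch of the general technique. To be clear, this is not TinyTuring's code (which is a Movable Type plugin); the function names, the HMAC construction, and the `SECRET_KEY` are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical per-site secret; never sent to the browser.
SECRET_KEY = secrets.token_bytes(32)


def _token(letter: str, salt: str) -> str:
    """Keyed hash binding the expected letter to this form's salt."""
    msg = f"{salt}:{letter.lower()}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def new_challenge() -> tuple[str, str, str]:
    """Return (letter to show the commenter, salt, token).

    The salt and token would travel in hidden form fields.
    """
    letter = secrets.choice(string.ascii_lowercase)
    salt = secrets.token_hex(8)
    return letter, salt, _token(letter, salt)


def verify(answer: str, salt: str, token: str) -> bool:
    """Check the typed letter against the token, ignoring case."""
    answer = answer.strip()
    if len(answer) != 1:
        return False
    return hmac.compare_digest(_token(answer, salt), token)
```

Because the token depends on a secret the bot never sees, it cannot work out the right letter from the hidden fields alone, and a fresh salt on every form keeps a token from one challenge from being matched against another. A real deployment would also want to expire used salts so a solved form can't simply be replayed.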

The cat and mouse game continues.

Music: Billy Martin :: Strangulation