Akismet for MT

Akismet, the collaborative comment spam filtering database, has drastically improved the game for me over the past few months, but until recently it worked only with WordPress. Just days after I switched from Movable Type to WordPress, I was contacted to help with a secret beta test of a version of Akismet for MT. Since I could no longer run that test on this blog, I deployed it on John Battelle’s Searchblog and Mary Hodder’s Napsterization, two of Birdhouse’s hardest-hit installations. After a rocky start, during which we identified some bugs, the plugin started kicking some serious butt.

Today Akismet/MT went public, ironically at the same time that some independent coders developed their own versions. So far, the only things that seem to hang it up are scoring conflicts with other installed systems. For example, if you have MT set to score +1 for a comment containing fewer than three links, but Akismet flags a comment as spam and scores it -1, the two scores cancel each other out. But those are minor bumps.
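
To make the cancellation concrete, here is a minimal Python sketch. It assumes, as described above, that each filter contributes a score and the scores are simply added together; the function names, the +1/-1 weights and the link-counting rule are illustrative assumptions, not Movable Type’s or Akismet’s actual code.

```python
import re


def link_count_score(comment: str) -> int:
    """Hypothetical rule: +1 (looks legitimate) if the comment has fewer than three links."""
    links = re.findall(r"https?://", comment)
    return 1 if len(links) < 3 else 0


def akismet_score(flagged_as_spam: bool) -> int:
    """Hypothetical mapping of an Akismet verdict onto the same scale: -1 for spam."""
    return -1 if flagged_as_spam else 0


def combined_score(comment: str, flagged_as_spam: bool) -> int:
    # Scores are summed, so opposite-signed verdicts cancel each other out.
    return link_count_score(comment) + akismet_score(flagged_as_spam)


comment = "Great post! Visit http://example.com for cheap pills."
print(combined_score(comment, flagged_as_spam=True))  # 0: neither spam nor ham
```

One possible workaround, under the same assumptions, is to give the Akismet verdict a larger weight than the link-count rule, so a spam flag still drags the total below zero.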

Unfortunately, Akismet isn’t quite the golden egg in terms of reducing server load, though it does help. I’ve commented on that topic separately.

Spammers, listen up: there are a whole lot more of us than there are of you, and it’s really hard to imagine you figuring out how to game this system. You don’t stand a chance.

Google Mini Minus Link Love

First-hand report from an organization using the Google Mini appliance. Mostly run-of-the-mill observations, but it surfaces a limitation of Goog-in-the-enterprise that hadn’t occurred to me: Google’s secret sauce is PageRank, and PageRank depends on link love. But if what you’re indexing is a few thousand Word/Excel/PDF documents that don’t link to each other, there is no link love to be had, and you’re back to AltaVista days and plain old keyword frequency.

If the interlinking metadata between documents is non-existent, and PageRank is zero on every one of your documents, you’re back to keyword frequency matching.

That’s not really a criticism of the Google appliances themselves, as I’m not sure what could be done about it, but it seems to me a bit like selling an invention known for one special feature… without that feature. Big Macs without the secret sauce.
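
To see why, here is a minimal Python sketch of textbook PageRank computed by power iteration, with a damping factor of 0.85 and the rank of documents that have no outgoing links spread evenly over all documents. It illustrates the general algorithm only; it is not the Google Mini’s actual ranking code. When no document links to any other, every document converges to the same score, so PageRank can’t tell them apart and only keyword matching is left.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each document to the documents it links to (every target
    must also be a key). Documents with no outgoing links ("dangling"
    documents) share their rank equally with all documents.
    """
    docs = list(links)
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        # Rank held by dangling documents is redistributed to everyone.
        dangling = sum(rank[d] for d in docs if not links[d])
        new_rank = {d: (1 - damping) / n + damping * dangling / n for d in docs}
        # Each document passes its rank along its outgoing links.
        for src in docs:
            targets = links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# A pile of office documents that never link to each other:
intranet = {"budget.xls": [], "memo.doc": [], "report.pdf": []}
print(pagerank(intranet))  # all three scores are equal (about 1/3)

# A tiny linked web: "c" is linked from both other pages and ranks highest.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))
```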

Music: Captain Beefheart and His Magic Band :: Grown So Ugly

Ask Philosophers

Ever wonder what real, working philosophers think about subjects like medical immortality or whether alcoholics should be allowed to breed? Ask Philosophers has assembled a couple dozen professional philosophers to provide commentary on questions from the general public.

There is a paradox surrounding philosophy that AskPhilosophers seeks to address. On the one hand, everyone confronts philosophical issues throughout his or her life. But on the other, very few have the opportunity to learn about philosophy, a subject that is usually taught only at the college level. (Why? There is no good reason for this and plenty of bad ones.) AskPhilosophers aims to bridge this gap by putting the skills and knowledge of trained philosophers at the service of the general public.

Sample questions: “Is thought possible without language? (re: Helen Keller)” … “What, if anything, distinguishes natural from artistic beauty?” The answers aren’t always 100% satisfying (philosophy never is), but they do a great job of bringing clearer focus to the questions themselves.

On “Can Non-Being and Being occupy the same space at the same time?”:

How many hands do you have? Two? Or do you have three? Your left hand, your right hand, and the non-existent third hand that’s attached to your head? Obviously, that last “hand” shouldn’t count. To say that you don’t have a third hand isn’t to say that you have a hand that possesses the particularly stunting property of non-existence.

Especially amazing is how successful the site has been in getting real philosophers to engage the public so actively and enthusiastically. A wonderful experiment.

Music: Devendra Banhart :: Michigan State

Blogosphere Suffers Spam Explosion

c|net on the increasingly difficult problem of fighting spam on weblogs:

Boing Boing would allow its readers to leave comments and engage in a discussion on the wildly popular blog, if it weren’t for spam.

The piece focuses more on problems bloggers themselves face:

“It is a major hassle,” Frauenfelder said. “It is just getting worse and worse. My fantasies of violent revenge against spammers become more lurid every week.”

than on problems caused for their web hosts, and is a superficial overview in many respects, but it’s good to see some mainstream attention to the problem, which consumes more of my time than I had ever imagined it would.

At this point, I’ve tried every approach under the sun for the Birdhouse bloggers: standard blacklists (a moving target), moderation and authentication (chilling effect on conversation), mod_security blacklists (hard to keep updated, resource intensive), JavaScript (ultimately hackable), referrer tracking (shuts out commenters behind certain firewalls)…

But I’ve never had it as easy as I have since switching to WordPress and setting up the distributed Akismet system, which has blocked more than 1,000 spams from this blog in the past two weeks without a single false positive, while using minimal system resources. That sounds like a lot, but some of my users average around one spam/trackback submission attempt per minute, 24×7. You do the math.
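
For one of those heavily targeted users, the math works out roughly as follows (a back-of-the-envelope estimate from the one-per-minute average, not a measured count):

```latex
1\,\tfrac{\text{attempt}}{\text{min}} \times 60\,\tfrac{\text{min}}{\text{h}} \times 24\,\tfrac{\text{h}}{\text{day}}
  = 1{,}440\,\tfrac{\text{attempts}}{\text{day}},
\qquad
1{,}440\,\tfrac{\text{attempts}}{\text{day}} \times 14\,\text{days} = 20{,}160\ \text{attempts in two weeks}
```

That is roughly twenty times the 1,000-odd spams caught on this blog over the same period.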

Music: The Flaming Lips :: What Is The Light?

King of the Mondegreens

Now that The Archive has been collecting user votes for a few months, I’ve created a Funniest lyrics page showing aggregate vote tallies for the top 250 lyrics. It’s been interesting watching vote counts go up and down, since most votes seem to cancel each other out: 5,000 page views may result in a net positive of just 50 or so “Funny” votes (I see this phenomenon with the submissions backlog as well). Unsurprisingly, the collective consciousness considers the bawdiest mishearances the funniest, which is a shame, since it pushes brilliant mishearances like “Clown control to Mao Tse Tung” toward the bottom of the list. But that’s democracy. For ya.
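
A hypothetical sketch of how a net tally like this can be computed: each “Funny” vote counts +1, each “Not funny” vote counts -1, and lyrics are ranked by the sum. The vote labels, lyric IDs and data layout are assumptions for illustration, not the Archive’s actual code.

```python
from collections import Counter


def funniest(votes: list[tuple[str, str]], top_n: int = 250) -> list[tuple[str, int]]:
    """Rank lyrics by net score: +1 per "funny" vote, -1 per "not funny" vote.

    `votes` is a list of (lyric_id, vote) pairs; the pair layout and the
    "funny"/"not funny" labels are hypothetical.
    """
    net = Counter()
    for lyric_id, vote in votes:
        net[lyric_id] += 1 if vote == "funny" else -1
    # most_common sorts by net score, highest first (negative scores included).
    return net.most_common(top_n)


votes = ([("clown-control", "funny")] * 30 + [("clown-control", "not funny")] * 25
         + [("bawdy-one", "funny")] * 40 + [("bawdy-one", "not funny")] * 10)
print(funniest(votes, top_n=2))  # [('bawdy-one', 30), ('clown-control', 5)]
```

Note how 55 votes on the first lyric net out to just +5: most of the voting cancels itself out.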

Music: Half Man Half Biscuit :: On Passing Lilac Urine

Tapped

Caught this stickered pay phone with my cell (can you tell I’m dying for a phone with a better lens?) while racing to catch a train tonight. It’s a reference to Bushco’s wiretapping compulsion: “Your conversation is being monitored by the U.S. government courtesy of the U.S. Patriot Act of 2001, Sec. 216, which permits all phone calls to be recorded without a warrant or notification. For more information, visit crimethinc.com.” Apparently these have been available for a while, though this was my first encounter. Effective culture jam.

Music: Lagbaja :: Mummy Hi

makestreams

I’ve written a brief shell script to automate batch-processing piles of QuickTime movies for use with QuickTime Streaming Server. makestreams adds metadata, adds hint tracks, and generates .qtl reference movies for all files in the current directory. It requires the qtmedia and qtref command-line binaries, which are present only in OS X Server.
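
The metadata and hinting steps depend on the OS X Server binaries, whose options I won’t guess at here, but the reference-file step is easy to illustrate. The following Python is a hypothetical stand-in, not makestreams itself: it writes a QuickTime Media Link (.qtl) file, assuming the standard XML layout with an `<embed src=…>` element, next to every .mov in a directory, pointing at an assumed rtsp:// base URL.

```python
from pathlib import Path
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Hypothetical streaming server address; substitute your own host and path.
RTSP_BASE = "rtsp://streams.example.com/"

# Assumed QuickTime Media Link layout: XML declaration, quicktime
# processing instruction, and an <embed> element pointing at the stream.
QTL_TEMPLATE = (
    '<?xml version="1.0"?>\n'
    '<?quicktime type="application/x-quicktime-media-link"?>\n'
    '<embed src={src} autoplay="true" />\n'
)


def write_qtl_files(directory: Path, rtsp_base: str = RTSP_BASE) -> list[Path]:
    """Write a .qtl link next to each .mov file in `directory`; return the new paths."""
    written = []
    for movie in sorted(directory.glob("*.mov")):
        url = rtsp_base + quote(movie.name)  # percent-encode spaces etc.
        qtl_path = movie.with_suffix(".qtl")
        # quoteattr escapes the URL and wraps it in quotes for the attribute.
        qtl_path.write_text(QTL_TEMPLATE.format(src=quoteattr(url)))
        written.append(qtl_path)
    return written


if __name__ == "__main__":
    # Like makestreams, operate on the current directory.
    for path in write_qtl_files(Path.cwd()):
        print("wrote", path.name)
```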

Music: Can :: Cascade Waltz

The Economics of UGC

How do the User Generated Content sites do it? How can Flickr, YouTube and the like possibly make money through limited revenue options while simultaneously giving away absolutely massive piles of storage and bandwidth?

Economies of scale kick in big-time, and there’s still a lot of unused capacity out there, but you have to wonder how sustainable it is to allow users paying very little or nothing at all to dump the entire contents of their Flash memory cards onto Flickr every day. Not to mention the fact that uploading 13 nearly identical pictures of your cat onto Flickr rather than one pollutes the quality of the datastore for all users (I’ve never understood why Flickr doesn’t strongly limit the number of images that can be uploaded per day, forcing people to edit their collections).

There was some discussion in TWiT episode 47 about Yahoo’s purchase of Flickr and how Yahoo is now finding it an economic albatross. Photo printing from Flickr is an obvious revenue opportunity, but according to a TWiT insider, 10 million Flickr users generate about 80 print orders per week. News flash: people are there for the community, not for abstracted printing possibilities. But once you invite people to upload their lives into your service, you’re committed; there’s no backing out.
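
Taking the secondhand TWiT figures at face value, the print conversion rate is tiny:

```latex
\frac{80\ \text{orders/week}}{10{,}000{,}000\ \text{users}}
  = 8 \times 10^{-6}\ \text{orders per user per week}
  \quad\Longleftrightarrow\quad
  1\ \text{order per}\ 125{,}000\ \text{users per week}
```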

Despite seemingly problematic revenue opportunities, Yahoo! is continuing its UGC/Web 2.0 purchasing spree: it apparently has an offer on the table to buy Digg. UGC is a critical aspect of Web 2.0, and Yahoo can’t afford to miss the boat.

The recent proliferation of free massive storage systems has changed user expectations for all hosting systems. Alex King, on user expectations at FeedLounge:

When I hear someone say “a service like this should be free”, it feels a little like they are saying “your time and investment are worth nothing”. I know it’s not personal, but to make a really great product, you have to invest yourself personally.

Birdhouse struggles with this too. For example, we simply can’t offer a webmail system as good as Gmail’s (for any amount of money), and we sure as heck can’t offer 2GB of storage to anyone who comes by and asks. But because modern webmail systems like Yahoo’s and Google’s are so good, people just assume that all webmail will be of similar quality. Without truly massive investments and economies of scale, small and medium-sized hosts are stuck offering Web 1.0 technology in a world that already expects Web 2.0 quality and scale.

But it goes beyond webmail: now that Google and Yahoo (and soon Microsoft) are making quick inroads into the web hosting business, the picture isn’t pretty for smaller hosts. What we can (and do) offer is excellent hand-holding and custom setups that the cookie-cutter monoliths can’t match. And while the bandwidth and storage we provide may seem puny by comparison, I haven’t yet met a customer who actually felt cramped by our offerings: 500MB is a huge website… unless you’re throwing a ton of audio and video around.

I’ve been experimenting with UGC for nine years at the Archive of Misheard Lyrics, and have made money from it. Not big money, but some. But I’ve had the advantage of being able to do it on a high-impressions/low-bandwidth model: lyrics pages are tiny chunks of text in a database. And unlike free-for-alls like Flickr, I exert editorial control over the content, and don’t let just anything onto the site*. I know that UGC can be a workable revenue model, under the right conditions. But how this scales to unlimited free photo/video/audio hosting remains to be seen.

* In the past I used volunteer editors, some of whom let huge numbers of unfunny lyrics into the live pool; the current user voting system (which I guess is a bit Web 2.0 itself) will eventually correct for that.

See also Nick Cubrilovic: The Economics of Online Storage.

Music: The Minutemen :: Futurism Restated

The Omnivore’s Dilemma

J-School professor and Birdhouse user Michael Pollan has written a new book, The Omnivore’s Dilemma: A Natural History of Four Meals:

In this groundbreaking book, one of America’s most fascinating, original, and elegant writers turns his own omnivorous mind to the seemingly straightforward question of what we should have for dinner. To find out, Pollan follows each of the food chains that sustain us—industrial food, organic or alternative food, and food we forage ourselves—from the source to a final meal, and in the process develops a definitive account of the American way of eating.

The book has recently been reviewed by the SF Chronicle, The Washington Post and Salon. I’ve done a lot of work on Pollan’s site over the past few months.

Pollan will be on NPR twice this week: Tuesday on Fresh Air with Terry Gross, and Friday on Science Friday. Check your local listings for times.