December 12, 2007 (updated March 23, 2011) by shacker

reCAPTCHA

Fascinating project. The idea is that the optical character recognition (OCR) software used by the world’s largest book digitization projects inevitably makes lots of mistakes and runs into plenty of words a computer can’t recognize, especially in older books or in books printed with messier inks or less precise fonts. Rather than having staffers laboriously read every word of every book just to correct the clinkers, reCAPTCHA puts the hive mind to work every time a member of the public solves a captcha.
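The post doesn't explain how a stranger's answer gets trusted. One commonly described version of reCAPTCHA's original design shows two words: a "control" word the system already knows and an "unknown" word OCR couldn't read. A user who gets the control word right is treated as human, and their reading of the unknown word counts as one vote. The Python sketch below assumes that scheme. `WordPool`, `submit`, `resolved`, and the `VOTES_NEEDED` threshold are all hypothetical names for illustration, not reCAPTCHA's actual code or API.

```python
from collections import Counter

# Hypothetical threshold: how many matching human readings before we trust a word.
VOTES_NEEDED = 3


class WordPool:
    """Illustrative tally of human readings for words OCR couldn't decipher."""

    def __init__(self):
        # Maps an unknown word's image id to a Counter of normalized transcriptions.
        self.votes = {}

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Pass/fail the user on the control word; record a vote only if they pass."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False  # Failed the known word: treat as a bot and ignore the other answer.
        tally = self.votes.setdefault(unknown_id, Counter())
        tally[unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed reading once enough humans concur, else None."""
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        word, count = tally.most_common(1)[0]
        return word if count >= VOTES_NEEDED else None


pool = WordPool()
for _ in range(3):
    pool.submit("morning", "morning", "img42", "Upon")
print(pool.resolved("img42"))  # upon
```

Note that a wrong control answer both fails the captcha and discards the reading of the unknown word, so bots can't easily poison the transcription.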

About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.
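The "more than 150,000 hours" figure follows directly from the two numbers in the quote. A quick check in Python, using only those quoted inputs:

```python
captchas_per_day = 60_000_000  # CAPTCHAs solved by humans each day (from the quote)
seconds_each = 10              # rough human time per CAPTCHA (from the quote)

# Total seconds per day, converted to hours.
hours_per_day = captchas_per_day * seconds_each / 3600
print(f"{hours_per_day:,.0f}")  # 166,667
```

At roughly 166,667 hours a day, "more than 150,000" is a conservative rounding.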

I’m going to replace a few captchas I’ve got in place at the J-School with reCAPTCHAs. I’d been meaning to add audio accessibility to them anyway, and reCAPTCHA has an audio option built in. Being able to contribute to book digitization is delicious gravy.

Update: Adding this video thanks to Jeremy:

Music: Gary Wright :: Our Love Is Alive

4 Replies to “reCAPTCHA”

jer says:

December 13, 2007 at 6:39 pm

oh wow. reCAPTCHA is really cool… I’m going to have to look into implementing that on my site.

(Welcome to my blogroll, btw.)

Jeb says:

December 14, 2007 at 2:55 pm

Feels more productive than SETI@home, which we ran for a while.

Milan Andric says:

December 16, 2007 at 8:52 pm

http://nedbatchelder.com/text/stopbots.html

An interesting article about letting bots catch themselves so we don’t need to solve puzzles to input form data.

shacker says:

December 16, 2007 at 9:10 pm

Excellent article, Milan. I’ve seen all of those techniques used in various plugins, but never summarized so neatly, or all in one place. It seems like an uber-plugin (or a library that could be wrapped as a plugin for multiple CMSs) would be wildly successful.

LOVED the joke there:

Jim and Joe are out hiking in the forest, when in the distance, they see a huge bear. The bear notices them, and begins angrily running toward them. Jim calmly checks the knots of his shoes and stretches his legs.

Joe asks incredulously, “What are you doing? Do you think you can outrun that bear!?”

Jim replies, “I don’t have to outrun the bear, I just have to outrun you.”

