Fascinating project. The idea is that optical character recognition processes in place at the world’s largest book digitization projects naturally make lots of mistakes, and encounter plenty of computer-unrecognizable words – especially with older books or books printed with messier inks or using less-precise fonts. Rather than having staffers laboriously read every word of every book just to correct the clinkers, reCAPTCHA puts the hive mind to work, every time a member of the public solves a captcha.
About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.
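The arithmetic behind that aggregate figure is easy to check. Here is a quick sketch using only the two numbers quoted above (60 million solves a day, roughly ten seconds each); the function and constant names are my own, purely for illustration:

```python
# Figures taken from the quoted passage; both are rough estimates.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 10


def daily_hours(captchas_per_day: int, seconds_each: float) -> float:
    """Total human time spent per day, converted from seconds to hours."""
    return captchas_per_day * seconds_each / 3600


hours = daily_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

So the "more than 150,000 hours" claim holds up, with a bit of room to spare.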
I’m going to replace a few captchas I’ve got in place at the J-School with reCAPTCHAs. I’d been meaning to add audio accessibility to them anyway, and reCAPTCHA has an audio option built in. Being able to contribute to book digitization is delicious gravy.
Update: Adding this video thanks to Jeremy:
oh wow. reCAPTCHA is really cool… I’m going to have to look into implementing that on my site.
(Welcome to my blogroll, btw.)
Feels more productive than SETI@home, which we ran for a while.
http://nedbatchelder.com/text/stopbots.html
An interesting article about letting bots catch themselves so we don’t need to solve puzzles to input form data.
Excellent article, Milan. I’ve seen all of those techniques used in various plugins, but never summarized so neatly or all in one place. Seems like an uber-plugin (or a library that could be made into a plugin for multiple CMSs) would be wildly successful.
LOVED the joke there: