Misc. Notes on Cached Content

Analysis at WebProNews on the legality of Google’s Print for Libraries project, in which Google intends to pick up where Project Gutenberg leaves off – reproducing not only the full text of public domain works, but also excerpts of copyrighted material.

The entire text of books considered to be public domain and out of copyright will be scanned and made available online. For copyrighted material, the books will be scanned, and snippets will be made available structured around search terms with links to where the book can be checked out or purchased.


All at great expense to Google. And all in the name of enabling and simplifying research for users, a fact which puts Google in different legal territory than other parties when it comes to reproducing copyrighted material (don’t mean to imply that Print for Libraries is an act of Google altruism, only that these facts will affect the court’s view of the legality of the library).

Not addressed in that article, but somewhat related, is Google’s cache feature, which stores complete copies of web pages even after those pages have been removed from the host. Often useful for researchers, the feature has always seemed to me to make Google the single largest violator of copyright on the Internet. Most of us are willing to turn a blind eye to Google republishing cached results. Yet if I post a photograph or essay and you re-publish it on your own site, I’m most likely going to spank you. So why don’t I care if Google does the same? Because the purpose of the republication, its research value, outweighs any purist instinct I might have about republication. Still, Google cache has always seemed legally dubious to me.

Tangentially: Thinking lately about how RSS feeds occupy gray copyright territory as well. In one sense, an RSS feed is just a file living on a server like any other, and should be subject to the same restrictions against re-publication by third parties.

But when publishing to RSS, there’s a tacit understanding that one does so not just for the sake of personal desktop aggregators, but so that the feed’s content can be included in other web sites, portals, etc. Even if nothing in the RSS feed says that the rules are different for that file than they are for all other files on a given server, one is still implicitly waiving re-publication restrictions when one chooses to publish to RSS. So by that logic, if I choose to publish full-text RSS rather than excerpts, do you have the right to reproduce my entire feed on your site as an include?

Let’s take it as a given that any content placed on a public server is going to be copied and re-used; it’s inevitable and unstoppable, and only the fool places content on the web that they don’t want to risk being republished. On the other hand, existing copyright law does protect you legally should someone republish your content without permission. RSS lives in a parallel universe; the intention of putting it on a server is open republication. So does that implied right to republish extend to its logical conclusion? That is, if I publish full-text RSS, are you more entitled to duplicate my content on your site without permission than you would be otherwise? More on this here.

And what about RSS expiration? This blog has some 1700 entries, while my RSS feed only shows 15 or so at a time. So does your implied right to republish my RSS content continue even after that content has scrolled off the end of my current feed?

Music: Dave Van Ronk :: Gaslight Rag
