Sunday, July 29, 2012

Fair Use Déjà Vu

In its July 27 court filing,[1] Google makes the case for its fair use defense of the digitization of books in its Google Book Search (GBS) project.[2] As many of us had hoped, the case it makes appears strong. It is unfortunate that Google had to throw libraries under the bus to achieve this, but I honestly do not see an alternative that wouldn't weaken the case a bit.

Fair Use is Fair

From the beginning of its book scanning project, Google has argued that copying for the purpose of providing keyword access to full texts is fair use. Fortunately, it is able to cite case law in support of this, including cases that allowed image search engines to copy entire images.

Among the reasons that they give for their fair use defense are:

1. Keyword search is not a substitute for the text itself. In fact, copying the text is necessary to give users a means of discovering the existence of books, and therefore for the books to fulfill their purpose of being read.
"Books exist to be read. Google Books exists to help readers find those books. Like a paper index or a card catalogue, it does not substitute for reading the books themselves..." (p. 2)

2. Google has elaborate protections in place to prevent users from reconstructing the text from its products. It reveals some of these protections, such as blocking the display of one snippet on each page and blocking the display of at least one page out of ten.
"One of the snippets on each page is blacklisted (meaning that it will not be shown). In addition, at least one out of ten entire pages in each book is blacklisted." (p. 10)

3. No advertising appears on the GBS pages. This implies that Google is not earning any money from these pages that authors could claim as theirs.

4. The Authors Guild has no proof of harm resulting from the digitization of the books. Google suggests that a thorough study might show gains rather than losses in book sales. Even the Authors Guild (the plaintiff in this case) advises authors to provide some of the text of their books (usually the first chapter) for browsing in online bookstores, and many rights holders voluntarily participate in Amazon's "Look Inside" feature, which shows considerably more than the disputed snippets displayed in GBS. Google also notes that 45,000 (!) publishers have signed up to have their in-print books searchable in GBS, with varying amounts of text available to the searcher prior to purchase. This supports the argument that search and some text display are good for authors, not harmful.

5. Digital copies of books have never been "distributed to the public" (key wording in the copyright law). Only the libraries that held the physical copies could receive copies of the files resulting from the digitization.

Of course, each of these arguments is backed by citations to court cases. The Authors Guild undoubtedly has counter-cases to present.

Libraries Under the Bus

One of the key copyright-related arguments that Google makes is that its full-text search within books provides an unprecedented public service and support for research. In making these claims, Google chose to emphasize in particular its superiority to library catalogs. (Google refers multiple times to "card catalogues," which seems oddly antiquated, but perhaps that was the intent.)
"The tool is not a substitute for the books themselves -- readers still must buy a book from a store or borrow it from a library to read it. Rather, Google Books is an important advance on the card-catalogue method of finding books. The advance is simply stated: unlike card catalogues, which are limited to a very small amount of bibliographic information, Google Books permits full-text search, identifying books that could never be found using even the most thorough card catalog." (p.1) [sic uses of "catalogue" and "catalog" in the same paragraph.]
"Google Books was born of the realization that much of the store of human knowledge lies in books on library shelves where it is very difficult to find....Despite the importance of this vast store of human knowledge, there exists no centralized way to search these texts to identify which might be germane to the interests of a particular reader." (p. 4)
As a librarian, I have to say that this dismissal of the library as inadequate really hurts. Yet I believe that Google is expressing an opinion that is probably quite common among information searchers today. One could counter with many examples where the library catalog entry succeeds and GBS fails, but of course that wouldn't bolster Google's arguments here. A reasonable analysis would treat the two methods (full text and standards-based metadata) as complementary.

Google also argues that it did not give copies of the digital files resulting from its scanning to the libraries. The way this works is not only clever but shows real foresight on Google's part. Google developed a portal where libraries could request that a copy of the files be made "on demand" for the library, using encryption specific to that library. The transmission of the files from Google to a library was therefore an act of the library, not of Google.
"Moreover, the undisputed facts show that it is the libraries that make the library copies, not Google, and that Google provides only a technological system that enables libraries to create digital copies of books in their collections. Under established Second Circuit precedent, Google cannot be held directly liable for infringement because Google itself has not engaged in any volitional act constituting distribution." (p. 33)
Clearly, Google designed the system (which goes by the acronym "GRIN") with this in mind.

I don't mind this, but I wish that Google hadn't included a dig at HathiTrust as part of this argument. In my opinion, the document would not have suffered if Google had left the parenthetical phrase off this sentence:
"No library may obtain a digital copy created from another library's book -- even if both libraries own identical copies of that book (although libraries may delegate that task to a technical service provider such as HathiTrust)." (p. 15)
It's one thing to claim innocence, but another to point the finger at others.

Omissions

There are a few glaring omissions from the document, some of which would weaken Google's case.

There is no mention of the computational uses that can be made of the digital corpus, something that was a strong focus of the failed settlement between Google and the authors and publishers. I have no doubt that Google is currently engaged in research using this corpus; I don't see how they could resist doing so. They do briefly mention the "n-gram" feature, but since it appears to be based on a simple use of term frequency, it may not attract the court's attention.

In another omission, Google states that:
"Informed by the results of a search of that index, users can click on links in Google Books to locate a library from which to borrow those books ... " (p. 4)
Google fails to mention that this service is provided not by Google but by OCLC, using exactly those card catalogues that Google finds so inadequate. Credit should be given where credit is due, but there is an important battle to be won.

Bottom Line

The ability to create full text searches of printed works (and other physical materials) is so important to research and learning -- and should be such an obvious modern approach to searching these materials -- that a win for Google is a win for us all. Although some aspects of this document shot arrows into my librarian-ly heart, I hope with all of that wounded heart that they prevail in this suit.


[1] This points to the ScribD site, which unfortunately is now connected to Facebook and is therefore a huge privacy monster. The document should appear on the Public Index site shortly, with no login required.
[2] The term "product" could also be used to describe GBS.

4 comments:

Joe Montibello said...

Hi Karen,

I can't argue with the point that more access to information is better, in general. But I think there's an interesting conflict between this:

"...a win for Google is a win for us all."

and this:

"...ScribD site...connected to Facebook...huge privacy monster."

Ever since I heard about the Google book scanning endeavor, I've wished that libraries had had the guts, technical skills, and resources to tackle that project for ourselves. If libraries had digitized these books, getting over the legal hurdles would have been what my friend Andy used to call "real and permanent good in this world." If Google manages to make this material available, there will be a cost, somehow - privacy, ad selling, or something else that I'm not clever enough to think up (but Google is).

Anyway, thought-provoking post as always.
Joe M.

Karen Coyle said...

Joe, I totally agree with you about wishing that libraries had undertaken this rather than leaving it to Google. The win, however, is for all of us, because the fair use decision means that anyone, not just Google, can digitize for the purpose of providing search. This would mean that projects like the Digital Public Library of America (dp.la) could undertake search-related digitization projects without fear of ending up in the kind of 6-year lawsuit that Google has been subjected to. I also think it may encourage funders to fund library projects, since a key aspect of the legal uncertainty will have been decided.

Then we move on to orphan works and the problems those pose.

Anonymous said...

The fact of the matter is, if Google wins their suit, they'll have a monopoly on the e-publishing business. They'll be able to charge whatever they want to libraries for subscriptions--they're already charging thousands of dollars. That can't be a good thing.

Karen Coyle said...

Anon,

Winning or losing, it doesn't really matter: Google already has over 45,000 publishers signed up for its e-book business. But if the proposed settlement had gone through, Google would have had a LEGAL monopoly on digitizing out-of-print works. Fortunately, that was rejected by the court. With a fair use assessment, others (like libraries) can digitize works for the purpose of indexing. That won't have any effect on Google's power, but it opens up possibilities. For example, I can imagine scholars working on selected canons, like certain sets of early works. Unlike Google's digitization, these would be selected and corrected, and therefore would better serve academia.

We will all be overshadowed by the size and wealth of Google regardless. That makes it all the more important to understand the weaknesses of Google's approach, at least for serious study. My bet is that funding for selective projects will become available once the legal issues are settled.