Thursday, November 14, 2013

It's FAIR!

"In my view, Google Books provides significant public benefits. It advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders. It has become an invaluable research tool that permits students, teachers, librarians, and others to more efficiently identify and locate books. It has given scholars the ability, for the first time, to conduct full-text searches of tens of millions of books. It preserves books, in particular out-of-print and old books that have been forgotten in the bowels of libraries, and it gives them new life. It facilitates access to books for print-disabled and remote or underserved populations. It generates new audiences and creates new sources of income for authors and publishers. Indeed, all society benefits." p. 26
With that statement, Judge Denny Chin has ruled that Google's digitization of books from libraries is a fair use. And a very long saga ends.

Google was first brought to court in 2005 by the Authors Guild in a copyright infringement suit for its mass digitization of library holdings. Since then the matter has gone back to the court a number of times. Most significantly, Google, authors, and publishers developed two complex proposed settlements that were, however, so fraught with problems that the Department of Justice weighed in. Finally, the publishers bowed out and the original Authors Guild suit was revived. At that point, the question became: Is Google's digitization of books for the purposes of indexing (and showing snippets as search results) fair use?

Of course, much happened between 2005 and 2013. One important thing that happened was the development of HathiTrust, the digital repository where libraries can store the digital copies that they received from Google of their own books. The same Authors Guild sued HathiTrust for copyright infringement, but Judge Baer in that case decided for fair use.

I cannot over-emphasize either the role of libraries in this case or the support that both judges expressed for libraries and for their promotion of "progress and the useful arts." Chin refers frequently to the amicus brief presented by the American Library Association, as well as the conclusions in the HathiTrust case. Both judges clearly admire the mission of libraries, and it seems clear to me that the educational use of the materials by libraries was seen to offset the for-profit use by Google. In fact, Judge Chin reverses the roles of Google and the libraries when he says:
"Google provides the libraries with the technological means to make digital copies of books that they already own. The purpose of the library copies is to advance the libraries' lawful uses of the digitized books consistent with the copyright law." p. 26
In those terms, Google has simply helped libraries do what they do, better. Google's digitization of the library books is thus a public service.
"Google Books helps to preserve books and give them new life. Older books, many of which are out-of-print books that are falling apart buried in library stacks, are being scanned and saved." p. 12
Note that Google and the libraries (in HathiTrust) are exceedingly careful to stay within the letter of the law. Google's snippet display algorithm is rococo in design, making it all but impossible to reconstruct a book from the snippets it displays; it would probably take less time to re-scan the book at home on your page-at-a-time desktop scanner.

The full impact of this ruling is impossible (for me) to predict, but there are many among us who are breathing a great sigh of relief today. This opens the door for us to rethink digital scholarship based on materials produced before information was in digital form. 

I do have a wishlist, however, and at the top of that is for us to turn our attention to making the digitized texts even more useful by turning that uncorrected OCR into a more faithful reproduction of the original book. While large-scale linguistic studies may be valid in spite of a small percentage of errors, the use of the digitized materials for reading, in the case of those works in the public domain, and for listening, in the case of works made available to VIPs (visually impaired persons), is greatly hampered by the number and kinds of errors that result. In a future post I will give the results of a short study that I have done in that area.
