Sunday, May 10, 2009

Walt Crawford should read the document

In his March 2009 Cites & Insights, Walt Crawford does a roundup of comments on the Google/AAP settlement, and gets very agitated when reviewing some of my posts. I'm used to that. But agitation tends to cancel out reason, and Walt gets some things wrong that he might have understood better if he had kept a clear head.

In response to my criticism that Google is digitizing without regard to collection building, Walt says:
"I don’t know of any big academic library or public library that’s a single disciplinary collection—or, realistically, a set of well-curated collections. "
I'd like to hear from academic librarians on this one. My understanding was that an academic library is INDEED a set of well-curated collections.

"I don’t remember public universities admitting to substantial costs in cooperating with Google."
What's the cost? Dan Greenstein estimated $1-2 per book. Cheap, but still considerable for a library scanning millions of books. The cost is primarily in staff time, shelving and reshelving books. Under this agreement, there is also the cost of meeting the security requirements that are imposed. (That's in Appendix D.) These requirements, which are possibly quite reasonable, will have a greater cost than what most libraries do today for digital materials, and will be one of the primary reasons why some libraries do not contract to receive copies of the digitized items. (Note that some of the potential library partners are working hard to collaborate on the Hathi Trust, which does appear to meet the standards of the agreement; others, however, have decided that they will not attempt to store digital copies.)

In a post I argued that had libraries gone ahead and digitized their own collections (for the purposes of indexing and searching), this probably would have been considered fair use.

"Well…this is not a judicial finding. I find it unfortunate that Google didn’t fight the good fight, and I think it will make things much harder for another commercial entity to attempt similar digitization and use—but I don’t see that library use of “their own materials” has changed in any way."
Not of their hard copy materials, but legal minds think that this changes the landscape for digitization and the use of digitized materials, even closing some options that might have been available before.
"The proposed settlement agreement would give Google a monopoly on the largest digital library of books in the world. It and BRR, which will also be a monopoly, will have considerable freedom to set prices and terms and conditions for Book Search’s commercial services.... If asked, the authors of orphan books in major research libraries might well prefer for their books to be available under Creative Commons licenses or put in the public domain so that fellow researchers could have greater access to them. The BRR will have an institutional bias against encouraging this or considering what terms of access most authors of books in the corpus would want." Pam Samuelson
And to my statement:
"The digitization of books by Google is a massive project that will result in the privatization of a public good: the contents of libraries. While the libraries will still be there, Google will have a de facto monopoly on the online version of their contents."
Walt first prefaces it with:
"I take issue with the very first sentence, as I’ve taken issue consistently with the same claim by others with even higher profiles than Coyle (who are even less likely to ever admit they could be mistaken)."
Well, it would have been nice if he had said who they are. But thanks for letting me know that you consider me a "lower profile" person, Walt. He goes on to say:
"Nonsense. Sheer, utter nonsense. The libraries and contents will still be there. OCA will still be there. I’m sorry, but this one just drives me nuts: It’s demonization of the worst kind and an abuse of the language."
Well, I'm not sure how this abuses language, but there is general agreement that Google gets a monopoly... at least on out-of-print books, which is the vast majority of books in libraries. (Not on public domain books, which is what the OCA digitizes, but anyone can digitize public domain books.) So although the libraries and their contents will still be there, and can be used in hard copy as they are today, no one but Google can digitize the in-copyright works without incurring liability. So "monopoly on online version of their contents" is a factual statement, if you understand that public domain is public domain. (Note, this settlement agreement is extremely complex, with some real zingers hidden in its 134 pages. It's not possible to cover it all in a blog post, so anyone who is interested really needs to read the document itself, painful as that process is.)

In terms of preservation and longevity concerns, Walt asks:
"Won’t the fully-participating libraries have digital copies? I can’t think of institutions with better longevity."
To begin with, only fully participating libraries will have digital copies, and we don't yet know how many libraries will choose that option. Other libraries, even those that are only allowing Google to digitize public domain books, do not get to keep copies of the digital files. (Not only that, public domain libraries that have been cooperating with Google have to delete all of their copies of the files that they hold today, as per this agreement. See Appendix B-3.) The only party with copies of all of the files will be Google.

There are statements in the settlement about what happens if Google "fails to meet the Required Library Services Requirement" or simply decides not to continue. I refer you to page 84 of the settlement, and hope that someone can make sense out of it. The way I read it, libraries can then engage a third-party provider, who will receive the files from Google.

The key thing here is that even in the event of the failure of Google, libraries are not allowed to make uses of their own scans, such as those that are permitted to Google by this settlement. The restriction to "computational uses" and some other minor uses stands, even in that eventuality.

When I say:
"Google should be required to carry all digital Books without discrimination and without liability."
Walt replies:
"You mean “all digital books that Google’s scanned”? I suspect Google wouldn’t argue with this."
That is exactly what I mean, and Google does indeed argue with it. As a matter of fact, the settlement only obligates Google to provide access to at least 85% of the books it scans. That "access" refers to the subscription service that will be available to libraries and other institutions. The settlement says:
"Google may, at its discretion, exclude particular Books from one or more Display Uses for editorial or non-editorial reasons." p.36
That's followed by an affirmation of the "value of the principle of freedom of expression," which I must say rings a bit hollow in this context. Google has to notify the Registry if it has excluded a book, and to provide a digital copy of that book to the Registry. The Registry can then seek out a third party to provide services for excluded books. Here, however, is James Grimmelmann's concern on that front:
"The second is that no one besides the Registry might ever find out that Google has chosen to de-list a book. If the Registry doesn’t or can’t engage a replacement for Google, the book would genuinely vanish from this new Library of Alexandria. Perhaps that should happen for some books, but decisions like that shouldn’t be made in secret. When Google choses to exclude a book for editorial reasons, it should be [R13] required to inform the copyright owner and the general public, not just the Registry. "
What might Google exclude? Perhaps very little, but at the ALA panel in Denver in January 2009, Dan Clancy of Google gave an off-the-cuff remark that, as I recall, had the word "pornography" in it. Given the recent embarrassment of Amazon when it had to face the fact that many of its best sellers are rather salacious in nature, I can imagine Google also developing concern about the visibility of the texts that make us uncomfortable.

There are a lot of legitimate reasons for concern about this proposed settlement. And I don't think that anything that I have said is "nonsense."


Jerome McDonough said...

I've worked in academic libraries and am now teaching librarians-to-be. And I can tell you that one of the things we spend quite a bit of time on in one of the first introductory courses in library school is what constitutes a collection and how you should go about determining what should go into one (or go out). Academic libraries are, in fact, well-curated collections, at least if the librarians are doing their jobs right, and I haven't heard anyone accusing the subject librarians at Harvard, UC Berkeley, etc. of being a bunch of hacks. I suppose you could always argue they should be curated better, but take that up with the agencies determining libraries' funding levels.

Eric said...

With your low-profile comments on Walt's lengthy comments on the comments on the Google Book Search settlement agreement, it's becoming comparatively easier to read the actual settlement agreement. I've been toiling in obscurity to do so, and I think that one source of the contention between you and Walt is that there is imprecision in discussions about the fact of a monopoly and the origin of that monopoly. While Google will have a certain monopoly on books that it has digitized, that monopoly will come from several factors:
1. Copyright law. All copyrights are grants of monopoly to the rights holders. The rights holders are free to license any of those rights to Google and only to Google.
2. Unique investment. Only Google will have made the large monetary investment to both copy books in libraries and also to bear the burden of infringement liability.
3. Licenses provided for by the settlement agreement. While it is true that the settlement agreement gives certain rights uniquely to Google, it does not prevent Google's competitors from obtaining the same rights. I've written about this elsewhere.

Karen Coyle said...

Thanks, Jerry, for your input. I needed that confirmation from someone who knows.

Eric, it is commonly thought that Google will have a de facto monopoly. (See Samuelson link in my post - section "Google's New Monopoly." Also, see Grimmelmann's talk called "Ends, Means, and the Future of Books"). Note also that the Justice Department is now looking into the settlement regarding possible anti-trust issues. Samuelson also considers the Rights Registry a monopoly.

I listened to this all being discussed among lawyers at a meeting at the Samuelson center, and I admit that I cannot explain it entirely, being also NAL. But I have often heard folks who I consider to be knowledgeable say that the only way for another company to get into a position to compete with Google is to 1) start scanning in-copyright books 2) get sued by AG/AAP in a class action suit 3) settle with them in the same way that Google has. In other words, in the analysis of these legal folks, the BRR does NOT have the right to give other companies the ability to copy works. Only the court can do that. Since this is an interpretation made by others, I can't confirm or deny it. All I can say is: keep reading what's coming out about it.

Karen Coyle said...

Eric, one thing I forgot to mention -- all of this is complicated by the fact that this is NOT a contract, but a class-action lawsuit. As the latter, it creates a situation in law that cannot be recreated with contracts -- because by its definition, it covers the entire class of rights holders whether or not they are known, alive, or in agreement. Sure, anyone else could theoretically sign contracts with all of the rights holders he could find -- contracts that allow him to digitize the books. But as we know, that number of known rights holders is much smaller than the number of unknown rights holders (e.g. the orphan works problem), and the cost to find them all and make the deal would be prohibitive.

Google doesn't need to find anybody. Google, and only Google, can digitize at will, and rights holders have to make themselves known to the registry in order to get payment for use of their materials. This is the opposite of how copyright law usually works -- opt-out rather than opt-in.

Eric said...

The question about reproducing Google's orphan works murkopoly is an interesting one. I accept that duplicating the class-action suit would be required, but how bad is that, exactly? It would certainly be much less expensive to settle because the copier could contrive to avoid actual damages and settle for an injunction; the BRR is allowed to participate as long as it does not offer a better deal than what Google gets.

Karen Coyle said...

Eric, I don't know how hard it would be to replicate Google's position. It would all depend on a representative group of rights holders willing to launch a class-action suit, rather than simply suing the offender for specific copyright violation. I also don't know if it's illegal to make a deal with someone to sue you -- but it sounds iffy to me. So it seems to depend on what the AG/AAP wants.

Any more than this, and we need a lawyer. ;-)

bowerbird said...

the internet archive is the only other
player in the arena at this time, and
they are opposed to the settlement...

that should tell people something...


Unknown said...

Karen, I've been following your comments on this for a while, and I really appreciate their clarity. I agree that the de facto monopoly Google appears to be getting as a result of the class action suit is a big concern.

While it is theoretically possible for another organisation to begin its own large-scale scanning project, I understand that some libraries feel that 'once is enough', and are saying that they'll refuse to let material be scanned more than once. While this isn't a problem for books that are widely held, it is a significant issue for unique material. All this suggests to me that the Google Books Settlement needs to be scrutinised very closely, to prevent having the digital version of our academic and cultural heritage locked up in a single for-profit company.

Eric said...

In reading Randy Picker's paper on the book search settlement, I see that in section IV.B. he suggests two ways that the "initial monopoly" on orphan works might be mitigated. The MFN clause could be made symmetrical, so that other entrants would have the same deal Google gets, or the BRR together with the court could be authorized to separately grant licenses to use the orphan works.

Would either of these measures address your concerns, Karen?

Karen Coyle said...

Eric, I'll try to get a chance to read that document. I took a look at the abstract, and there's a caveat about non-profits that I'd like to understand better. Allowing competition would make me more comfortable, but I think how to make that competition possible (and how broad it can be) is tricky. I would prefer that libraries and other non-profits be able to become players, but a for-profit/non-profit mix may not work well. I have this vague idea that orphan works could be treated like public domain ones, and would therefore be more suited to non-profit exploitation for public purposes. Works with rights holders fit into the commercial domain. I have no idea of the legality of any of what I've said!

waltc said...

Now that I'm finally back on the air and can devote some time to something other than moving, I've responded here.