Some librarians were involved in the settlement talks. The only one I have found so far who has come out about this is Georgia Harper. The librarians were working under a non-disclosure agreement (NDA), and therefore will not be able to reveal any details of the discussions. I have heard statements from others who I believe were privy to the negotiations, and they all seem to feel that the outcome was better for libraries due to the involvement of members of our "class." (Note that Google and AAP had high-end lawyers arguing their side, and we had hard-working librarians. I don't know how many of "our" representatives were also lawyers, but you can just imagine how greatly out-gunned they were.) Unfortunately that doesn't change my mind about the bait and switch move.
Google Books as Library
Some have begun to refer to Google Books as a library. We have to do some serious thinking about what the Google Book database really is. To begin with, it's not a research collection, at least not at this point. It's really a somewhat odd, almost random bunch of book "stuff." As you know, neither Google nor the libraries are selecting particular books for digitization. This is a "mass digitization" project that starts at one end of a library and plows through blindly to the other end. Some libraries have limited Google to public domain works, so in terms of any area of study there is an artificial cut-off of knowledge. In addition, some libraries, mainly the University of California, have been working with Google primarily to digitize books in their two storage facilities; that is, they have been digitizing the low-use books that were stored remotely.
So the main reason why Google Books is not a library is that it isn't what we would call a "collection." The books have not been chosen to support a particular discipline or research area. Yet it will become a de facto collection because people will begin using it for research. Thus "all human knowledge" becomes something more like the blind men and the elephant: research in online resources and research that uses print materials will get very different views of human knowledge. (This is not a new phenomenon. I wrote about this in terms of some early digital projects I was involved in.) One of the big gaps in Google Books will be current materials, those that are still in print. Google will need to convince the publishers that it can increase their revenue stream for current books in order to get them to participate.
Subscribing to Google Books: Just Say No?
Beyond the (undoubtedly hard-won by library representatives) single terminal access in each public library in the US, libraries will be asked to subscribe to the Google Book service in order to give their users access to the text of the books (not just the search capability). This is one of the more painful aspects of the agreement because it seems to ignore the public costs that went into the purchase, organization, and storage of those works by libraries. (I'm not including privately funded libraries here, but many of the participants are publicly funded.) The parallels with the OCLC mess are ironic: libraries paying for access to their own materials. So, couldn't the libraries just refuse to subscribe? Not really. Publicly funded libraries have a mission to provide access to the world's intellectual output in a way that best serves their users. When something new comes along -- films on DVD, music on CD, the Internet -- libraries must do what they can to make sure that their users are not informationally underprivileged. Google now has the largest body of digitized full text, and there will be a kind of "information arms race" as institutions work to make sure that their users can compete using these new resources.
The (Somewhat Hidden) Carrot
I can't imagine that anyone thought that libraries and Google were digitizing books primarily so that people could read what are essentially photographs of book pages on a computer screen. Google initially stated that they were only interested in searching the full text of books. While interesting in itself, keyword searching of rather poor OCR text is not a killer app. What we gain by having a large number of digitized books is a large corpus on which we can do computational research. We can experiment with ideas like: can we follow the flow of knowledge through these texts? Can we create topic maps of fields of study? Can we identify the seminal works in some area? The ability to do this research is included in the agreement (section 7.2(d), The Research Corpus). There will be two copies of this corpus allowed under the agreement, although I don't see any detail as to what the "corpus" will consist of. Will it just be a huge file of digitized books and OCR? Will it be a set of services?
I have suspected for a while that Google was already doing research on the digital files that it holds. It only makes sense. For academics in areas like statistics, computer science, and linguistics, this corpus opens up a whole range of possibilities for research; and research means grants, and grants mean jobs (or tenure, as the case may be). This will be a strong motivation for institutions to want to participate in the Google Book product. Research will NOT be limited to participants; others can request access. What I haven't yet found is anything relating to pricing for the use of the research collection, or whether being a participating library grants less expensive access for your institution. If the latter is the case, then one motivation for libraries to agree to allow Google to scan their books (at some continuing cost to the library) will be that it favors the institution's researchers in this new and exciting area. Full participant libraries (the ones that get to keep the digital copies of their works) can treat their own corpus as research fodder. The other costs of being a full participant are such that I'll still be surprised if any libraries go that route, but if they do I think that this "hidden carrot" will be a big part of it.
There's lots of good blogging going on out there on this topic. It needs a cumulative page to help people find the posts. Please tell me you have time to work on that, so I don't have to take it on! (Or that it exists already and I've missed it.) (The PureInformation Blog has a good list.)
Note: the Internet Archive/OCA may take this on. I'll post if/when they do.