Google representatives claim that their data comes from libraries and from other sources, but it is easy to show that Google is not including the library's bibliographic record in GBS. It might just be seen as a short-sighted decision on their part not to keep all of the data from the MARC records supplied by the libraries. After all, which of these do you think makes the most sense to the casual reader:
12 pages
12 p. 27 cm.
However, there is some evidence that Google is missing parts of the library bibliographic record. Here are some examples of subjects from GBS and the records from the very libraries that supplied the works:
Indians of North America
Indians of North America -- Languages.
Indians of North America -- California
Indian baskets -- North America
This is the same pattern that appeared in the records released by the University of Michigan for their public domain scanned books -- only the $a subfield of the 6XX field was included. (I wrote about this: http://kcoyle.blogspot.com/2008/05/amputation.html). Many other fields are also excluded from those Michigan records, and one has to wonder if the same was true of the records received or used by Google.
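To make the effect of keeping only $a concrete, here is a minimal sketch in Python. It is an illustration, not a description of how Google or Michigan actually process records: the list-of-pairs model of a subject field, the function names, and the subfield codes assigned to each heading ($x for "Languages", $z for the place names) are all assumptions based on common cataloging practice rather than on the records themselves.

```python
# Hypothetical, simplified model of a MARC 6XX subject field:
# a list of (subfield code, value) pairs. Real records also carry
# a tag and indicators, which are left out here.

def full_heading(subfields):
    """Join every subfield into the display form seen in library catalogs."""
    return " -- ".join(value for _, value in subfields)

def a_only_heading(subfields):
    """Keep only subfield $a, dropping subdivisions such as $x and $z."""
    return " -- ".join(value for code, value in subfields if code == "a")

# Subfield codes below are assumed, not taken from the source records.
records = [
    [("a", "Indians of North America"), ("x", "Languages")],
    [("a", "Indians of North America"), ("z", "California")],
    [("a", "Indian baskets"), ("z", "North America")],
]

for subfields in records:
    print(f"{full_heading(subfields)!r} -> {a_only_heading(subfields)!r}")
```

Under these assumptions, the first two headings become indistinguishable once the subdivisions are dropped, which is exactly the loss of precision the subject examples above show.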
I know that it is possible to retrieve the full library records for the books because the Open Library is using this technique to retrieve bibliographic data for the public domain books scanned by Google. Google is obviously capable of doing this, yet chooses not to.
This leaves us with a bit of a mystery, although I think I know the answer. The mystery is: why would Google only use limited metadata from the participating libraries? And why won't they answer the question that I asked at the Conference: "Do you have a contract with OCLC? And does it restrict what data you can use?" Because if the answer is "yes and yes" then we only have ourselves (as in "libraries") to blame. And Nunberg and his colleagues should be furious at us.