Sunday, August 23, 2009

Academic publishing as a percentage of Google Books

A group representing University of California faculty has expressed its concerns about the Google/AAP settlement in a letter to the presiding judge. In that letter, the group states one of its concerns as:
"Specifically, we are concerned that the Authors Guild negotiators likely prioritized maximizing profits over maximizing public access to knowledge, while academic authors would have reversed those priorities."
The next sentence says:
"We note that the scholarly books written by academic authors constitute a much more substantial part of the Book Search corpus than the Authors Guild members’ books."
I was disappointed that they didn't include any data to support the "substantial part" statement, and I think their letter would have been stronger if they had. (I am presuming that they meant "substantial" in a quantitative rather than a qualitative sense. The latter would be hard to support.)

Edward Betts of the Open Library did an experiment in identifying publishers in the OL data. Because publisher names are recorded inconsistently in bibliographic data, he used ISBN publisher prefixes, where available, to bring together different forms of the same name. He posted his results on the OL blog, and the post links to his files. His data show counts for each (presumed) individual publisher.
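To make the idea concrete, here is a minimal sketch of grouping name variants by ISBN publisher prefix. This is not Edward's code. The sample records are made up for illustration, the ISBN check digits are not valid, and the helper name `publisher_prefix` is my own. It also assumes the ISBNs are already hyphenated; splitting an unhyphenated ISBN into its parts requires the official ISBN range tables, which this sketch does not attempt.

```python
from collections import defaultdict

# Made-up sample records: (publisher name as recorded, hyphenated ISBN).
# The prefixes 0-19 (Oxford UP) and 0-521 (Cambridge UP) are real;
# the remaining digits are placeholders, not real books.
records = [
    ("Oxford University Press", "0-19-000000-0"),
    ("Oxford Univ. Pr.", "0-19-111111-1"),
    ("OUP", "978-0-19-222222-2"),
    ("Cambridge University Press", "0-521-33333-3"),
    ("Cambridge Univ. Press", "0-521-44444-4"),
    ("Unknown", "not an isbn"),
]

def publisher_prefix(isbn):
    """Return 'group-registrant' from a hyphenated ISBN, or None."""
    parts = isbn.strip().split("-")
    if len(parts) == 5:        # ISBN-13: drop the leading 978/979 element
        parts = parts[1:]
    if len(parts) != 4:        # not a fully hyphenated ISBN
        return None
    return "-".join(parts[:2])

names_by_prefix = defaultdict(set)   # prefix -> name variants seen
counts = defaultdict(int)            # prefix -> number of records

for name, isbn in records:
    prefix = publisher_prefix(isbn)
    if prefix is None:
        continue                     # no usable prefix; skip the record
    names_by_prefix[prefix].add(name)
    counts[prefix] += 1

for prefix in sorted(counts):
    variants = sorted(names_by_prefix[prefix])
    print(prefix, counts[prefix], variants)
```

Records without a usable ISBN simply drop out here, which is one reason counts produced this way are approximate.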

I mentioned in a comment on Edward's blog post that it was interesting to me that a university press (Oxford UP) turned up in the #1 spot as the publisher with the greatest number of books in the OL. As a matter of fact, out of the top 20 publishers, five are university presses (UPs), and they make up over 1/4 of the books in that group, close to 30%. (Download a tab-delimited, ranked version of the data, but be sure to look at Edward's detailed data to understand what makes up each publisher entry.)
# of book records published by the top 20 publishers: 1,935,327
# of book records in the top 20 published by university presses: 577,323
Out of the top 100, the UPs make up a little less than 25% of the file. I'm only including those presses with "University" in their names, meaning that the figure doesn't include Academic Press, Elsevier, Scholastic, etc., which primarily publish the output of academic writers.
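For anyone who wants to check the top-20 arithmetic, a few lines of Python will do it. The two counts are the ones quoted above; the function name `share` is just my own label for the calculation.

```python
# Counts quoted in this post for the top 20 publishers in the OL data.
top20_total = 1_935_327   # book records from the top 20 publishers
top20_up = 577_323        # of those, records from university presses

def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

up_share = share(top20_up, top20_total)
print(f"University presses: {up_share:.1f}% of top-20 records")
# prints "University presses: 29.8% of top-20 records"
```

That works out to a little under 30%, consistent with "over 1/4" for the top 20.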

This study of OL publisher data was just experimental, so these figures should be taken with a grain of salt. However, this shows that there is an interesting study to be done, if it can be done, quantifying the relative roles of academic and commercial publishing. Given that Google is digitizing books in university libraries, the tendency toward academic publications should be quite strong. (Note that OL has taken its records from the Library of Congress, online book sales, and some libraries, and probably is less heavy on academic presses than Google Books will be.)

The UC faculty's concern that the interests of academic writers are not well served by the Authors Guild is compelling to me. I hope the judge takes it seriously.
