Wednesday, January 28, 2009

Google: What's in it for libraries?

(This is a version of the talk I gave on the Google panel at ALA in Denver. PDF for printing.)

The title of this panel is: Google/AAP Settlement: What's in it for libraries? What I can say with certainty is that we don't really know. And I don't really know. What I have to work with is the settlement document, all 140+ pages and 15 appendices, which is the same information that is available to you. But in spite of its size, that is just the tip of the iceberg. It doesn't reveal the discussions that took place nor the reasons behind the decisions that were made. Some people know much more because they were involved in the negotiations. However, everyone who was involved is sworn to secrecy and can't speak about it. This greatly limits what we can know about the potential effect on libraries. Although I do not have answers, I do have many questions.

I'm going to address the question of libraries that are not Google partners. All of us who are not partners, who are not involved in the scanning, are potential customers. For the Google partners, the settlement includes in an appendix examples of the contracts that they will be asked to sign. For everyone else, it is important to note that you have no contract with Google. There is some information in the Google/AAP settlement document about aspects of the product that Google can provide, but it is far from the whole picture. Any input that you will have into what this settlement means to you will take place as you negotiate your contracts for the Google book products, should you choose to subscribe. What I will cover in the next few minutes are some things that you should be aware of as you consider becoming a Google book product customer.

The first thing you should ask, and the bottom line, is: Does this product serve my users? At the moment, the book search product is an idiosyncratic offering of digitized texts, but nothing that would resemble a curated collection. Can the library's patrons benefit from the service? Will it meet their research needs?

Next you should consider the quality of the product. The obvious areas are the quality of the scans and of the OCR, but also the use of metadata, the search capabilities, and the ability to integrate the product into library practices.

There are numerous legal requirements on libraries, especially publicly funded libraries, that we always must be aware of in our relationships with vendors. A key one of these is privacy. We know that Google's primary business model has been that of delivering customers to advertisers, and obviously we cannot participate in such a model. Those of us in public institutions are bound by state laws to ensure the confidentiality of the use of our materials, and we generally extend that to outside services contracted by the library. The only mention of confidentiality in the agreement is the confidentiality of rights holders.

Our services also must be ADA compliant. Beyond that I would say that for public libraries it is implicit in your mandate that you provide equal access to all. This generally excludes any services that require payments by end-users (and note that there is a statement in the settlement that users of the free subscription that will be available to public libraries may need to make royalty payments for any printing).

Publicly funded institutions may be bound by the First Amendment, and all libraries are champions of intellectual freedom. We know that Google does censor other products, and that publishers withdraw controversial books. If nothing else, we need those activities to not take place secretly.

Which brings me to another issue: transparency. The entire settlement process has been an exercise in the lack of transparency. For those of us who were not involved, it came as a surprise when the agreement was released and we found out that it was the result of two years of secret negotiations. This is normal in the for-profit world, but for those of us in publicly-funded institutions, transparency of our operations is a legal and moral obligation. In addition, the secrecy around the workings of the product makes it very hard for us to help users who aren't finding what they need. We don't have to know the secret page rank algorithm, but until the settlement document came out Google would not even reveal how many scanned books it had in its database (7 million). Should the database be offered to subscribers, we should insist on knowing what it contains and what features it will have, so that we can assess its value for our users. I also want to say that we do not want to be in a position of getting information about the product that we cannot share with our users, so becoming party to secrets is not an option.

The last question that I'll bring up here is that of sustainability. Libraries have been in existence for thousands of years, and modern libraries in this country have a history that is measured in centuries. Google has been in existence for about 15 years. Do any of us expect that Google will be around in 200 years? What are the plans for this content should Google cease to exist, or decide it doesn't want to continue to support this product? Some libraries will have copies of scanned books, but is there a plan to place in escrow all of the scans? It's not just a question of the scans, however, because they will be in dark archives. What happens to the service, the user interface?

I also want to say a few words about the so-called "free subscription" that will be offered to publicly funded libraries. We need to look this gift horse in the mouth, if nothing else to make sure that it isn't a Trojan horse. We have very little information today about the nature of this particular product, other than that it will be reduced in functionality from the paid subscription, and it is stated in the agreement as being "one terminal per library building." Remote access to this product is not allowed; users must be physically at the library. Clearly, for any medium or large libraries, one access will not be sufficient. It is also clear that "free" has its costs, and in this case one cost will be the management of a very scarce resource. While this free service is often touted as an act of great generosity, in my more cynical moments I see it as a clever act of "product placement." Where better to put a demo version of your product than in the institution that is most frequented by potential customers: book readers?

I have no idea what the future will bring, but I can imagine a wide range of possibilities. At one end, I see the possibility that the Google book product turns out not to be profitable, that it doesn't gain enough subscribers and it doesn't sell enough out-of-print books to make it worthwhile. Google drops the product, as it has dropped other products that just didn't pan out. The other end is a scenario where the product is highly profitable, either through sales or advertising revenue, and Google continues to make deals with libraries to scan books until it is parallel in content with the entire system of libraries in this country. Parallel, but highly capitalized, ubiquitous, online. At that point we would have a privatized version of the library system, with different goals and values, and no public oversight.

You and I know that Google is not a library, but we also know that our users don't understand that difference. And I'm pretty sure that some city managers with budget problems will not understand that difference, but they do know what it costs them to maintain a public library. I hope that Google understands that its own ambitions can have far-reaching effects on public institutions, but I don't know if there is any way to mitigate the danger that Google can pose to those institutions.

To my library colleagues, I have some advice. We have to be willing to throw off the past and learn to innovate. This is a new information world, and we must be full participants in it. To be visible we must embrace the Web as our data platform, and to do that we must reject any attempts to prevent us from participating openly on the Net.

Monday, January 26, 2009

A start at a questions list

From my notes, here are some questions regarding the Google/AAP settlement that came up during the panel. Please add your own as comments, and we will try to find some definitive place to put these where everyone can contribute.

... in no particular order...

  1. What happens to the current contracts that some libraries have with Google?
  2. Will the subscription services include any features that would gather user information, such as anything that takes an email address?
  3. Will the subscription service include the commercial features, such as buying copies of out of print books?
  4. What will the capabilities be for public domain books? Download?
  5. Will print-on-demand be one of the possibilities?
  6. Will there be advertising on any of the products?
  7. How will pricing be determined? FTE? Amount of use?
  8. How often will pricing change? Will it be possible to lock in a price for a number of years?
  9. What plan is there for termination of the service by Google? Will all scans be escrowed? What happens to the service itself?
  10. Who can access the registry? What, if any, part of it will be public access?
  11. Copyright law has fairly broad allowances for educational and classroom use. Will this be replicated in the contracts with educational institutions?
  12. Can my library buy just those books it needs to round out its collection?
  13. What will the services be for public domain books - like printing, mashing together content from multiple books, etc?
  14. Why is there no mention of school libraries in the settlement? Were they purposely excluded?
  15. Is it true that OCLC network organizations are not allowed to negotiate as consortia under the settlement? Why is that?
  16. What is the status of works that have been scanned by a library or some other institution, but are contributed to Google Book Search? Do they have the same restrictions as books scanned by Google?
  17. Other than allowing or disallowing advertising, do rights holders have any say over the presentation of their works (e.g. use of covers, ranking, metadata)?
  18. Can a library combine its LDC database with any other digital copies for the purposes of non-consumptive research?
  19. Where does the book metadata come from? What data does it include?
  20. Does the definition of periodical (1.102, p. 13) include yearbooks (e.g. almanacs) and reference works (e.g. Physician's Desk Reference)?
  21. Can Google turn the institutional subscription service over to a third-party vendor at will?
  22. If Google excludes a book, will that information be publicly available in the registry? (p. 36)
  23. How is "government" defined on p. 42 in the pricing bands?
  24. If Google determines that a book is in copyright, but perhaps is not, who can contest this?
  25. Can public libraries subscribe to the institutional subscription? If so, what is the cost basis? (The document says "FTE" but public libraries don't have FTE). (p. 42)
  26. If someone claims a book in the registry that they in fact have no rights over, how is this detected? Is there a penalty?
  27. Who is responsible for the accuracy of the registry?
  28. Will libraries or institutions be able to create collections within the GBS? That is, to select and mark particular items as part of a bibliography or reading list.

Sunday, January 25, 2009

The ALA Google Panel

On Saturday at ALA, the ALA Washington Office Committee on Copyright held a panel about the Google AAP settlement. Panelists were: Dan Clancy, of Google, Paul Courant, of University of Michigan, Laura Quilter, librarian and lawyer, and me. Both Clancy and Courant were involved in the negotiations with AAP and therefore have inside knowledge of the discussions and ins and outs, but both are also bound by a non-disclosure agreement, so there is only so much they can say.

What I mainly learned at this panel is that we (librarians and public) need much more information about this settlement, and that information is not in the settlement document. I also learned that we are unlikely to get that information until after the settlement is approved (assuming it will be approved, but who knows).

Later I'll write up my talk (of which I only gave a very brief rendition at the panel) and post it. For now, some snippets (note that I may add others during the day as I remember them):
  1. The role of the Registry is to represent the rights holders.
  2. The rights holders appear to be quite nervous about libraries.
  3. Although anything can potentially be negotiated, Google will do the negotiating with the Registry for any requests for product features.
  4. At the moment, the following are not allowed: library purchases of books, library lending of books, use of books for ILL.
  5. Also not allowed is remote access for public library subscriptions to the service -- the problem with this, in the eyes of the rights holders, is that public library services (because they serve the whole public) would compete with the public as customers for purchase of the books.
  6. Google itself is not thrilled about becoming a library vendor, because it recognizes that it's not a big bucks market and it doesn't fit into the Google business model well. (At one point Dan mentioned that getting checks for $5000 from public libraries isn't very appealing.)

Saturday, January 24, 2009

Obama administration embraces Creative Commons

The copyright notice on the Obama Whitehouse site states:

Pursuant to federal law, government-produced materials appearing on this site are not copyright protected. The United States Government may receive and hold copyrights transferred to it by assignment, bequest, or otherwise.

Except where otherwise noted, third-party content on this site is licensed under a Creative Commons Attribution 3.0 License. Visitors to this website agree to grant a non-exclusive, irrevocable, royalty-free license to the rest of the world for their submissions to Whitehouse.gov under the Creative Commons Attribution 3.0 License.
Wow, that feels good!

Wednesday, January 14, 2009

OCLC pushes back policy to fall, 2009

OCLC has just announced that it is pushing back the date on which the new record use and transfer policy will take effect. The actual new date isn't known, but the announcement says:
In order to allow sufficient time for feedback and discussion, implementation of the Policy will be delayed until the third quarter of the 2009 calendar year.
OCLC will form a "review board" to solicit info from members and others, and to advise the OCLC board of trustees about the policy. Jennifer Younger will chair this committee.

This delay is welcome, but I am dubious that a review board would be able to convince the trustees that OCLC must welcome open access to bibliographic data. Minor tweaks to the policy are not going to make much of a difference, and I doubt that any "advice" is going to force the board to do an about-face.

Those of us who promote open access must use this time wisely. First, we need to get some solid legal advice. It's clear that OCLC can propose any kind of conditions in a contract and hope to get signers; it's less clear that OCLC can impose a contract on members 1) without their explicit agreement, 2) that covers data created before the contract becomes valid, and 3) that binds third parties to the contract. Next, anyone who has bibliographic data should release it "into the wild" as quickly as possible. Once the data is circulating, it will not be possible to withdraw it. One solution is to create database dumps and to upload these to the Internet Archive. They will be there for downloading by others, and some of the data may end up in the Open Library. Assuming that bibliographic records cannot be covered by copyright, all of this data ends up in the public domain to fuel innovation and creativity.

Note: if you are preparing a data dump, my advice is:
  • use a standard format (MARC21, MARCXML, UNIMARC, etc.).
Be sure to include in each record fields that give:
  • your local record ID (MARC 001)
  • something that identifies the source of the record (your system or institution) (MARC 003)
  • the version date (either the last date the record was updated, or the date of the data dump) (MARC 005)
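For anyone who wants to check a dump before uploading it, here is a minimal sketch in Python, assuming the dump is in MARCXML. It only reports records that lack a non-empty 001, 003, or 005 control field. The function name and the sample records are made up for illustration; a real dump would be read from a file rather than from an inline string.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"

# Control fields recommended above: record ID, source, version date.
REQUIRED_TAGS = ("001", "003", "005")


def missing_control_fields(marcxml_text):
    """Return (record_index, [missing tags]) for each incomplete record.

    Hypothetical helper for illustration; accepts either a <collection>
    of records or a single <record> as the root element.
    """
    root = ET.fromstring(marcxml_text)
    record_tag = f"{{{MARC_NS}}}record"
    field_tag = f"{{{MARC_NS}}}controlfield"
    records = [root] if root.tag == record_tag else root.findall(record_tag)

    problems = []
    for index, record in enumerate(records):
        # Collect the tags of control fields that actually carry a value.
        present = {
            field.get("tag")
            for field in record.findall(field_tag)
            if (field.text or "").strip()
        }
        missing = [tag for tag in REQUIRED_TAGS if tag not in present]
        if missing:
            problems.append((index, missing))
    return problems


# Made-up sample data: the second record lacks 003 and 005.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">rec0001</controlfield>
    <controlfield tag="003">EXAMPLE-LIB</controlfield>
    <controlfield tag="005">20090114120000.0</controlfield>
  </record>
  <record>
    <controlfield tag="001">rec0002</controlfield>
  </record>
</collection>"""

print(missing_control_fields(SAMPLE))
```

An empty list means every record carries all three fields. The check says nothing about whether the values themselves are sensible, only that they are present.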

Saturday, January 10, 2009

Google Books and Social Responsibility

The digitization of books by Google is a massive project that will result in the privatization of a public good: the contents of libraries. While the libraries will still be there, Google will have a de facto monopoly on the online version of their contents.

While regulation of industry has fallen out of favor in these 'free market' times, we do have a history of making particular demands on companies whose products and services have an important social impact, such as broadcast television or telephone services. This is especially the case where one company has a monopoly on the product area. There are some functions that are just too important to be left to the interests of one company or to market forces, and so we regulate them to protect the interests of civil society.

If I were in a position to require social responsibility of Google and its digitization program, these would be my terms:

Sustainability

While Google is a hot company today, it may not last forever. Actually, it probably won't be around for the 200-odd years that have been covered by the libraries it is working with. To protect against the loss of the digitized books should Google either disband or decide not to continue the Books product line, Google should be required to place the digital copies in escrow, where they will be preserved. My preference would be for the escrow body to be a public institution (or a group of such institutions) that has proven longevity and stable public support.

Intellectual Freedom

The First Amendment prevents the government from censoring its citizens, and we rely heavily on this key right as the basis for many of our freedoms. Private companies are not bound by the First Amendment; as a matter of fact, in law they are protected by it as honorary persons. This means two things: first, that private companies can (and do) censor their products, and second, that they can be held liable for any social harm that is perceived if they do not censor. Thus publishers can be held liable for errors of fact in the books they produce, or a company that promises a 'child friendly' web site can be held liable if pornography slips through their filter.

I want Google to have the same right to deliver books to users that publicly funded libraries do. How this could be worked out in terms of law and liability I must leave to others to determine, but what I am thinking of is something like the common carrier model that has been used for communications companies. Basically, Google should be required to carry all digital Books without discrimination and without liability.

Privacy

Public libraries are bound by state laws to protect the privacy of their users. This protection generally takes the form of enforced confidentiality over any records of library use. This is, in a sense, the other side of the intellectual freedom coin: people are only free to access the speech of others if they are guaranteed that they will not be watched or tracked, and that their information access will not be revealed to others. There are no laws that bind private companies to this same standard, but companies are held to their own promises of privacy to their users. Google should develop a particularly strict privacy policy for the Books product, and should be willing to allow auditing of its practices so that users can trust the company's practices. Libraries themselves will insist on such a guarantee if they are to include the Book product in the services they provide to their own users.

Transparency

One of the things that has greatly frustrated librarians in their attempts to use Google products is the lack of information about decisions that are made by the company. Already there have been cases of books being withdrawn from full view without notice, making it hard to rely on the product.

Once we have licensed this product, we have a 'deal' with Google that is different from the open-endedness of the free Google products. Part of this deal needs to be that we can be informed about the product we are licensing. If the Book product will be licensed by educational institutions, it has to be possible for those institutions to know the status of works and to understand what decisions can be made. Transparency also implies a process for appeal or at least discussion with the vendor about decisions, because those decisions will affect the value the product has in our environments.

... and probably more

This is just a short list, and this is a blog post, not a final thesis. I present these ideas primarily to begin a discussion about the impact of the Google Books product on the public and in particular on public institutions like libraries and universities. The settlement agreement goes into quite a bit of detail about the business case of the Books product, but says nothing about customer needs or support. As potential customers, libraries have a social responsibility to their users to negotiate the license terms with freedom in mind.