Saturday, November 14, 2009

Amended Google/AAP Settlement

The amended settlement has been issued (the best way to see the changes is in the redline version). I will summarize here the changes that I see as having the greatest impact on libraries and on the public. For legal issues, I suggest James Grimmelmann's blog. For business issues, probably the NY Times and Wall Street Journal.

Foreign Works Mostly Excluded

Undoubtedly due to the many complaints from foreign rights holders, the settlement now only includes (oddly enough) US, UK, Australian and Canadian works. This would include, as I interpret it, translations of works from other countries when those translations were published in one of those four countries. This greatly changes the value of the institutional subscription for higher education, as well as the value of the 'research corpus' (essentially a database of the OCR'd texts that researchers can use for computational research).

Since we know that information seekers prefer accessing works online rather than in hard copy, I anticipate that the online service will be very popular. But it will contain almost exclusively these Anglo-American products, a narrow swath of the intellectual output of the planet. As it is, too many Americans are unaware of the world outside of those Anglo-American borders. This will just exacerbate that problem. It could change the content of education and research. As I've said before, availability is a significant determinant of what intellectual materials people use in their research.

Particular to Libraries

In general, the sections on libraries (both participation and use of the digital copies) remain unchanged. There are a few minor changes, some of which are puzzling.

Public Libraries

The statement about the free access for public libraries has been changed from:
in the case of each Public Library, no more than one terminal per Library Building

to
in the case of each Public Library, one terminal per Library Building; provided, however, that the Registry may authorize one or more additional terminals in any Library Building under such further conditions as it may establish, acting in its sole discretion and in furtherance of the interests of all Rightsholders.
So it leaves the options open for giving some public libraries additional (free?) access. Still, there is no information on whether or how public libraries could subscribe in a way that would allow them to fully serve their communities.

Microforms

The definition of "books" that could be digitized originally included microforms. The word "not" has been added:
hard copy (not including microform)
No idea why, but perhaps a look at the comments will reveal one from UMI or some other party related to microforms.

[Found it: The ProQuest letter states that dissertations should NOT be included as they are controlled through ProQuest's dissertation service. The letter mentions that some dissertations are in microform format, but that today many are available as print-on-demand or online. Although microforms were excluded, p. 327 of the redline document states:
"What Material Is Covered? 'Books' include in-copyright written works, such as novels, textbooks, dissertations, and other writings..."
So ProQuest did not get what it asked for.]


OCLC Networks

The original settlement had a strange exception that removed OCLC networks from the definition of "consortium":
"Institutional Consortium" means a group of libraries, companies, institutions or other entities located within the United States that is a member of the International Coalition of Library Consortia with the exception of Online Computer Library Center (OCLC) - affiliated networks.
That exception has been removed. I would love to know why it was there in the first place, but can only assume that one or both of these exclusions (microforms and the OCLC networks) came about because of participation by OCLC in the settlement discussions.

[Note: I discovered that Lyrasis and Nylink filed an objection about this exception, which may be why it was removed. Their analysis was that it had come from OCLC and gave OCLC the ability to manage competition by determining which organizations would be excluded from participating in the business of brokering services for libraries. They assume that OCLC hopes to be in that business itself.]

Download Formats & Course Packs

In the original settlement, the only download format mentioned was PDF. As we know, since then Google has announced that it will provide e-books from the publisher partner content that it carries on GBS. Ebook formats have been added to the settlement as possible download formats. At the same time, the product line described as:
Custom Publishing - Per-page pricing of Books, or portions thereof, for course materials, and other forms of custom publishing for the educational and professional markets
has been removed.

Other?

There are complex changes to the treatment of orphan works which I have not tried (yet) to absorb. Those will undoubtedly have some impact on libraries and the public but at the moment I have no thoughts on that.

The settlement now allows rightsholders to place a Creative Commons license on their works. I really don't see a great deal of significance in this, although it does emphasize that by participating in GBS your rights are now governed by contract law rather than copyright law.

And, last, Google admits to some of its own difficulties in bibliographic control when it states that "The inclusion of a work within the Books Database does not, in and of itself, mean that the work is a Book within the meaning of Section 1.19 (Book)." In other words: we threw a whole bunch of bib records into a database; don't assume anything from it.

Monday, November 09, 2009

Googled

Waiting for the next round of Google/AAP/AG settlement prose (which was due today, November 9, but has been moved back to Friday, November 13, when the parties will presumably present it to the judge), I have read Ken Auletta's book "Googled: the end of the world as we know it." It's mainly a business book, and primarily about media and advertising. I can sum up what it says about Google in three statements:
  1. Engineering can fix anything
  2. Information is neutral and measurable
  3. Advertising is information
OK, maybe that's a bit overly concise, but that is what it boils down to. I've often wondered how your motto can be "Don't be evil" when you are in the advertising business. It obviously works if you consider information to have meaning only through numerical measures, and consider advertising to be just another kind of information. This engineer-based mentality as the guiding principle of the largest, richest advertising company in the world falls somewhere between Ayn Rand's objectivism and Bernie Madoff's Ponzi scheme. About 50% of Google's employees are engineers, and engineers, on average, earn twice what non-engineers earn.

Google has ramped up the advertising game by orders of magnitude, destabilizing huge, long-lived media companies, and it's all based on... winners win. Google sees its role as matching up users with things they are seeking, whether it's web sites, books, or a place to buy sneakers. It doesn't matter to Google what the information is.

There is something creepy about the way that Auletta refers to SergeyandLarry as "the founders." It sounds almost... cult-like. The fact that the book treats the founders and CEO Eric Schmidt as a three-some is just way too trinitarian for my taste.

Friday, October 23, 2009

Objecting to GBS/AAP/AG Settlement

ALA Washington Office has posted an analysis of who filed comments/briefs to the court relating to the Google/AAP/AG settlement. Of the "class member objectors," i.e. authors and publishers, 82 US parties filed objections. Astonishingly, there were 295 objections filed by foreign "class members," including the publisher organizations in a number of countries. The objections range from the seemingly trivial (the poor quality of the translations of the notice that were provided) to concrete descriptions of how the settlement violates the rights of rights holders under the Berne Convention. I'll sum up some of these objections:

  1. The class -- members of the class were not given sufficient notice, nor were they able to read the actual settlement documents, which were not provided in translation.
  2. Moral rights -- Berne includes moral rights, that is, the right of the author to control the use of one's work. This is interpreted quite liberally in some countries, to include things like cover images used in sales, metadata, etc. While these may seem unimportant, the Italian publishers' organization AIE was horrified to find one of its newsletters listed with an author of "Fascist Federation of Publishers". This was a previous name of the organization, and the organization found the attribution offensive.
  3. Registration requirements -- Berne states clearly that "... exercise of these rights shall not be subject to any formality..." It was this aspect of Berne that ended the copyright registration requirement in the US. Objectors claim that the need to register with the Book Rights Registry violates this aspect of Berne, the logic being that you are the copyright holder regardless of any action you take to assert that.
  4. Definition of "out of print" -- This is probably being revised by the main parties, but the original settlement document stated that "Google will use the publishing status, product availability and/or availability codes to determine whether or not the particular database being used considers that Book to be offered for sale new through one or more then-customary channels of trade in the United States." Various objectors were able to show that Google's determination (as available in the database managed by Google today) was wrong in a majority of cases.
  5. Definition of "in print" -- This one also might be undergoing revision. The settlement defines "in print" as "be offered for sale new." Some objectors pointed out that there are books that are free, that are online for open access, etc. The argument is that these cannot be considered out of print.
  6. Representation -- None of the foreign class members consider either the AAP or the AG to represent them. Some ask that there at least be foreign class members on the board of the Rights Registry. Others simply consider the class membership to be invalid.
  7. The burden on publishers -- The burden of identification has been placed on publishers. For a publisher with an active list of titles, this could be a considerable amount of work. Google offered that if publishers would provide ONIX metadata, they would do an automated matching against the database. Apparently this has failed to provide relief, most likely because of differences between the publishers' metadata and that of Google.
  8. The effect of secrecy -- Because Google works heavily in "trade secret mode," it is very difficult for the rights holders to find and diagnose problems relating to their works. Yet the settlement does not hold Google accountable for errors in the data.
  9. Privacy -- the EU has rather strict privacy rules. This argument is a bit contorted because at the moment there is no plan to allow EU users to access the books covered by the settlement, since the settlement is only valid in the US. But at least one objector acknowledged that users would gain access by going through US proxy servers. It isn't clear to me if one can apply local law when masquerading as someone else through a proxy.
  10. Local digitization laws -- At least one country, Germany, has made provisions for library digitization of works (and in-library access) that require the library to obtain permission from the rights holder. This objection is a bit indirect, but it seems to be one of indignation that Google could be digitizing works that the national library of the country where the work was published cannot.
  11. Censorship -- Many are concerned that Google may eliminate books from its service "for editorial reasons" without having to justify itself. This is an interesting and difficult argument -- it's like saying you're against the service, and you're afraid it won't have everything. It makes sense, however, because if Google becomes the predominant means of access to books in the US and can censor without recourse, then a single company gets a great deal of control over both information and culture. There should be more objection to this from within the US.
  12. General moral and cultural indignation -- I read about a dozen of the foreign objections. In some cases, I may have been reading into the text an undertone of moral and cultural indignation. Not in the case of Germany and France, however, who were quite clear on their objection to the monetization of their cultural heritage. Here are some quotes:
"... the proposed settlement homogenizes (or "Googlizes") and demeans those special elements that distinguish the unique cultural tradition of France by turning books into a merely industrial by-product of a computer database."

"France's concern for its authors is only heightened by the proposed settlement's shroud of secrecy and hint of an uncontrolled, autocratic concentration of power in a single corporate entity, Google, that generates more revenue than many countries."

"The Federal Republic of Germany is historically called "Das Land der Dichter und Denker" (the land of poets and thinkers). ... Germany can rightfully claim the mantle of birthplace of modern printing and publishing. ... [the settlement] will flout German laws that have been established to protect German authors and publishers... creating a new worldwide copyright regime without any input from those who will be greatly impacted -- German authors, publishers and digital libraries and German citizens who seek to obtain access to digital publications through the Google service."

Wednesday, October 14, 2009

OCLC and "Competition"

The announcement of a new company, SkyRiver, providing cataloging services to libraries has sparked a number of comments about competing with OCLC and WorldCat. For a number of reasons, I don't think that the result of such a service is necessarily competitive, although I am very glad to see alternatives enter the marketplace, especially for those who do not use OCLC.

To begin with, OCLC is more than an online cataloging service. Admittedly, revenue from cataloging is OCLC's largest income source, so cataloging is not in any way just an incidental function from OCLC's point of view, but cataloging alone is not the point or purpose of OCLC to its users. I see OCLC as a kind of social network where the "beings" are libraries. The value of OCLC is directly related to the population it encompasses, and the social services it can provide based on that population. Shared cataloging copy is one service, but discovery and delivery options probably motivate OCLC members as much or even more than the cataloging effort. This was evident when RLG still existed, as some RLG member libraries who did their cataloging in RLIN also loaded their records into WorldCat in order to participate in the services that OCLC provided.

The value of the catalog copy on OCLC may be second to the value of the holdings information that OCLC maintains. Catalog copy, if that's all you want, can be found in innumerable library catalogs (including the Library of Congress), and some library systems allow you to export or retrieve a full MARC record that you can then add to your own catalog. Catalog copy can also often be found on the verso of the title page in the form of Cataloging in Publication (CIP), although not in MARC format and not as a complete record. But no one else, and no other service, has the combined holdings of some 60,000 libraries, and that's the main thing that OCLC brings to the table. It is only because of these holdings that WorldCat has value to individual searchers and to the libraries who serve them.

The view of OCLC as "the only game in town" for library cataloging ignores the fact that there are libraries who do not participate in OCLC, for a variety of reasons, but who still need to create bibliographic records. These libraries may not be able to afford OCLC's prices for cataloging services, or they may simply not wish to be bound by the standards of that society of libraries. Some libraries, in particular those in corporate settings, are not able to share their holdings publicly, and therefore are not able to participate in the social life of libraries that WorldCat represents.

There are also non-library providers of library catalog records, in particular the vendors who include catalog data with the products they sell to libraries. These vendors need a source of cataloging copy that is unrelated to particular holdings information.

If we can think further down the line, a database of bibliographic records, like that in SkyRiver or biblios.net, could become a resource for anyone who needs to work with bibliographic data. This could include anyone on a research project who wants to provide a quality bibliography with a minimum of effort. Although the bibliography will follow citation standards, the basic data is the same as that found in library records.

Another advantage that these and other bibliographic services may provide to us all in the library profession is that they could be a source of data for experimentation. What with RDA looming on the horizon and much talk about updating our data format from MARC to something else, we'll need data to work with. OCLC has historically been slow to change its data, and not without reason: OCLC is integrated into the workflows of tens of thousands of libraries that depend on it for everyday functionality. Although the OCLC research division comes up with innovative ideas, the OCLC core functionality is essentially the same as it was two or three decades ago. If we want to experiment with radical change, I for one expect it to come from the sidelines, not the center.

Sunday, September 20, 2009

DOJ drops bomb in Google/AAP settlement

On Friday, September 18, 2009 the Department of Justice delivered its long-awaited Statement of Interest in the proposed settlement between Google and the AAP/AG in the class action suit surrounding the Google Book Search product. The DOJ has some very specific requirements for modification of the settlement, some of which could result in significant changes in the nature of the agreement. The headline, however, is:
that "the court should reject the settlement in its current form," and reconsider after changes are made.

Beyond that, my summary is this:

1) the DOJ does not like that the settlement allows uses of orphan works that go beyond those allowed by copyright law, and especially that others will be profiting from those uses

2) the DOJ considers the settlement to be anti-competitive, and

3) between the lines, it appears that the DOJ can't decide between supporting the full access to scanned books for the good of mankind, and wanting the settlement to limit itself to the original scope of Google's project, which was to digitize for indexing only.

And I should add:

4) nothing here has a direct effect on libraries or the Google library partners, except, perhaps, in that it changes the product that Google will provide as its subscription service, and

5) that the DOJ letter clearly states that Google and the AAP/AG are already in the process of making changes to the settlement to respond to the DOJ's concerns.

The Concerns

The Class


The first has to do with the definition of the class of rights holders who are party to the class action suit. DOJ concludes that the settlement does not satisfy the rules for defining a class as set out in Rule 23, the rule that governs class action suits.

In this area, DOJ is mainly concerned with the potential rights holders of orphan works. It isn't easy to understand what solutions DOJ sees for finding the rights holders for these works, but the Department is uneasy that known rights holders will be the ones negotiating with the rights registry, and that they will also benefit from any money made on orphan works. In other words, it will be to the advantage of rights holders that the parents of those orphans NOT be found. DOJ suggests, among other things, that the money made on orphan works not be paid out to others, but be used to try to find rights holders.

It also suggests that not enough work was done to notify all potential members of the class, in particular foreign authors.

The Potential Uses, and Orphan and Out-of-Print Works

DOJ appears to be nervous about the open-endedness of the future uses that Google can make of both orphan and out-of-print works. To remedy this, it is suggested that out-of-print works (including orphans) be treated the same as in-print works, that is, that rights holders must opt-in to any uses that Google intends to make of the works. To me this makes sense from a legal point of view, since copyright does not distinguish between in- and out-of-print status. It makes less sense from a market point of view, because presumably there is less active interest in the out-of-print works on the part of the rights holder. However, we really do not know what in- and out-of-print mean in a predominantly digital environment, and it may be a mistake to be making decisions based on the analog market, as the settlement does.

There are some parts of the DOJ document that suggest what could be radical solutions, yet they appear almost as asides, such as when suggesting that out-of-print works should be subject to opt-in, they say:
"Such a revision would, of course, not give Google immediate authorization to use all out-of-print works beyond the digitization and scanning which is the foundation of the plaintiffs' Complaint in this matter." p. 14
This seems to indicate that DOJ would be more comfortable with a settlement that essentially authorized the current scope of the Google Book Search product, which was the basis for Google's claim of Fair Use: search and snippet display.

In another section, they voice concern over the fact that some rights holders will be earning money on the unclaimed works of others. They say:
"The risk of such improper leveraging might also be reduced by narrowing the scope of the license. A settlement that simply authorized Google to engage in scanning and snippet displays in the future would limit the profits that others could potentially derive from out-of-print works whose owners fail to learn of their right to claim those profits." p. 15
In fact, this would greatly limit the profit that Google could earn (from which those of the rights holders derive), since the main source of expected profit for Google seems to be from the licensing of full views of the books (to libraries and other institutions) and the "sale" of books to individuals. If this is really what the DOJ means, then it is essentially suggesting that Google have no more use of orphaned works than it has today. With that limitation, it seems that Google might as well go forward with its Fair Use defense, if it would want to continue scanning books at all.

Competition


DOJ is concerned that the settlement doesn't allow for sufficient competition. It isn't clear to me, however, how that competition might be achieved. First the document states that the Registry does not have the power to give access to works to entities other than Google, since copyright law doesn't allow it. Then it says that the best solution is to make sure that other companies get equal access. To show that I'm not making this up (although I may be mis-interpreting):
"The Proposed Settlement does not forbid the Registry from licensing these works to others. But the Registry can only act "to the extent permitted by law." S.A. 6.2(b). And the parties have represented to the United States that they believe the Registry would lack the power and ability to license copyrighted books without the consent of the copyright owner -- which consent cannot be obtained from the owners of orphan works." p. 23
"This risk of market foreclosure would be substantially ameliorated if the Proposed Settlement could be amended to provide some mechanism by which Google's competitors could gain comparable access to orphan works...." p. 25
As far as antitrust goes, the document states that although there are concerns about antitrust, the full analysis has not been completed. There are suggestions, however, that the main concerns have to do with the Book Rights Registry and the setting of prices for all works (instead of relying on competition to determine prices).

-------------

All in all, it seems to me that the DOJ has pointed out some of the same problems indicated by others, but unfortunately hasn't really given a clear direction for the settlement to take. What we do know is that we'll see a new version of the settlement sometime in the future... many more pages of dense text to ponder.

Monday, September 14, 2009

Google Books Metadata and Library Functions

In a recent post in the NGC4LIB list, we got a very welcome answer from Chip Nilges of OCLC about Google's use of WorldCat records:
To answer Karen's most recent post, Google can use any WC metadata field. And it's important to note as well that our agreement with Google is not exclusive. We're happy to work with others in the same way. The goal, as I said in my original post, is to support the efforts of our members to bring their collections online, make them discoverable, and drive traffic to library services.

Regards,

Chip

As we have seen from recent postings about the metadata being presented in the Google Book Search service, there are some problems. Although Google claims to have taken the metadata from its library partners, we can look at records in GBS and the record for that item in the library partner database and see how very different they are. It is clear that Google has not retained all of the fields that libraries have provided, and has made some very odd choices about what to keep. Perhaps what we need to do, to help Google improve the metadata, is to make clear what data elements we anticipate we will need in order to integrate the Google product with library services.

When you ask people what metadata is needed for a service, they will often reply something like "everything" or "more is better." I'm going to take a different approach here because I think it is a good idea to connect metadata needs with actual functionality. This not only justifies the metadata, but the functionality helps explain the nature of the metadata that is required. For example, if we say that we want "date of publication" in our metadata, it may seem that we could use the date from the publication statement, which can have dates like "c1956" or "[1924]." If, instead, we indicate that we want to use dates in computational research, then it is clear (hopefully) that we need the fixed field date (from the 008 field in the MARC record).
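To make the date example concrete, here is a minimal sketch in Python. It assumes the 008 field is available as a plain 40-character string, where character positions 07-10 hold "Date 1". The function names are mine, purely for illustration, and have nothing to do with how Google or any library system actually processes records.

```python
import re
from typing import Optional


def date_from_008(field_008: str) -> Optional[int]:
    """Return Date 1 (008 character positions 07-10) as an integer.

    Returns None when the date is not fully numeric, for example
    "19uu" for a partially unknown date.
    """
    date1 = field_008[7:11]
    if len(date1) == 4 and date1.isascii() and date1.isdigit():
        return int(date1)
    return None


def date_from_publication_statement(text: str) -> Optional[int]:
    """Best-effort guess: the first run of four digits in free text.

    This is the fragile approach; strings such as "[19--?]" yield nothing.
    """
    match = re.search(r"\d{4}", text)
    return int(match.group()) if match else None
```

The fixed field gives a value that can be compared or sorted directly, while the publication statement has to be guessed at and sometimes cannot be used at all.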

So here are the functions that come to my mind, and I welcome additions. (Do remember that at this point we are only talking about books, so many fields relating to other formats will not be included.) I'll add the related MARC fields as I get a chance.

Function: Scholarship
Need: A thorough description of the edition in question. This will include authors, titles, physical description, and series information.


Function: Metasearch
Need: To be able to combine searches with the same data elements in library catalogs. Generally this means "headings," from the bibliographic record (authors, titles, subject headings).


Function: Collection development
Need: To use GBS to fill in gaps (or make comparisons) in a library's holdings, usually using classification numbers.


Function: Linking to other bibliographic collections or databases
Need: Identifiers and headings that may be found in other collections that would allow linking.
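As one small illustration of why identifiers need care before they can be used for linking: the same book may carry a 10-digit ISBN in one database and a 13-digit ISBN in another. The sketch below, in Python, converts the older form to the newer one using the standard ISBN-13 check digit calculation (weights alternating 1 and 3). The function name is hypothetical, and the sketch does not verify that the incoming ISBN-10 check digit is itself valid.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 978-prefixed ISBN-13 equivalent."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected 10 characters after removing hyphens")
    # Drop the old check character (which may be "X") and add the prefix.
    core = "978" + digits[:9]
    if not (core.isascii() and core.isdigit()):
        raise ValueError("non-digit characters in ISBN body")
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)
```

With both collections reduced to the 13-digit form, a simple equality match on the identifier becomes possible.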

Function: Computation
Need: Data elements that can mark a text in time and space (date and place of publication), as well as those that can help segment the file, like language. This function also may need to rely on combining editions into groupings of Works, since this research may need to distinguish Works from Manifestations. Computation will most likely use metadata as a controlled vocabulary, and the full text of the work as the "meat" of the research.
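A rough sketch of what "segmenting the file" could look like, in Python: it groups volumes by language, place of publication, and decade, reading all three from the MARC 008 fixed field (place at positions 15-17, language at 35-37, Date 1 at 07-10). The volume identifiers, the function names, and the sample_008 helper are all hypothetical and exist only to make the example run.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

SegmentKey = Tuple[str, str, Optional[int]]


def sample_008(date1: str, place: str, language: str) -> str:
    """Build a 40-character 008 string for illustration only."""
    return ("850101" + "s" + date1 + "    " + place.ljust(3)
            + " " * 17 + language + " " + "d")


def segment_key(field_008: str) -> SegmentKey:
    """Return (language code, place code, decade) for one record."""
    language = field_008[35:38]
    place = field_008[15:18].strip()
    date1 = field_008[7:11]
    numeric = len(date1) == 4 and date1.isascii() and date1.isdigit()
    decade = int(date1) // 10 * 10 if numeric else None
    return language, place, decade


def segment_corpus(
    records: Iterable[Tuple[str, str]]
) -> Dict[SegmentKey, List[str]]:
    """Group volume identifiers by their (language, place, decade) key."""
    groups: Dict[SegmentKey, List[str]] = defaultdict(list)
    for volume_id, field_008 in records:
        groups[segment_key(field_008)].append(volume_id)
    return dict(groups)
```

None of this works if the fixed fields are dropped or replaced with display strings, which is the point of naming the function before naming the metadata.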

Tuesday, September 08, 2009

GBS, according to Amazon

When I first read the settlement agreement between Google, the AAP and the Authors Guild, I immediately thought: "Wow. Jeff Bezos must be freaking out!" Because it is obvious that the settlement, as written, sets up a bookselling operation of unprecedented proportions. It also does so in a way that makes it hard if not impossible for any other company to compete in certain areas, particularly in relation to works that are out of print but not out of copyright.

Amazon has responded to the proposed settlement with a document for the court. (The document for Amazon was authored by David Nimmer, known for "Nimmer on Copyright", the primary text on the topic of US copyright -- and which sells for over $2,000. When it comes to "big guns" it's hard to get any bigger.) The document makes four major points relating to the settlement. I will paraphrase them here, but if you have an interest in what Amazon has to say you must read the document yourself, because my analysis undoubtedly reflects my non-expert reading of it.

  1. The settlement should be rejected because it makes changes to copyright law that should be decided by Congress, not a lawsuit.

  2. The settlement should be rejected because the Book Rights Registry that it creates is a cartel of rights holders, and violates anti-trust law.

  3. The settlement must be rejected because its expropriation of orphan works violates the copyright act.

  4. The settlement must be rejected because it would release Google from liability of future actions.

All of these seem like good arguments to me, but I am especially taken by the fourth one. The Amazon document explains in some detail that class action here is being used to allow future actions that are not part of the complaint.
"A class action settlement can only extinguish claims that arise from the same factual predicate as the class claims.... Future claims for future conduct cannot be released by a settlement agreement because they are not part of the same factual predicate as the purported claims." p. 35
What this says, in my interpretation, is that Google is being taken to court by the AAP and AG because it has, in the past, scanned and OCR'd books that are in copyright without asking permission of the rights holders. Yet, the settlement addresses actions that Google has not yet taken, such as the sale of institutional subscriptions, consumer sales of access to books, and a variety of possible revenue models such as print on demand. This is not redress for violation of rights but a kind of blanket agreement that gives Google rights over the materials for future developments.
"The sale of books or subscriptions to a database of scanned works is conduct in which Google has not yet engaged and, because of criminal sanctions, likely would never engage without a clear license to do so." p. 39
Nimmer's analysis seems to be that this is not appropriate in a lawsuit, and especially one in which members of the class are giving up future rights that cannot even be enumerated. The hypothetical example reads:
"... let us imagine that Google has already scanned Lonesome Dove and included it in the Google Books Program, that Technology X is invented in 2016, and that Google decides in 2020 to inaugurate widescale exploitation of books via that new technology including Lonesome Dove. To the extent that author Larry McMurtry objects to that exploitation in 2021 (in the same way that previous litigation contested the scope of his grant of books rights to his publisher in Lonesome Dove at the dawn of the age of audio books), a dispute may develop between author and publisher. The Settlement Agreement goes out of its way to immunize Google from any liability for copyright infringement under those circumstances." p. 39 footnote 29
I can neither confirm nor dispute this analysis, but there is something very frightening about giving up (or assigning, depending on how you see it) rights for an indefinite future when we have no idea what that future will bring. The Amazon comments have interpreted the settlement as having overly expansive concessions to Google that could have unintended consequences in the future.