Thursday, March 24, 2011

Open Data II

In this post I want to talk about some of the Open Government Data (OGD) projects taking place around the world.

Open government data is assumed to be a given by many in the US because our copyright law states that federal government data is not covered by copyright. (The situation in US states can vary, but the federal government's declaration sets the tone.) In other countries the situation is less clear and governments do not have a mandate to make data open. However, the open government data movement has spurred on a number of fast-moving activities, many sponsored by governments themselves, that encourage citizens to download and use government data.

The UK government has a site, Opening up government, where it not only shares data but encourages people to develop apps that use the data. Apps here can alert you to new building and planning projects in your area, and give you real-time public transportation information.

The EU has its own Open Government Data Initiative. It provides the data under these terms of use:
All Data on is available under a worldwide, royalty-free, non-exclusive license to use, modify, and distribute the datasets in all current and future media and formats for any lawful purpose and that this license does not give you a copyright or other proprietary interest in the datasets.
There is a European site for public sector information, the European Public Sector Information Platform: Europe's One-Stop Shop on Public Sector Information Re-use. You can search by country and see news and developments relating to public data, much of which is available for re-use. Because many countries do not have an explicit statement in their copyright laws covering government data, one of the important early steps for these jurisdictions is to develop blanket licenses that they can apply to the data. So when you visit the site you see recent news that Norway has developed a license for its government data and is asking for feedback (if you read Norwegian).

As a measure of the force of this movement, Albania and Bulgaria are said to be on the verge of opening some government data.

The Obama administration announced its Open Government effort on its first day in office.
To the extent practicable and subject to valid restrictions, agencies should publish information online in an open format that can be retrieved, downloaded, indexed, and searched by commonly used web search applications. An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.
Wired has a US-oriented "how-to" wiki on OGD. (Of course, they include in their "how-to" examples, being Wired, but it's a good example of the range of utility of OGD.)

Not all data is at the country level, of course, and the movement is reaching into lower levels of government. Paris has an open data portal, while Enschede, Netherlands, has an open data declaration for its information. In Italy, the government of the Piemonte Region has a website for its open data.

The government open data movement is heavily influenced by grassroots efforts to convince governments that open data is a good thing -- not just for government watchdogs and opposition movements, but for healthy government and strong business. In the UK there is a Working Group on Open Government Data of the Open Knowledge Foundation, an independent not-for-profit that is promoting, as its name says, open knowledge. In Italy there is the wonderfully named "Spaghetti Open Data." Spain has a broad coalition of non-profits that form the "Coalición Pro Acceso." The CKAN web site, which is a general archive of available datasets of all kinds, has OGD under a number of tags, such as "gov". [Just out: Open Government Data video.]

We hear a lot about problems with copyright, with DRM, with information providers who want to lock down their products. Government data covers a huge variety of information types and is often the key information needed for a lot of civic and scientific decision-making. OGD can generate a mountain of new knowledge, and then tell you how high the mountain is.

Tuesday, March 22, 2011

Judge Chin rejects AAP/Google settlement

I'll say more when I've read it, but I put a copy on the Internet Archive.

After reading:

The judge's decision holds no real surprises. His analysis is fully consistent with the reactions of the interested parties to the case. He rejects the settlement primarily on these grounds:
  • It seems that a significant segment of the class of authors/publishers is not happy with the settlement. "Some 6800 class members opted out." (p.10) Also, a majority of the comments on the proposed settlement were negative, many coming from non-US copyright holders who did not identify with the class.
  • The settlement would make significant alterations to the current copyright regime, which should be a matter for Congress rather than the court.
  • The settlement's conclusion would go beyond the original lawsuit, which was over the digitization of in-copyright works by Google and the presentation of snippets relating to searches. The settlement would allow sales of full text works, which was never an issue at the time of the original lawsuit.
Although he rejects the settlement on numerous grounds, the judge concludes by saying "...many of the concerns raised in the objections would be ameliorated if the ASA were converted from an "opt-out" settlement to an "opt-in" settlement." (p. 46) This leaves the door open for yet another settlement attempt between the parties.

It is important to note that the positions of digitization and ebooks today are vastly different from what they were in 2005, when the authors and publishers first sued Google over its library digitization project. It is possible that if the question of Google's digitizing were to be put forth for the first time today, the actions of the parties and the results would be vastly different. This is clearly a case where technology has moved forward at a rapid pace while the courts were contemplating an agreement that was standing still.

What now?

It's hard to believe that Google and the AAP/AG have not prepared themselves for this possibility. Yet, certain activities have gone forward as if the settlement were already approved.
  • A form of the Book Rights Registry is in place in the sense that there is a database of digitized works and a way to claim them to receive the proposed one-time payment. Presumably that payment is now not going to happen, but meanwhile Google has a large database with copyright holder information (including contact info, if I remember the form correctly).
  • The BRR has a chosen Director (Michael Healy).
  • It isn't clear if Google has continued digitizing books that are under copyright without specific permission. To be sure, they have made many deals with publishers and with libraries to digitize works since the 2008 date when the settlement was first proposed, and digitization has gone forward.
  • Some libraries that had partnered with Google prior to the lawsuit have negotiated new contracts that are compatible with some of the conditions contained in the settlement. I don't know if these contracts have been signed or have been awaiting the result of the lawsuit, but I do recall that the libraries obtained fewer rights in relation to retaining copies of their digitized books in the new contracts than they did in the old. The upshot is that it isn't clear where this leaves the partner libraries, or organizations like HathiTrust that are involved in the storage and possible uses of the Google digitized books.
  • For libraries and institutions that were looking forward to subscription access to the books, this access is now a big question mark. It was dependent on conditions in the settlement.
There are undoubtedly many other issues that are now open questions. When the settlement was first announced I began a "question list." It might be a good idea to revive that given this new perspective. And for those wondering "what now?" (that is, all of us) there's a flow chart.

Friday, March 04, 2011

Open Data I

The idea of open data has gone from an extremist rallying cry to a mainstream movement. In the next few posts I'll highlight just an iceberg tip's worth, but expect to see more about this with every day that passes.

The UK's educational research arm, JISC (something like NSF but more for education than for pure science), and the research libraries' organization, RLUK, undertook a study about the advantages and possibilities afforded by opening data from libraries, archives and museums. They have produced the Open Bibliographic Data Guide, which investigates the business case for providing bibliographic data that can be re-used.

This is a practical, not a utopian, vision of open data.
"In earlier times, observers may have considered the ‘open data movement’ as the preserve of a certain type of fanaticism also associated with Open Source Software (OSS) and Open Content, emotionally and ideologically linked to the spirit of 1969.

However, OSS and Open Content have now morphed in to propositions with clear business cases of interest to corporations, institutions and governments. National strategies and Chief Information Officers espouse Open Source Software for financial and business benefit, whilst academic leaders are supporting Open Access Journals and Open Educational Resources (OER)."(link)
The report gives 17 different use cases -- situations in which an institution might want to provide its data with some degree of openness.

1 – Publish data for unspecified use
2 – Publish open linked data for unspecified use
3 – Supply data for Physical Union
4 – Allow Physical Union Catalogue to publish data
5 – Expose data for federation into Virtual Union Catalogue
6 – Publish grey literature data
7 – Contribute data to Google Scholar
8 – Publish activity data
9 – Supply holdings data for Collection
10 – Expose holdings / availability data for Closest Copy location
11 – Share data for Collaborative Cataloguing
12 – Supply data for Crowd Sourced Cataloguing
13 – Supply data to be enhanced for own
14 – Publish data for LIS research
15 – Allow personal use of data for Reference Management
16 – Publish data for lightweight application development
17 – Allow commercial use of data in mobile application

For each of the cases the report discusses pros and cons for the institution, its users, and the world, as well as the business case for making one's data open. The authors acknowledge the complexity of our current environment of bibliographic data ownership:
"Our problems with bibliographic metadata are quite specific:
  • Non-profit and commercial players have built businesses around datasets of MARC records, indexing / TOC services and journal Knowledge Bases – but what is original about those accumulations?
  • Bibliographic records in the circulation amongst libraries are of uncertain and complex provenance, with the exceptions of those explicitly tagged by a ‘vendor’ or exclusive to a special collection" (link)
JISC doesn't stop at this report but is sponsoring projects and ongoing activities in this area. Already the British Library has released its British National Bibliography data openly for reuse. You can keep up with these activities through the newsletter (subscribe here) whose logo reads: One to many; Many to one: Towards a virtuous flow of library, archival and museum data.