Tuesday, February 19, 2008

Libraries -- Open

I think of libraries as the quintessential open institutions, at least in the U.S. Most libraries are physically open to the public (even those in private institutions), and many serve as community spaces focused on learning, exploring, and simply being. They also promote the open use of what we might call "cultural heritage resources." Libraries fight for open access, and they fight censorship all the way to the Supreme Court.

Yet there seems to be a real barrier when it comes to libraries being open with their own catalog data. This seems rather odd, because most libraries' catalogs are available for open access on the web. But try asking a library for some or all of that data and you suddenly hit a wall. Libraries don't like to say "no," so there's a lot of hemming, hawing, and "we'll think about that." But an out-and-out "yes" is rare.

I speak about this based on my experience with the Open Library, a project of the Internet Archive. The OL wants to create a humongous bibliographic database (right now only of book records) on the Web, using a wiki-like front-end that would allow anyone to edit the bibliographic data. To me the most interesting aspect of the project is that it would bring bibliographic entries to the web's surface; they could be the subject of links from other documents, and potentially could begin the creation of a bibliographic web linking books to each other. But in spite of putting out a call for bibliographic records and making personal and direct pleas to a number of libraries, the OL has received only a lukewarm response.

To be sure, any data submitted to the Internet Archive becomes publicly available. And at some time in the future it may be possible for people to download individual bibliographic records for their own use. I know that there is some speculation that OCLC "owns" the data and that the OCLC license may not allow this level of re-distribution. I also know that some records in library databases are covered by vendor licenses (other than OCLC). Presumably those could be excluded from the data set. But it still surprises me how un-open libraries are with their own data, given how much they encourage others to be open with theirs.

During the comment period for the Future of Bibliographic Control report, the Open Knowledge Foundation posted a call for library data openness on the OKF wiki. Many dozens signed their names to the OKF's call for open licensing of bibliographic data, including important people like Larry Lessig and Tim O'Reilly. The arguments in OKF's document seemed pretty clear to me:

"Bibliographic records are a key part of our shared cultural heritage. They too should therefore be made available to the public for access and re-use without restriction. Not only will this allow libraries to share records more efficiently and improve quality more rapidly through better, easier feedback, but will also make possible more advanced online sites for book-lovers, easier analysis by social scientists, interesting visualizations and summary statistics by journalists and others, as well as many other possibilities we cannot predict in advance."

None of this was included in the final report.

Libraries complain that they don't get the kind of attention that Web resources like Google and OL get. They complain about the lack of transparency of the commercial data vendors: Google won't say how many books it has online, nor will it reveal its work on ranking book retrievals. Libraries could be doing this experimentation themselves, and in the open, if their data were on the Web. They could be visible, out there, allowing incredible innovation to happen based on the hundreds of years of collecting materials and creating relatively consistent metadata for those materials. Their reluctance to let their data out of the databases just baffles me, and it isn't in concert with their stated goals of open access.


Kelly said...

A few times we have also been approached to share our data, a request I passed on to our cataloging department. As we do a great deal of record customization, the cataloging department was primarily concerned about the quality of the records. They didn't want to put potentially non-perfect records 'out there.' (And yes, our catalog is online.) I believe it's the added exposure of errors that makes them pull back rather than an unwillingness to share.

Solution? None so far here, but your post may be a start.

arkham said...

I'm completely in agreement on open access to bibliographic data. In my own library system (I work for a regional library system), the way our cataloging is paid for makes sharing with anyone outside our consortium difficult. For instance, we recently had a library look seriously into joining our consortium. This would have cost them a significant amount of money, as their membership fees go a long way towards paying a portion of the annual cost of our ILS plus system staff salaries, etc. When they decided to pull out and not join, they requested access to our bibliographic records. Because of the financial situation (they'd be getting the benefit of professional consortial cataloging without paying for it) we refused.

Unfortunately, I suspect that situations like this are not uncommon, and are one of the reasons current library models don't foster open sharing.

Anonymous said...

We at the Montana State Library do want to be a part of the Open Library and want to provide as many possible points of access to our unique collection of Montana State publications.

To this end I submitted nearly 12,000 bib records to Aaron Swartz back in October but received no response. I'll try again.

Jennie Stapp
Montana State Library

John Mark Ockerbloom said...

I don't know if this is the sort of data you need (it's not MARC, but it does have LC subject headings, and many of its names are authority-controlled) but Dublin Core records of all 30,000+ listings of The Online Books Page are free for folks at the Open Library or elsewhere to harvest and reuse if you find them useful.

The records are exported via OAI-PMH and licensed via Creative Commons Attribution-ShareAlike.
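
As a rough illustration of what reusing such records might involve, here is a minimal Python sketch that pulls Dublin Core fields out of an OAI-PMH ListRecords response. The sample XML, its identifiers, and the parse_records helper are all made up for illustration and are not taken from The Online Books Page; only the three namespace URIs are the standard OAI-PMH and Dublin Core ones. A real harvest would fetch the XML from the repository's OAI-PMH endpoint and follow resumption tokens, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample response; every value in it is invented.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:book-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Title</dc:title>
          <dc:creator>Author, Example</dc:creator>
          <dc:subject>Example subject heading</dc:subject>
          <dc:subject>Another example heading</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: its OAI identifier plus its DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # e.g. deleted records carry a header but no metadata
        fields = {}
        for el in dc:
            name = el.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
            # Dublin Core elements are repeatable, so collect values in lists.
            fields.setdefault(name, []).append((el.text or "").strip())
        oai_id = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        records.append({"oai_id": oai_id, "fields": fields})
    return records


records = parse_records(SAMPLE)
for record in records:
    print(record["oai_id"], record["fields"].get("title"))
```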

While this is a much smaller catalog than that of the typical research library, it may be of particular interest because all the books listed can be read freely online.

For more details, see


Karen Coyle said...

Thanks, John, I'll pass that along to the OL folks. Non-MARC is fine. And for anyone else looking at this, the last part of John's URL is "#data" - this template cuts things off (aaarrgh!)

Anonymous said...

This may sound a little old and stodgy, but what I (as a cataloger) have an issue with is the "allow anyone to edit the bibliographic data" part. Granted, I am not very familiar with wikis or Open Library, but the thought of random people changing the contents of a bibliographic record makes me pretty nervous. I agree that user comments, reviews, and subject headings would be fabulous, but I would draw the line at letting the inexperienced user edit the record itself.

Karen Coyle said...

I understand the discomfort of allowing "anyone" to edit the bib data. However, this is a wiki environment, where all edits are given versions (meaning each version is saved, and you can choose to use a different version) and all edits are attributed (meaning you know who contributed them). In this sense, there is no one "record" but a number of versions. While this might seem like a vast confusion, in fact it's easily manipulated by programs. There's been some interesting speculation about treating catalog records as a social activity, where you can determine who your cataloging "friends" are and accept their records and edits, while ignoring others. I'll try to blog about this in the near future, but essentially there are applications and interfaces that make "everyone can edit" much different from allowing users to edit the library's MARC record.
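
To make the idea of versioned, attributed edits a little more concrete, here is a minimal Python sketch. It is not how the Open Library actually stores records; the Version and BibRecord classes, the editor names, and the "trusted" set are all hypothetical, chosen only to show how a program could pick the most recent version contributed by someone you trust.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    number: int    # 1-based position in the record's history
    editor: str    # who contributed this version (attribution)
    fields: dict   # full snapshot of the record as this editor saved it


@dataclass
class BibRecord:
    versions: list = field(default_factory=list)

    def edit(self, editor, **changes):
        """Save a new version; earlier versions are never overwritten."""
        snapshot = dict(self.versions[-1].fields) if self.versions else {}
        snapshot.update(changes)
        version = Version(len(self.versions) + 1, editor, snapshot)
        self.versions.append(version)
        return version

    def latest(self, trusted=None):
        """Return the newest version, optionally only from trusted editors.

        Note: each version is a full snapshot built on the one before it,
        so this picks a whole version rather than merging individual fields.
        """
        for version in reversed(self.versions):
            if trusted is None or version.editor in trusted:
                return version
        return None


# Hypothetical editors: "alice" is a cataloging "friend", "mallory" is not.
rec = BibRecord()
rec.edit("alice", title="Moby-Dick", author="Melville, Herman")
rec.edit("mallory", title="Moby Duck")

print(rec.latest().fields["title"])
print(rec.latest(trusted={"alice"}).fields["title"])
```

The first lookup takes whatever was saved last; the second skips past the edit by an editor outside the trusted set and falls back to the earlier version, which is still there because nothing is ever thrown away.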