Thursday, August 09, 2007

Wish list: ONIX records

There are things that I wish existed, but don't, so I'm going to start posting my wish list here, one piece at a time. Some of these things might not be possible for various reasons, and some may already exist without my being aware of them (if so, I hope you'll clue me in). For those that could be done, let's talk about how we could make them happen.

The first one that I'm posting is a desire for a database of available ONIX records.

A few years ago I looked at some ONIX records that were being created for e-books and I have to say that they were so poor as to be almost unusable. Recently I've been reviewing some ONIX records received at the Internet Archive for the OpenLibrary project. There are only about a half dozen publishers represented there, but it's obvious to me that they are producing useful data. The basic bibliographic data is there, plus there is data that fits into the "book promotion" realm: blurbs, author bios, subject categories. This is data that is sent to online booksellers and to bookstores. It would be useful for libraries and for anyone else keeping data about books. But I don't know of anyone who is aggregating it, much less making it public.

What we need is:
  • a database that receives ONIX feeds
  • that keeps the records up to date
  • that has a Z39.50 capability and an API for retrieving data
  • that can output in a couple of different common formats
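To make the list above a little more concrete, here is a minimal sketch in Python of the "receive, keep up to date, retrieve, output" part. It is only an illustration under stated assumptions: the sample record is made up, the `RecordStore` class and its method names are hypothetical, and the parser reads just a handful of ONIX 2.1 reference-tag elements (`RecordReference`, `ProductIdentifier`, `Title`, `Contributor`, `OtherText`). A real feed carries far more than this, and the Z39.50 side is not attempted at all.

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample data: a tiny ONIX 2.1-style message using reference tag names.
SAMPLE_ONIX = """<ONIXMessage>
  <Product>
    <RecordReference>example-0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <Title>
      <TitleType>01</TitleType>
      <TitleText>An Example Title</TitleText>
    </Title>
    <Contributor>
      <ContributorRole>A01</ContributorRole>
      <PersonName>Jane Example</PersonName>
    </Contributor>
    <OtherText>
      <TextTypeCode>01</TextTypeCode>
      <Text>A made-up blurb.</Text>
    </OtherText>
  </Product>
</ONIXMessage>"""


def parse_products(xml_text):
    """Turn each <Product> in a feed into a plain dictionary."""
    root = ET.fromstring(xml_text)
    records = []
    for product in root.iter("Product"):
        # Map identifier type code -> value (code 15 is ISBN-13).
        ids = {
            pid.findtext("ProductIDType"): pid.findtext("IDValue")
            for pid in product.findall("ProductIdentifier")
        }
        records.append({
            "reference": product.findtext("RecordReference"),
            "isbn13": ids.get("15"),
            "title": product.findtext("Title/TitleText"),
            # Contributor role A01 is "by (author)".
            "authors": [
                c.findtext("PersonName")
                for c in product.findall("Contributor")
                if c.findtext("ContributorRole") == "A01"
            ],
            "blurbs": [t.findtext("Text") for t in product.findall("OtherText")],
        })
    return records


class RecordStore:
    """Hypothetical in-memory stand-in for the wished-for database."""

    def __init__(self):
        self._by_reference = {}

    def ingest(self, xml_text):
        # Keyed on RecordReference, so a later feed replaces the older record.
        for record in parse_products(xml_text):
            self._by_reference[record["reference"]] = record

    def by_isbn(self, isbn13):
        for record in self._by_reference.values():
            if record["isbn13"] == isbn13:
                return record
        return None

    def __len__(self):
        return len(self._by_reference)


def to_json(record):
    return json.dumps(record, indent=2)


def to_text(record):
    authors = ", ".join(record["authors"])
    return f"{record['title']} / {authors}. ISBN {record['isbn13']}"


store = RecordStore()
store.ingest(SAMPLE_ONIX)
print(to_text(store.by_isbn("9780000000002")))
```

Keying the store on the record reference is what lets a repeated feed update a title rather than duplicate it. The two output functions stand in for "a couple of different common formats"; anything like MARC or Dublin Core would need a proper mapping that this sketch does not attempt.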
It seems that this could be a great companion to CoverThing, a project proposed by LibraryThing creator Tim Spalding (and perhaps in the works?). In any case, it's like there's a bunch of bibliographic data that is being created and then flushed down the drain. Let's find a way to save it and use it. (And I sure hope the publishers feel this way, too.)


arkham said...

There seems to be a big push for using vendor-created metadata recently. While I can see the value of limited publisher metadata, such as title, author, pagination, basic descriptive metadata, I hope that this won't become a push to have most cataloging/metadata creation done by publishers. Being commercial enterprises, one can hardly expect them to provide objective subjects and summaries - particularly if items are on controversial topics. If using publisher basic descriptive metadata is supplemented by some form of not-for-profit metadata creation, which for now should call for a trained cataloger, that's fine - I just don't think publishers should be expected to provide the kind of cataloging metadata that libraries need.

Ian Singer said...

By quick way of introduction, Ian Singer adding here - I'm VP, Data Services of RR Bowker.

If I'm reading this thread correctly, I believe Bowker's BooksInPrint database is a source that answers the issues addressed:

1) We have a wide staff of data analysts who are engaged in maintaining our global database of over 16M records ... and we can extract all of our data into ONIX 2.1 fields/feeds;

2) We have subject classifications that map to BISAC and BIC numbering over 80,000 - that's granularity, no?

3) We have web services that enable users to embed data into workflow vs. using an interface.

4) In regard to ebooks - we have hundreds of thousands of titles, and our db is growing; however, as publishers work through price issues related to ebooks vs. physical content, rich data provisioning may continue to be an issue.

I don't want to sound like an advertisement, but we're here, and any way I, or Bowker, can help libraries/librarians, we want to and will.

Diane said...

Maybe we don't need a database, but just somebody to convince the publishers to use OAI-PMH to make their data available? Then those who want, can get ...

Who will bell the cat?

Diane Hillmann

Karen Coyle said...

Thanks for your comments.

Arkham, the ONIX data definitely has a promotional slant, but there are times when libraries, too, want to promote. After all, part of our role is to promote reading. The ONIX records have those same blurbs that we find on the backs of books and sometimes use to decide what book we'll enjoy. I see these as an addition to our services, much like many libraries are showing book covers in their catalogs.

Ian, I probably should have included the words "open" and "free" in my description of my wish. Many libraries are using Bowker data to enhance their catalogs. I'm just putting out a wish for those who can't afford a commercial service.

Diane, doesn't OAI-PMH imply that the data is in a database? I believe that publishers have the data available only as datasets (for FTP). And I suspect that they're fairly happy to share their data. What we need is for someone to create a central location for the data (so you don't have to know what publisher site to go to) and to build services around it. My wish is also for those services to be openly available on the net. It might be doable as a collaborative project. I don't know how it could be funded, but I would really like for the poorer libraries, those that can't even afford to be part of OCLC, to be able to make use of it. In a sense, Amazon does this, but their agreement requires you to link back to Amazon when you display their data (so if users click on the book cover, they end up at Amazon). I think there are times when that just isn't appropriate, and in library catalogs it might just confuse the user.

Renee Register said...

Renee Register here, posting as a member of the ALCTS working group on automating metadata creation, as a long-time cataloger and manager of catalogers and as a product manager with OCLC Cataloging Partnering. I believe it's crucial to the future of cataloging to find centralized ways to take advantage of publisher ONIX metadata and I also believe we must find efficient and centralized ways to store, enhance and normalize the metadata for the benefit of both library and publishing communities.

I believe librarians can and should participate in raising the quality of metadata in the marketplace in which we (and end-users) participate whenever we select and purchase materials.

In the working group (headed by Michael Norman from the University of Illinois at Urbana-Champaign), we found that many libraries are either considering or actively working on development to allow ingest and manipulation of ONIX data as part of the cataloging process.

So, we find ourselves again in the position of creating local solutions and duplicating effort across institutions to handle metadata for the same set of titles published each year. Finding ways to centralize this process would certainly result in cost savings and workflow efficiencies for libraries and result in greater upstream availability of metadata for use in library technical processes and end-user interfaces.

There is also a tremendous duplication of effort and expense in the creation and enhancement of metadata across the publisher supply chain. Cooperative work in this area will only enhance the metadata we all use.

Stay tuned for the working group's completed white paper toward the end of this month and, also this fall, for an OCLC pilot program with publishers and libraries to test some of the group's findings and (I hope) validate some of the assumptions stated above.

Alain Pierrot said...

Renee Register's proposal seems a very reasonable proposition, much easier to promote than Peter Brantley's, which is provocative (as far as a French native speaker can interpret it) but not so different about the need to provide good metadata, based upon new techniques and practices.