Wednesday, April 07, 2010

After MARC

The report on the Future of Bibliographic Control made it clear that the members of that committee felt that it was time to move beyond MARC:
"The existing Z39.2/MARC “stack” is not an appropriate starting place for a new bibliographic data carrier because of the limitations placed upon it by the formats of the past." p. 24

The recent report from the RLG/OCLC group Implications of MARC Tag Usage on Library Metadata Practices comes to a similar conclusion:
"5. MARC itself is arguably too ambiguous and insufficiently structured to facilitate machine processing and manipulation." p.27

We seem to be reaching a point of consensus in our profession that it is time to move beyond MARC. When faced with that possibility, many librarians will wonder if we have the technical chops to make this transition. I don't have that worry; I am confident that we do. What worries me, however, is the complete lack of leadership for this essential endeavor.

Where could/should this leadership come from? The Library of Congress, the maintenance agency for the current format, and OCLC, the major provider of records to libraries, both have a very strong interest in not facilitating (and perhaps even in preventing) a disruptive change. So far, neither has shown any interest in letting go of MARC. The American Library Association has just invested a large sum of money in the development of a new cataloging code. It has neither the funds nor the technical expertise to take the logical next step and help create the carrier for that data. Yet, a code without a carrier is virtually useless in today's computer-driven networked world. NISO, the official standards body for everything "information," is in the same situation as ALA: it cannot fund a large effort, and it has no technical staff to guide such a project.

It seems ironic that there have been projects funded recently to develop library-related software based on MARC even though we consider this format to be overdue for replacement. The one effort I'm aware of to obtain funding for the development of a new carrier was rejected on the grounds that it wasn't technically interesting. In fact, the technology of such an effort isn't all that interesting; the effort requires the creation of a social structure that will nurture and maintain our shared data standard (or standards, as the case may be). It requires an ongoing commitment, broad participation, and stability. Above all, however, it requires vision and leadership. Those are the qualities that are hard to come by.

11 comments:

Jerome McDonough said...

Out of curiosity, is MODS any closer to your ideal state than MARC? It is at least somewhat more amenable to machine processing and the XML base makes it more compatible with contemporary computing environments.

Karen Coyle said...

Jerry - actually, no. Although MODS improves on some of the problems found in MARC, I don't see a real philosophical shift from the basic concepts in MARC. It's "a better MARC XML" but not a web-friendly data format. I'm leaning away from the hierarchical view that XML favors to something more E-R like. I think the latter makes sharing and mash-ups more feasible.

Ryan Shaw said...

Need such an effort await funding? Atom, HTML5, PubSubHubbub: none of these standards waited for funding to get started. Of course, now companies pay employees to work on them. But the leaders of those efforts didn't sit around and wait for that to happen before getting started.

MLB said...

Karen, Why not an IMLS National Leadership Grant?

From the grant Web page:

"National Leadership Grants support projects that have the potential to elevate museum and library practice. The Institute seeks to advance the ability of museums and libraries to preserve culture, heritage and knowledge while enhancing learning. IMLS welcomes proposals that promote the skills necessary to develop 21st century communities, citizens, and workers."

As you say, it is not interesting technically, but it is interesting institutionally and interesting for its potential social impact. The new format is a key piece of a new set of relationships that could bring library services into the 21st century.

Matthew Beacom

Karen Coyle said...

Ryan, a fair amount of work has already been done "pro bono." You can look at the DC/RDA wiki, which has some case studies with attempts at code: http://dublincore.org/dcmirdataskgroup/. Take a look at the cataloger scenarios listed here, and click on the links by each scenario for "turtle" versions. There's also the Metadata Registry which has registry entries for RDA and FRBR elements and vocabularies.
http://metadataregistry.org/rdabrowse.htm
There are efforts like http://lcsubjects.org/ that aren't funded projects.

So it's not like folks aren't working on it, it's just that it needs to reach a point that there are a number of people interested in making it happen. We may be reaching that critical mass, but even if some development begins, there is the big question of management and maintenance, since libraries and library vendors will want a "certified" standard.

Diane said...

Karen, I'm in 100% agreement. I've been terribly frustrated at what seems to be a "not invented here" response of LC and OCLC to the work done by the DCMI/RDA Task Group (of which I'm co-chair). What we don't need is 5-10 more years of political machinations, with no real support--moral or financial--to make this happen. Good work and innovation have been done by many outside the magic circle, but everyone waits for these two behemoths to get off the dime. This is not leadership, and the library community deserves better.

Ryan Shaw said...

...libraries and library vendors will want a "certified" standard.

I guess I'm skeptical of the model where people work hard on developing a new data standard, getting it to the point where there is sufficient consensus that it can be "certified" in some way, and then handing it over to libraries and vendors to be implemented and used. That model has failed repeatedly on the web. Successful new web standards, on the other hand, have started with implementations and real-world use, and been refined through that use. I like Stefano's way of putting it:

...standards (and their maintenance efforts) should be judged for their ability to catalyze stable polishing activities around the contact surface rather than for the qualities of the surface they were meant to describe. A good polishing process with a rough surface will always end up more polished than a well polished surface with a poor polishing process around it.

I think the "good polishing process" is basically what you meant by "social structure that will nurture and maintain our shared data standard", so I guess we're in agreement. I just feel like the process needs to start with hacking, not funding… if OpenLibrary, LibraryThing and maybe one other party with shared interests were to start a public process like the Atom or HTML5 ones, and got a bunch of book nerds involved in the discussion, I think they could get pretty far. Then, a few years later, with a working standard in actual use to point to, maybe it would be time to look for some funding to help get the more conservative institutions to start adopting it…

Hugh Taylor said...

[Declaration of interest: I was one of the co-authors of the OCLC Research/RLG Partnership report referred to in Karen's original post]

I'm not sure I entirely agree that OCLC has "a very strong interest in not facilitating ... a disruptive change". Of course, it depends on your definition of "disruptive", perhaps - or the scale of the disruption, even.

One of OCLC's primary interests is in building a large-scale, cooperative database, and in developing services that utilise that database and/or its contents. I'm pretty sure that the data is already not stored in MARC per se (which is pretty common, in any case). Re-engineering a database so that it's encoded in X rather than Y doesn't seem hostile to OCLC's interests, at least so long as OCLC still has a database and can still deliver services based on that database. But such a process would have to pass a business case study, for sure. And that's not to suggest that OCLC would be the place we should look for leadership on the underlying issue (and doubtless there are some who wouldn't want OCLC to take the lead...).

And as an aside, I wouldn't want folk to read into Karen's comment the idea that RDA development was funded solely by ALA. That's not what Karen said, but it's how some might (choose to) read it.

Susan said...

I'm so in love with the idea of mashable (and open) bibliographic data and "crosswalks"/mappings that allow support in multiple formats that I can hardly stand it right now, and yet it seems so far away.

hharper said...

I am concerned that elaborate crosswalks will sometimes shepherd unusable free text strings from one system to another, gaining little. Hurray for crosswalks, but dragging a garbage can across the street doesn't change the garbage to roses.

Karen Coyle said...

As for "crosswalks," the big problem I see is that, by their nature, they try to shoehorn data elements from other record formats into data fields that they just don't fit into. Taking a dc:title field and putting it into a MARC 245 $a is just asking for trouble. As long as our systems work with *records* rather than data elements, we will continue to create these messes. I want a system that can store either (or both) a dc:title or a "title proper" from a library catalog record, and not pretend that one can become the other and still make sense.
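
To make the "data elements rather than records" idea a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a proposal for an actual carrier: the StatementStore class, the "book:1" identifier, the "rda:titleProper" label, and the sample title strings are all assumptions invented for this example. The point it tries to show is that each statement keeps its own property name, so a dc:title and a "title proper" can sit side by side without one being converted into the other.

```python
from collections import defaultdict

# Hypothetical property labels, chosen only for illustration.
DC_TITLE = "dc:title"
TITLE_PROPER = "rda:titleProper"


class StatementStore:
    """Keeps (subject, property, value) statements instead of whole records."""

    def __init__(self):
        # Maps a subject identifier to its list of (property, value) pairs.
        self._statements = defaultdict(list)

    def add(self, subject, prop, value):
        """Record one statement about a subject, keeping its original property."""
        self._statements[subject].append((prop, value))

    def values(self, subject, prop):
        """Return the values stored for one property of one subject."""
        return [v for p, v in self._statements.get(subject, []) if p == prop]

    def properties(self, subject):
        """Return the sorted set of property names used for a subject."""
        return sorted({p for p, _ in self._statements.get(subject, [])})


# Made-up sample data: the same resource described from two sources.
store = StatementStore()
store.add("book:1", DC_TITLE, "Moby Dick, or, The Whale")
store.add("book:1", TITLE_PROPER, "Moby Dick")

# Each value is retrieved under the property it was supplied with.
dc_titles = store.values("book:1", DC_TITLE)
titles_proper = store.values("book:1", TITLE_PROPER)
```

A consumer that understands only one of the two properties can ask for just that one; nothing in the store claims the two values are interchangeable.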