Friday, September 09, 2011

MARC vs RDA

As LC ponders the task of moving to a new bibliographic framework, I can't help but worry about how much the past is going to impinge on our future. It seems to me that we have two potentially incompatible needs at the moment: the first is to fix MARC, and the second is to create a carrier for RDA.

Fixing MARC

For well over a decade some of us have been suggesting that we need a new carrier for the data that is currently stored in the MARC format. The record we work with today is full of kludges brought on by limitations in the data format itself. To give a few examples:
  • 041 Language Codes - We have a single language code in the 008 and a number of other language codes (e.g. for the language of a summary or abstract, or the original language of a translated work) in the 041. The language code in the 008 is not "typed", so it must be repeated in the 041, which has separate subfields for different kinds of language codes. However, the 041 is only included when more than one language code is needed. This means that one must always look in two places to find language codes.
  • 006 Fixed-Length Data Elements, Additional Material Characteristics - The sole reason for the existence of the 006 is that the 008 is not repeatable. The fixed-length data elements in the 006 are repeats of format-specific elements in the 008 so that information about multi-format items can be encoded.
  • 773 Host Item Entry - All of the fields for related resources (76X-78X) have the impossible task of encoding an entire bibliographic description in a single field. Because there are only 26 possible subfields (a-z) available for the bibliographic data, data elements in these fields are not coded the same way as they are in other parts of the record. For example, in the 773 the entire main entry is entered in a single subfield ("$aDesio, Ardito, 1897-") as opposed to the way it is coded in any X00 field ("$aDesio, Ardito,$d1897-").
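The "two places to look" problem with language codes can be sketched in Python. This is a minimal illustration under stated assumptions: the record is a hypothetical, simplified structure (plain dicts and tuples, not the API of any MARC library), and only three 041 subfields are mapped ($a text, $b summary or abstract, $h original). The untyped language code is read from character positions 35-37 of the 008.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified record model (not a real MARC library's API):
# control fields map tag -> raw string; data fields are (tag, subfields),
# where subfields is a list of (code, value) pairs.
Subfields = List[Tuple[str, str]]
DataField = Tuple[str, Subfields]

# A small subset of 041 subfield codes and the role each one types.
ROLE_041 = {
    "a": "text",
    "b": "summary or abstract",
    "h": "original",
}


def language_codes(control: Dict[str, str],
                   fields: List[DataField]) -> List[Tuple[str, str, str]]:
    """Return (source, role, code) triples gathered from 008/35-37 and 041."""
    found: List[Tuple[str, str, str]] = []
    # First place to look: the fixed-length 008, positions 35-37.
    lang_008 = control.get("008", "")[35:38].strip()
    if lang_008:
        # The 008 position carries no role; it is simply "the" language.
        found.append(("008", "untyped", lang_008))
    # Second place to look: any 041 fields, where the subfield code types the value.
    for tag, subfields in fields:
        if tag != "041":
            continue
        for code, value in subfields:
            if code in ROLE_041:
                found.append(("041", ROLE_041[code], value))
    return found


# A made-up 008 with "eng" at positions 35-37, plus an 041 for a text in
# English and French with a German summary.
fixed_008 = "110909s2011" + " " * 24 + "eng" + " d"
record_fields = [("041", [("a", "eng"), ("a", "fre"), ("b", "ger")])]
print(language_codes({"008": fixed_008}, record_fields))
# [('008', 'untyped', 'eng'), ('041', 'text', 'eng'), ('041', 'text', 'fre'), ('041', 'summary or abstract', 'ger')]
```

Note how "eng" turns up twice: once untyped in the 008 and again, typed, in the 041. Any software that wants "all the languages of this resource" has to merge both sources.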
Had we "fixed" MARC ten years ago, there might be less urgency today to move to a new carrier. As it is, data elements that were added so that the RDA testing could take place have made the format look more and more like a Rube Goldberg contraption. The MARC record is on life support, kept alive only through the efforts of the poor folks who have to code into this illogical format.

A Carrier for RDA

The precipitating reason for LC's bibliographic framework project is RDA. One of the clearest results of the RDA tests that were conducted in 2010 was that MARC is not a suitable carrier for RDA. If we are to catalog using the new code, we must have a new carrier. I see two main areas where RDA differs "record-wise" from the cataloging codes that informed the MARC record:
  • RDA implements the FRBR entities
  • RDA allows the use of identifiers for entities and terms
Although many are not aware of it, there is already a solid foundation for an RDA carrier in the registered elements and vocabularies in the Open Metadata Registry. Not long ago I was able to show that one could use those elements and vocabularies to create an RDA record. A full implementation of RDA will probably require some expansion of the data elements of RDA because the current list that one finds in the RDA Toolkit was not intended to be fully detailed.

To my mind, the main complications in designing a carrier for RDA have to do with FRBR and how we can most efficiently create relationships between the FRBR entities and manage them within systems. I suspect that we will need to accommodate multiple FRBR scenarios, some appropriate to data storage and others more appropriate to data transmission.
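To make the storage-versus-transmission distinction concrete, here is a minimal Python sketch. It assumes a toy in-memory model: the `Entity` and `Store` classes and the `ex:` identifiers are hypothetical, not part of RDA, FRBR or the Open Metadata Registry, and the FRBR relationship names "realized through", "embodied in" and "exemplified by" are used as plain strings. In storage, each entity is kept once and linked by identifier; `flatten` shows one possible way a system might copy the linked work and expression data into a single self-contained record for transmission.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    ident: str  # placeholder identifier, e.g. "ex:w1" (not a real URI scheme)
    kind: str   # "Work", "Expression", "Manifestation" or "Item"
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Store:
    """Storage scenario: each entity is kept once and linked by identifier."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each link is a (source id, relationship, target id) triple.
    links: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities[entity.ident] = entity

    def link(self, source: str, relation: str, target: str) -> None:
        self.links.append((source, relation, target))

    def targets(self, source: str, relation: str) -> List[str]:
        return [t for s, r, t in self.links if s == source and r == relation]

    def sources(self, target: str, relation: str) -> List[str]:
        return [s for s, r, t in self.links if t == target and r == relation]

    def flatten(self, manifestation_id: str) -> Dict[str, Dict[str, str]]:
        """Transmission scenario: copy the attributes of the linked
        expression and work into one self-contained record. Assumes the
        manifestation embodies a single expression; with several, the last
        one found wins."""
        record = {"Manifestation": dict(self.entities[manifestation_id].attrs)}
        for expr_id in self.sources(manifestation_id, "embodied in"):
            record["Expression"] = dict(self.entities[expr_id].attrs)
            for work_id in self.sources(expr_id, "realized through"):
                record["Work"] = dict(self.entities[work_id].attrs)
        return record


store = Store()
store.add(Entity("ex:w1", "Work", {"title": "Example Work"}))
store.add(Entity("ex:e1", "Expression", {"language": "eng"}))
store.add(Entity("ex:m1", "Manifestation", {"publisher": "Example Press", "date": "2011"}))
store.add(Entity("ex:i1", "Item", {"barcode": "0001"}))
store.link("ex:w1", "realized through", "ex:e1")
store.link("ex:e1", "embodied in", "ex:m1")
store.link("ex:m1", "exemplified by", "ex:i1")

print(store.flatten("ex:m1"))
# {'Manifestation': {'publisher': 'Example Press', 'date': '2011'}, 'Expression': {'language': 'eng'}, 'Work': {'title': 'Example Work'}}
```

The linked form avoids repeating work data for every manifestation, which suits storage; the flattened form repeats that data but can be sent to another system without requiring it to resolve identifiers. Neither is offered here as the right answer, only as an illustration of why more than one scenario may be needed.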

Can We Do Both?

This is my concern: creating a carrier for RDA will not solve the MARC record problem; solving the MARC record problem will not provide a carrier for RDA. There may be a way to combine these two needs, but I fear that a combined solution would end up creating a data format that doesn't really solve either problem because of the significant difference between the AACR conceptual model and that of RDA/FRBR.

It seems that if we want to move forward, we may have to make a break with the past. We may need to freeze MARC for those users continuing to create pre-RDA bibliographic data, and create an RDA carrier that is true to the needs of RDA and the systems that will be built around RDA data, with any future enhancements made only to the new carrier. This will require a strategy for converting data in MARC to the RDA carrier as libraries move to systems based on RDA.

Next: It's All About the Systems

In fact, the big issue is not data conversion but what future systems will require in order to take advantage of RDA/FRBR. This is a huge question, and I will take it up in a new post, but let me say here that it would be folly to devise a data format that is not based on an understanding of the requirements of systems that can deliver the desired functionality and uses.
