... MARC has as much in common with a textual markup language (such as SGML or HTML) as it does with what we might consider to be “structured data.” I have myself often referred to MARC as a markup language, to distinguish it from what a computer scientist would call "data." We took the catalog card and marked it up so that we could store the text in a machine-readable form and re-create the card format as precisely as possible. Along the way, a few fields (publication date, language, format) were considered in need of being expressed as actual data, and so the fixed fields were designed to hold those. Oddly enough, though, in most cases the same information was available in the text, meaning that the information had to be entered twice: once as text, and once as data.
008 pos. 07/10 = 1984
260 $c 1984
This fact is proof that at one point the MARC developers were fully aware that the text in the variable fields was ill-suited to machine operations other than printing on a card (or display on a screen).
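The double entry of the date can be made concrete with a small sketch. It assumes the 008 field arrives as a 40-character string and the 260 $c as free text; the sample values and the function names (`date1_from_008`, `year_from_260c`, `dates_agree`) are invented for illustration and are not part of any MARC library.

```python
import re

# Build a 40-character 008 field piece by piece so the positions are easy to audit.
# Every value here is made up for illustration.
SAMPLE_008 = (
    "840615"      # 00-05 date entered on file
    + "s"         # 06    type of date
    + "1984"      # 07-10 Date 1
    + " " * 4     # 11-14 Date 2
    + "nyu"       # 15-17 place of publication
    + " " * 17    # 18-34 material-specific positions
    + "eng"       # 35-37 language
    + " "         # 38    modified record
    + "d"         # 39    cataloging source
)


def date1_from_008(field_008):
    """Return the four characters at positions 07-10 (the coded date)."""
    return field_008[7:11]


def year_from_260c(subfield_c):
    """Return the first standalone four-digit run in the textual date, or None."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", subfield_c)
    return match.group(1) if match else None


def dates_agree(field_008, subfield_c):
    """True when the coded date and the textual date carry the same year."""
    return date1_from_008(field_008) == year_from_260c(subfield_c)


if __name__ == "__main__":
    print(dates_agree(SAMPLE_008, "c1984."))
    print(dates_agree(SAMPLE_008, "[1985?]"))
```

The fixed field needs only a slice, while the textual date needs pattern matching that will miss forms such as "[19--]". That gap is the practical difference between the data and the text.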
I have been working off and on for a number of years on an analysis of MARC that is perhaps similar to Thomale's search for the bibliographic data of MARC. I characterize my project as an attempt to define the data elements of the MARC record. The logic goes like this: if we want to create a new, more flexible format for library data, one way to begin that process is to break MARC data up into its data elements. These can then be re-combined using a new data carrier. The converse is that if we cannot break MARC up into its data elements, then any new carrier will surely be saddled with some of the problematic aspects of MARC, such as:
- redundancy, especially the repetition of the same content in many different fields
- inconsistency, where the content in those different fields is coded differently or with a different level of granularity
- potential contradiction between data in fixed fields and textual data
I'm not interested in replicating MARC, so I do not want to create something that is one-to-one with MARC fields and subfields. As an example, some fixed field data elements and their values appear more than once in the MARC format, such as the 008 "Government publication" element which is identical in the 008 for books, computer files, maps, continuing resources and visual materials. As far as I'm concerned it is a single data element. On the other hand, an element named "Color" appears in more than one 007 field, but in each case the values that are valid for the data element are different. These then are different data elements.
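One way to apply this rule mechanically is sketched below, under the assumption that an element's identity is its name plus its set of valid values. The code lists are abbreviated and illustrative, not transcribed from the MARC documentation, and `group_into_elements` is a hypothetical helper.

```python
from collections import defaultdict

# (field, category of material, element name, valid codes)
# The code sets are shortened placeholders for illustration only.
OCCURRENCES = [
    ("008", "books", "Government publication", {"a", "f", "s"}),
    ("008", "maps", "Government publication", {"a", "f", "s"}),
    ("007", "map", "Color", {"a", "c"}),
    ("007", "microform", "Color", {"b", "c", "m"}),
]


def group_into_elements(occurrences):
    """Group occurrences that share both a name and an identical value set.

    Each resulting key stands for one data element; the list under it
    records every place in MARC where that element appears.
    """
    groups = defaultdict(list)
    for field, category, name, values in occurrences:
        groups[(name, frozenset(values))].append((field, category))
    return dict(groups)


if __name__ == "__main__":
    for (name, values), places in group_into_elements(OCCURRENCES).items():
        print(name, sorted(values), places)
```

With this sample input, the two "Government publication" occurrences collapse into one element, while the two "Color" occurrences stay separate because their value sets differ.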
I am struggling with how to create usable output from my investigations. I may code some things in the Open Metadata Registry, but at the moment that would have to be done by hand and I need something more automated. I would like to represent the controlled lists in the fixed fields in an RDF-compatible way using SKOS. This should be relatively simple once certain decisions are made (naming, URIs, etc.).
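As a rough idea of what automated output could look like, here is a minimal sketch that writes one controlled list as SKOS in Turtle syntax using plain string formatting. The base URI, the scheme identifier, and the two sample codes are assumptions for illustration; the naming and URI decisions mentioned above are exactly the parts this sketch leaves open. Labels are assumed to contain no double quotes.

```python
BASE = "http://example.org/marc/"  # hypothetical base URI, not a real namespace


def skos_turtle(scheme_id, scheme_label, values):
    """Return Turtle text for one concept scheme and its coded values.

    `values` maps each code to its human-readable label.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{BASE}{scheme_id}> a skos:ConceptScheme ;",
        f'    skos:prefLabel "{scheme_label}"@en .',
    ]
    for code, label in sorted(values.items()):
        lines += [
            "",
            f"<{BASE}{scheme_id}/{code}> a skos:Concept ;",
            f"    skos:inScheme <{BASE}{scheme_id}> ;",
            f'    skos:notation "{code}" ;',
            f'    skos:prefLabel "{label}"@en .',
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative subset of a code list, not a complete transcription.
    sample = {"b": "Black-and-white", "c": "Multicolored"}
    print(skos_turtle("microform-color", "Color (microform)", sample))
```

Each fixed-field list becomes a `skos:ConceptScheme`, and each code becomes a `skos:Concept` that carries the original MARC code in `skos:notation`, so the code itself is never lost in the conversion.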
A big question is how to link all of this back to MARC. For the fixed fields it's relatively easy to create a string that represents the MARC origins of the data, for example:
- 007microform05 to represent the data element (field 007, category of material Microform, position 05)
- 007microform05f to represent the actual value (field 007, category of material Microform, position 05, value=f)
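The two strings above follow a pattern regular enough to build and take apart in code. The sketch below assumes single-position elements, a two-digit position, and a category name reduced to lowercase letters; the function names are hypothetical, and multi-position elements (such as a four-character date) would need a different convention.

```python
import re


def element_id(field, category, position):
    """Build the identifier for a data element, e.g. 007 + microform + 05."""
    slug = category.lower().replace(" ", "")
    return f"{field}{slug}{position:02d}"


def value_id(field, category, position, value):
    """Build the identifier for one value of a data element."""
    return element_id(field, category, position) + value


def parse_id(identifier):
    """Split an identifier back into (field, category, position, value).

    `value` is None when the identifier names the element itself.
    """
    match = re.fullmatch(r"(\d{3})([a-z]+)(\d{2})(.?)", identifier)
    if match is None:
        raise ValueError(f"not a recognized identifier: {identifier!r}")
    field, category, position, value = match.groups()
    return field, category, int(position), value or None


if __name__ == "__main__":
    print(element_id("007", "Microform", 5))
    print(value_id("007", "Microform", 5, "f"))
    print(parse_id("007microform05f"))
```

Because the string can be parsed back into its parts, it serves as the link to the MARC origin without any separate lookup table.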
Note that there have been other good starts at defining the MARC fixed fields in SKOS, and eventually we may be able to bring this all together. Meanwhile, I did grab marc21.info for the URI portion of this work and obviously am working toward dereferenceable URIs.