Wednesday, April 12, 2017

If It Ain't Broke

For the first time in over forty years there is serious talk of a new metadata format for library bibliographic data. This is an important moment.

There is not, however, a consensus within the profession on the need to replace the long-standing MARC record format with something different. A common reply to the suggestion that library data creation needs a new data schema is the phrase: "If it ain't broke, don't fix it." This is more likely to be uttered by members of the cataloging community - those who create the bibliographic data that makes up library catalogs - than by those whose jobs entail systems design and maintenance. It is worth taking a good look at the relationship that catalogers have with the MARC format, since their view is informed by decades of daily encounters with a screen of MARC encoding.

Why This Matters

When the MARC format was developed, its purpose was clear: it needed to provide the data that would be printed on catalog cards produced by the Library of Congress. Those cards had been printed for over six decades, so there was no shortage of examples from which to define the desired outcome. In ways unimagined at the time, MARC would change, nay, expand the role of shared cataloging, and would provide the first online template for cataloging.

Today work is being done on the post-MARC data schema. However, how the proposed new schema might change the daily work of catalogers is unclear. There is some anxiety in the cataloging community about this, and it is understandable. What I unfortunately see is a growing distrust of this development on the part of the data creators in our profession. It has not been made clear what their role is in the development of the next "MARC," nor even whether their needs are a driving force in that development. Surely a new model cannot be successful without the consideration (or, even better, the participation) of the people who will spend their days using the new data model to create the library's data.

(An even larger question is the future of the catalog itself, but I hardly know where to begin on that one.)

If It Ain't Broke...

The push-back against proposed post-MARC data formats is often seen as a blanket rejection of change. Undoubtedly this is at times the case. However, given that there have now been multiple generations of catalogers who worked and continue to work with the MARC record, we must assume that the members of the cataloging community have in-depth knowledge of how that format serves the cataloging function. We should tap that knowledge as a way to understand the functionality in MARC that has had a positive impact on cataloging for four decades, and should study how that functionality could be carried forward into the future bibliographic metadata schema.

I asked on Twitter for input on what catalogers like about MARC, and received some replies. I also viewed a small number of presentations by catalogers, primarily those about proposed replacements for MARC. From these I gathered the following list of "what catalogers like about MARC." I present these without comment or debate. I do not agree with all of the statements here, but that is no matter; the purpose here is to reflect cataloger perspectives.

(Note: This list is undoubtedly incomplete and I welcome comments or emails with your suggestions for additions or changes.)

What Catalogers Like/Love About MARC

There is resistance to moving away from using the MARC record for cataloging among some in the Anglo-American cataloging community. That community has been creating cataloging data in the MARC formats for forty years. For these librarians, MARC has many positive qualities, and these are qualities that are not perceived to exist in the proposals for linked data. (Throughout the sections below, read "library cataloging" and variants as referring to the Anglo-American cataloging tradition that uses the MARC format and the Anglo-American Cataloguing Rules and its newer forms.)

MARC is Familiar

Library cataloging makes use of a very complex set of rules that determine how a resource is described. Once the decisions are made regarding the content of the description, those results are coded in MARC. Because catalog records have been created in the MARC format since the late 1970s, working catalogers today have known only MARC as the bibliographic record format and the cataloging interface. Catalogers speak in "MARC" - using the tags to name data elements - e.g. "245" instead of "title proper".


Those who work with MARC consider it to be "human readable." Most of the description is text, so what the cataloger creates is exactly what will appear on the screen in the library catalog. If a cataloger types "ill." that is what will display; if the cataloger instead types "illustrations" then that is what will display. In terms of viewing a MARC record on a screen, some cataloger displays show the tags and codes to one side, and the text of those elements is clearly readable as text.

MARC Gives Catalogers Control

The coding is visible, and therefore what the cataloger creates on the screen is virtually identical to the machine-readable record that is being created. Everything that will be shown in the catalog is in the record (with the exception of cover art, at least in some catalogs). The MARC rules say that the order of fields and subfields in the record is the order in which that information should be displayed in the catalog. Some systems violate this by putting the fields in numeric order, but the order of subfields is generally maintained. Catalogers wish to control the order of display and are frustrated when they cannot. In general, changing anything about the record with automated procedures can undo the decisions made by catalogers as part of their work, and is a cause of frustration for catalogers.
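To make the display-order point concrete, here is a minimal Python sketch. It assumes a hypothetical record held as an ordered list of (tag, indicators, subfields) tuples; the record content and this representation are illustrative only, not any particular system's internal format. Sorting by tag, as some systems do, moves the cataloger's "Includes index." note (500) ahead of the bibliography note (504), while the subfields inside each field stay in the order they were entered.

```python
# Hypothetical record: fields in the order the cataloger entered them.
# Each field is (tag, indicators, [(subfield_code, value), ...]).
record = [
    ("245", "10", [("a", "An example title :"), ("b", "a subtitle /"), ("c", "by A. Author.")]),
    ("300", "  ", [("a", "250 pages ;"), ("c", "24 cm")]),
    ("504", "  ", [("a", "Includes bibliographical references.")]),
    ("500", "  ", [("a", "Includes index.")]),
]


def display(fields):
    """Print one line per field, keeping subfields in their recorded order."""
    for tag, _indicators, subfields in fields:
        text = " ".join(value for _code, value in subfields)
        print(tag, text)


print("Cataloger's order:")
display(record)

# Some systems re-sort fields numerically. sorted() is stable and moves
# whole fields only, so the subfield order inside each field is unchanged.
numeric_order = sorted(record, key=lambda field: field[0])
print("Numeric order:")
display(numeric_order)
```

The sketch shows why the complaint is about fields rather than subfields: a numeric re-sort treats each field as an opaque unit, so it can override the cataloger's note sequence without touching anything inside a field.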

MARC is International

MARC is used internationally, and because the record uses numeric tags and alphanumeric codes, a record created in another country is readable by other MARC users. Note that this was also the purpose of the International Standard Bibliographic Description (ISBD), which instead of tags uses punctuation marks to delimit elements of the bibliographic description. If a cataloger sees this, but cannot read the text:

  245 02   |a לטוס עם עין אחת / |c דני בז.

it is still clear that this is a title field with a main title (no subtitle), followed by a statement of the author's name as provided on the title page of the book.
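The same reading can be done mechanically, which is part of why the structure is legible without the language. The Python sketch below assumes the screen convention used in the example (tag, two indicator characters, then subfields introduced by "|"); in actual MARC exchange records the subfield delimiter is a control character that systems display as "|", "$" or similar. The function name and label table are hypothetical, for illustration only.

```python
# Labels for the 245 subfields relevant to the example above.
SUBFIELD_LABELS_245 = {
    "a": "title",
    "b": "remainder of title",
    "c": "statement of responsibility",
}


def parse_display_field(line):
    """Split a displayed field such as '245 02 |a Title / |c Author.'
    into (tag, indicators, [(code, value), ...]).

    Assumes: a tag, two indicator characters written without blanks,
    then subfields each introduced by '|'.
    """
    head, *parts = line.split("|")
    tag, indicators = head.split()[:2]
    subfields = [(part[0], part[1:].strip()) for part in parts if part.strip()]
    return tag, indicators, subfields


field = "  245 02   |a לטוס עם עין אחת / |c דני בז."
tag, indicators, subfields = parse_display_field(field)
for code, value in subfields:
    label = SUBFIELD_LABELS_245.get(code, "unknown")
    print(f"{tag} ${code} ({label}): {value}")
```

Nothing in the parser depends on the script of the text; the tag and subfield codes alone identify the title and the statement of responsibility.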

MARC is the Lingua Franca of Cataloging

This is probably the key point that encompasses all of the above, but it is important to state it as such. This means that the entire workflow, the training materials, the documentation - all use MARC. Catalogers today think in MARC and communicate in MARC. This also means that MARC defines the library cataloging community in the way that a dialect defines the local residents of a region. There is pride in its "library-ness". It is also seen as expressing the Anglo-American cataloging tradition.

MARC is Concise

MARC is concise as a physical format (something that is less important today than it was in the 1960s when MARC was developed), and it is also concise on the screen. "245" represents "title proper"; "240" represents "uniform title"; "130" represents "uniform title main entry". Often an entire record can be viewed on a single screen, and the tags and subfield codes take up very little display space.

MARC is Very Detailed

MARC21 has about 200 tags currently defined, and each of these can have up to 36 subfields. There are about 2000 subfields defined in MARC21, although the distribution is uneven and depends on the semantics of the field; some fields have only a handful of subfields, and in others there are few codes remaining that could be assigned.
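The figure of 36 comes from the characters allowed as subfield codes: the 26 lowercase letters plus the 10 digits. The short Python sketch below also does some rough arithmetic with the approximate counts quoted above (about 200 tags, about 2000 subfields); these are the post's round numbers, and the product is only an upper bound, since control fields (00X) have no subfields at all.

```python
import string

# Subfield codes are single lowercase letters or digits.
SUBFIELD_CODES = string.ascii_lowercase + string.digits
print(len(SUBFIELD_CODES))  # 36

# Rough arithmetic using the approximate figures in the text.
approx_tags = 200
approx_defined_subfields = 2000
theoretical_pairs = approx_tags * len(SUBFIELD_CODES)
share_used = approx_defined_subfields / theoretical_pairs
print(f"about {theoretical_pairs} possible tag/subfield pairs, roughly {share_used:.0%} defined")
```

The overall share says little about any single field, which is the point of the uneven distribution: some fields use a handful of codes while others are nearly full.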

MARC is Flat

The MARC record is fairly flat, with only two levels of coding: field and subfield. This is a simple model that is easy to understand and easy to visualize.

MARC is Extensible

Throughout its history, the MARC record has been extended by adding new fields and subfields. There are about 200 defined fields, which means that there is room to add approximately 600 more.

MARC has Mnemonics

Some coding is either consistent or mnemonic, which makes it easier for catalogers to remember the meaning of the codes. There are code blocks that refer to cataloging categories, such as the title block (2XX), the notes block (5XX) and the subject block (6XX). Some subfields have been reserved for particular functions, such as the numeric subfields 0-8. In other cases, the mnemonic applies only in certain contexts, such as the use of subfield "v" for the volume information of series. In other fields, the "v" may be used for something else, such as the "form" subfield in subject fields, but the context makes it clear.

There are also field mnemonics. For example, in the main entry, subject, added entry, and series blocks, the tags ending in "00" (100, 600, 700, 800) are personal name fields. All fields and subfields that use the number 9 are locally defined (with a few well-known exceptions).
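Because these mnemonics are positional, a tag can be partly "read" by machine just as catalogers read it by eye. The Python sketch below is illustrative only: the block labels, the name-type suffixes for the 1XX, 6XX, 7XX and 8XX blocks, and the exception list are a partial subset of MARC 21 bibliographic tagging (490, the series statement, is one standard field that contains a 9), and the function name is hypothetical. A real application would work from the full MARC 21 tag list.

```python
# Partial, illustrative tables; not a complete MARC 21 tag list.
BLOCKS = {
    "0": "numbers and codes",
    "1": "main entry",
    "2": "title and title-related",
    "3": "physical description, etc.",
    "4": "series statement",
    "5": "notes",
    "6": "subject access",
    "7": "added entry and linking entry",
    "8": "series added entry, etc.",
    "9": "local",
}
# Last two digits of name/title fields in the 1XX, 6XX, 7XX and 8XX blocks.
NAME_TYPES = {"00": "personal name", "10": "corporate name",
              "11": "meeting name", "30": "uniform title"}
# Standard fields that contain a 9 anyway (an incomplete list).
NINE_EXCEPTIONS = {"490"}


def describe_tag(tag):
    """Describe a three-digit bibliographic tag using positional mnemonics."""
    parts = [BLOCKS[tag[0]]]
    if tag[0] in "1678" and tag[1:] in NAME_TYPES:
        parts.append(NAME_TYPES[tag[1:]])
    # 9XX is already labeled "local" by its block; check the other positions.
    if "9" in tag[1:] and tag not in NINE_EXCEPTIONS:
        parts.append("locally defined")
    return ", ".join(parts)


for tag in ["100", "600", "700", "245", "590", "490"]:
    print(tag, "->", describe_tag(tag))
```

The exception set is the weak spot of any rule-of-thumb reader like this: the mnemonics hold for most tags, but the "well-known exceptions" have to be listed explicitly.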

MARC is Finite and Authoritative

MARC defines a record that is bounded. What you see in the record is all of the information that is being provided about the item being described. The concept of "infinite graphs" is hard to grasp, and hard to display on a screen. This also means that MARC is an authoritative statement of the library bibliographic description, whereas graphs may lead users to sources that are not approved by or compatible with the library view.


Sebastian Hammer said...

A thought-provoking piece, Karen. It seems to me that any LD-based cataloging ecosystem, to be effective, will have to provide a much more mediated cataloging experience than what people may be used to from mostly manual MARC-based input systems. I've envisaged environments that would suggest links to external entities from trusted sources, but would allow the cataloger the final say. But it will still be different than the 'traditional', self-contained and hands-on workflow, and it will depend much more upon the software one chooses to do the mediation. At any rate, a loss of control for the individual cataloger is almost inevitable.

I feel like lots of catalogers are curious about emerging technologies and keen to see their field evolve. But I also think that discussions between catalogers and engineers often run aground because of the peculiar, historical dual-role of MARC as an exchange format and a human, descriptive language for catalogers. Any conversation that takes as its starting point only the needs and desires of the cataloger is likely to come to a sticky end (and I say that as an unapologetic fan of MARC for the amazing past accomplishment that it represents). You have to ask, to what end is the cataloging being done? What functions within the library are served by the cataloging activity, and what larger purposes in the shared space between libraries are we looking to better address? If our only purpose were to avoid annoying people who're comfortable with cataloging in MARC, we'd probably best just stay out of the way. Don't fix it. :-)

I think the reality is, if MARC is kicked to the curb, no single thing will replace it... we'll be faced with a much messier universe of competing standards and proprietary or regional approaches -- often lowered standards based on what's good enough for basic discovery or what can be easily repurposed from upstream sources. The results will be increased interoperability friction, and only the largest players like Google and OCLC will be able to make sense of it all at a global level. That may be okay, but we should go there with our eyes open. Sometimes, replacing a really healthy standard with a new and improved one doesn't make the world better, it just makes it messier because now you've got Yet Another Standard and you've diluted the imperative for anyone to support either one.

I can understand that catalogers are frustrated with the present uncertainty. As a small software developer, I find the current landscape really confusing, with no clear future path in sight.

Karen Coyle said...

Hi, Sebastian. I just tweeted a statement by Henriette Avram, the genius behind MARC: "Any discussion about MARC in particular involves discussing library automation in general." That's from 1974. If you look at my talk "Mistakes Have Been Made" you'll see that I don't think we've had that discussion about "library automation in general."

I don't entirely agree about having a wider range of data standards. We have a pretty wide range already between libraries, archives, and various partners - MODS, Dublin Core, EAD, ONIX etc. We seem to function ok in spite of this. I do think that the "sharing economy" of libraries will require some consistency of data just as a practical matter. It makes sense to me to build around a core (as BIBFRAME Lite is suggesting). I end up thinking about the beauty of Dewey's decimal system, where you can lop off unneeded digits and still have a coherent subject.

Oh, and at some point we need to talk about keeping thousands of copies of records in separate databases where they cannot be easily updated in a coordinated fashion. I bet you've got thoughts on that one.

Christina Neigel said...

Thanks for this. As someone who has taught library technicians for many years, I have found the rollout of changes to cataloguing practices incredibly unfair to those who will be expected to do the majority of the work, because education and training have been treated as an afterthought when they need to be among the pillars of meaningful and effective change.

Without adequate access to proper training, even educators have found the transition to RDA difficult, particularly because the tools assume a level of knowledge about cataloguing practice that students in two-year programs do not have. I always felt these changes were decided by an elite and were not inclusive decisions, violating the very basis of LIS values. By not checking in widely with those who do the daily work of cataloguing, changes to standards run the risk of losing touch with the harsh realities of practice and jeopardize the very essence of resource sharing and ability for local control that make cataloguing effective.

The tools for training are ridiculously expensive and clumsy to teach with. I am disappointed in the ways these changes have been implemented and in the apathy of those instigating change towards those of us who have asked for help and consideration when trying to teach a new generation of library workers.

Christina Neigel

Karen Coyle said...

Thanks, Christina. I think our standards process is pretty broken - not much consultation, very top-down, and dominated by a few institutions. I'm going to mention this in my next post, but it deserves more than I can give it, since I'm not in an educator position. I'm glad you posted this comment; it needs to be heard.

Karen Coyle said...

Said to me today at the Dublin Core 2017 meeting: MARC shows you what is under the hood. Because of that, people creating data have a better idea of the technology that is being used for the data.