Tuesday, August 14, 2007

MarcXchange

It had been announced a while back that folks from a Danish standards body were proposing an ISO standard for an XML version of ISO 2709, which is the ISO standard for what we think of as MARC. I couldn't figure out at the time why an ISO standard was needed since we have MARCXML. I found the draft of the ISO standard (ISO/DIS 25577) online, and learned some important things.

To begin with, I have never seen a copy of ISO 2709, even though the standard is referenced in just about every document that relates to the MARC format. In fact, you often see references to "Z39.2, also known as ISO 2709." Z39.2 is available from the NISO web site, and is the basis for what those of us in the U.S. think of as MARC. So I assumed that ISO 2709 was essentially the same as Z39.2. It turns out that there are some differences that are evidenced in this new standard. They may just be differences in terminology, but here's what shows up in ISO 25577:
  • the "Leader" is called "record label" in ISO 2709
  • the "control fields" (those beginning with "00") are called "identifier field" and "reference fields" in ISO 2709
  • what we call "variable" fields in Z39.2 are called "data fields" in ISO 2709
I agree that these may be minor differences, but now I have to go back and try to fix the Wikipedia article on ISO 2709. And I have no idea if there are other differences that didn't show up in this particular standards document. I am really annoyed -- no, more than annoyed -- that ISO standards are not open. (And if anyone wants to violate copyright and license and send me a copy of 2709, I will not tell anyone it was you.)

OK, over that hump: MarcXchange (ISO 25577) is an XML format for ISO 2709. MARCXML is an XML format for MARC21. The difference is that ISO 25577 is much broader than MARCXML. Tags can be anything from 001 to 999 and 00A to ZZZ. And you can have up to nine indicators on a field.

The significance? Well, since you are creating records in XML, certain limitations in the ISO 2709 format do not exist (like field lengths). And you don't have the limitations of MARC21, like limiting tags to 000-999 or having exactly two indicators on every variable field. In this schema, you could create an instance that has no indicators on some fields, and the fields that have indicators wouldn't need to have the same number of them. Think of all of those fields where both indicators have been used and you'd like to add another one. (I don't have the schema in a machine-readable format, but it looks like indicators are limited to one character. I'd love to see that changed so you could have multi-character indicators -- hey, why not?)
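To make the point about indicators concrete, here is a rough Python sketch of what such an instance might look like. Treat the details as assumptions rather than a reading of the schema: the namespace URI, the element names (record, datafield, subfield) and the ind1 through ind9 attribute names are what I believe the draft uses, and make_datafield is a hypothetical helper invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed MarcXchange namespace; check it against the published schema.
NS = "info:lc/xmlns/marcxchange-v1"


def make_datafield(tag, indicators, subfields):
    """Build one datafield element (hypothetical helper).

    `indicators` is a list of zero to nine one-character strings.
    `subfields` is a list of (code, text) pairs.
    """
    if len(indicators) > 9:
        raise ValueError("at most nine indicators per field")
    if any(len(ind) != 1 for ind in indicators):
        raise ValueError("each indicator is a single character")
    field = ET.Element(f"{{{NS}}}datafield", {"tag": tag})
    # Assumed attribute names: ind1, ind2, ... ind9.
    for position, value in enumerate(indicators, start=1):
        field.set(f"ind{position}", value)
    for code, text in subfields:
        sub = ET.SubElement(field, f"{{{NS}}}subfield", {"code": code})
        sub.text = text
    return field


record = ET.Element(f"{{{NS}}}record")
# A familiar tag carrying a third indicator, which MARC21 would not allow.
record.append(make_datafield("245", ["1", "0", "a"], [("a", "An example title")]))
# An alphanumeric tag with no indicators at all.
record.append(make_datafield("9ZA", [], [("a", "A local note")]))

# Serialize with the assumed namespace as the default namespace.
ET.register_namespace("", NS)
print(ET.tostring(record, encoding="unicode"))
```

The two fields in this record carry different numbers of indicators, which is exactly the kind of instance that could not be round-tripped to MARC21.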

No, I'm not advocating that we drop MARC21 for MarcXchange, but could we at least brainstorm on whether MarcXchange could help us out in expanding our bibliographic record where it's needed? No, you couldn't round-trip it, but eventually we have to move forward and quit circling back. Would something like this help us out?

7 comments:

Unknown said...

The text of ISO 2709 is available in Information Transfer, an ISO standards handbook published in 1977 and available in many libraries (OCLC #4189589)

Alex said...

Isn't this - as usual - a question vendors should answer? Most, if not all, limitations of MARC21 are there because most, if not all, library systems are based on MARC21 (including data models, APIs and user interfaces).

Karen Coyle said...

Ed, thanks, I'll look for that.

Alexander, the vendors use MARC21 because the library world has declared it "THE standard." So I guess we could consider them enablers of our folly, but we can't blame them for it.

Mike Kreyche said...

It seems pretty clear to me that subfield codes can be more than one character in length. The only restriction is that the subfield code can't be 0 characters in length (which is permitted in ISO-2709).

Karen Coyle said...

Mike,

Yeah, based on ISO 2709 you could define a record with up to 9 characters for subfield codes -- unfortunately, the MARC instance uses only one. But I wonder if subfield codes could vary in an instance of MarcXchange. That seems to offer possibilities, but it also could lead us into the trap of trying to use words or mnemonics for our tags. We humans don't work well with codes that we can't remember, and mnemonics (like having the URL be in the $u in MARC fields) are very helpful.

Mike Kreyche said...

Karen, I'm getting mixed messages about where you stand on mnemonics, but that's another question (I'm ambivalent, myself). I don't have any hands-on experience with schemas, but as far as I can tell with MarcXchange, (1) the leader is optional, and (2) if it's there, there are data typing constraints consistent with traditional usage, but no function or meaning is assigned to the values as in ISO-2709. So even if you have a leader with "22" for bytes 10 and 11, it seems you could have more or fewer than two indicators and subfield codes of different lengths. On the whole, it looks OK for Next Gen MARC. Now somebody has to get busy defining those multi-character subfield codes and generating some useful data!

Karen Coyle said...

Mike, as to mnemonics, I understand why they work for humans (easier to remember), and I understand (and have seen) how they box you into a corner if you use them in your data structure. If there's a solution it is to not show your underlying data structure to the humans who are using it, but have a visible display that makes things easy for inputters. It always amazes me that most input to MARC records works directly with the tags and subfield codes, and not something friendlier.