Tuesday, January 16, 2007

Comments on D-Lib Article: "RDA... for the 20th c."

Diane Hillmann and I wrote an article called "Resource Description and Access: Cataloging for the 20th Century." It is a critique of what is being developed as the successor to the current library cataloging rules. I'm posting this here primarily to provide a place for comments... so feel free to comment, criticize, add to, or simply vent. Note that I have to approve comments so there will be some delay, and something of an interruption at times due to my ALA schedule.

Also check out the article in the same issue of D-Lib by Karen Markey. One of her points is similar to what Diane and I say, which is that it may be time to de-emphasize descriptive cataloging (at least for regularly published materials) and put that energy into better subject access. Markey suggests adding tables of contents and index terms to records, and developing ranking algorithms to help get the most appropriate material in front of users. Some of what she suggests I would put under the heading of "context" -- categories like reader level vis-a-vis the topic (beginner, expert), general topic area (science, history).

16 comments:

Roy Tennant said...

I thought the piece was wonderful in exposing the direction of the RDA effort and how it will likely not solve the problems that must be solved. I look forward to seeing how I can aid any effort to redirect the work in a profitable direction. Thanks a lot, Karen and Diane!

Anonymous said...

I have only read the article once through - so I apologize - I need a little hand holding to understand some of the issues raised. For example - the example you all give for the "highly structured strings that are clearly not compatible with what we think of today as machine-manipulable data" - What would you rather they have that is compatible?

Karen Coyle said...

Example:

"xvii, 323 p." "1 pamphlet (32p.)" "27 p., 300 leaves" - Fine as a display, but it turns out that the total number of pages is a key element in determining editions when doing algorithmic de-duplication. To find out the total number of pages, you have to parse through a quite variable string. And if one cataloger used leaves and the other pages, all bets are off.

Maybe the issue is that the cataloging rules are not supposed to address things like algorithmic de-duping, but reality is that systems do a lot of that and they would do it better if the data were more suited to that task.

One possibility is to create another standard for markup that is much less text-string oriented than MARC, which really was a display format and not a machine-manipulation format. So the cataloging rules could look as above, but coding rules would allow you to create more structured fields like:
enumeration: ix
unit: pages
enumeration: 323
unit: pages
totalEnum: 332
totalUnit: pages
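
To make the parsing problem concrete, here is a minimal Python sketch of what a system has to do today to get from a display string to fields like those above. Everything in it is an assumption for illustration: the names `parse_extent`, `total_by_unit`, and `Extent`, the small unit table, and the rule that a bare numeral borrows the unit of the next segment are not taken from MARC, AACR, or RDA. It only handles simple comma-separated statements and would reject something like "1 pamphlet (32p.)".

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical unit table; a real system would need a much longer list.
UNIT_NAMES = {"p": "pages", "page": "pages", "pages": "pages",
              "leaf": "leaves", "leaves": "leaves"}

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

# One segment: an arabic or roman numeral, an optional unit word, an optional period.
SEGMENT = re.compile(r"(?P<num>\d+|[ivxlcdm]+)\s*(?P<unit>[a-z]+)?\.?", re.IGNORECASE)


@dataclass
class Extent:
    enumeration: str  # the numeral as transcribed, e.g. "ix" or "323"
    unit: str         # normalized unit, e.g. "pages"
    count: int        # numeric value of the enumeration


def roman_to_int(text: str) -> int:
    """Convert a roman numeral such as 'xvii' to an integer."""
    total, prev = 0, 0
    for ch in reversed(text.lower()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value  # subtractive form, e.g. the 'i' in 'ix'
        else:
            total += value
            prev = value
    return total


def parse_extent(statement: str) -> List[Extent]:
    """Split a simple extent statement into enumeration/unit pairs."""
    raw = []
    for segment in statement.split(","):
        match = SEGMENT.fullmatch(segment.strip())
        if match is None:
            raise ValueError(f"cannot parse extent segment: {segment!r}")
        num = match.group("num")
        count = int(num) if num.isdigit() else roman_to_int(num)
        raw.append((num, match.group("unit"), count))

    # Assumption: a numeral without a unit ("ix" in "ix, 323 p.")
    # takes the unit of the next segment that has one.
    parts: List[Extent] = []
    next_unit: Optional[str] = None
    for num, unit, count in reversed(raw):
        if unit is not None:
            next_unit = UNIT_NAMES.get(unit.lower(), unit.lower())
        parts.append(Extent(num, next_unit or "unspecified", count))
    parts.reverse()
    return parts


def total_by_unit(parts: List[Extent]) -> Dict[str, int]:
    """Sum the counts per unit, the figure a de-duplication pass would compare."""
    totals: Dict[str, int] = {}
    for part in parts:
        totals[part.unit] = totals.get(part.unit, 0) + part.count
    return totals


print(parse_extent("ix, 323 p."))
print(total_by_unit(parse_extent("27 p., 300 leaves")))
```

Note that the second call yields separate totals for pages and leaves, which is exactly the case where two catalogers' records for the same item would fail to match on extent.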

I think the underlying question is: where will we get the data that we need to drive our systems? One example of note is in the MARC "linking fields" which don't really link anything, but it would be great if we really could easily link from a record to a related record. Right now, the user has to figure that out himself by reading through a full record.

Anonymous said...

Two Questions:
The article made me think that MARC was almost as bad as AACR/RDA in the limits for the future. But is it MARC or the application of the rules within MARC? I guess in your example there isn't really a place within MARC to parse out the units. (And my pet peeve is the name field not being parsed apart and only rules/convention has the last name, first name in the x00.) But are rules such as RDA supposed to give specifics of breaking apart the units - or is it to give a rule that you should give extent, leaving your schema to tell you how to express it in the parsed fields? If it is extent that needs to be expressed, then are you saying that RDA doesn't go far enough to explicitly state how to give the units for computer reading/comparing?

So I was just reading some MARBI document about OCLC wanting a 004 to do some sort of within-system linking. All because of RLG data merging into OCLC and the different uses of master records and institutional records. Again, I'm slow on this. Are we discussing that kind of linking? The questions raised at the end of the MARBI document were interesting: http://www.loc.gov/marc/marbi/2007/2007-dp04.html

Karen Coyle said...

Spilsk -

To answer this, we have to look at the history of the cataloging rules and how MARC interacts with them. Because in fact, the answer to your question: "Is it MARC or is it the rules within MARC?" is that both AACR and MARC were essentially functions of the card catalog. AACR was the rules to create a card entry, and MARC was the markup of that entry that allowed the printing of cards. Neither is terribly relevant in today's technology environment. Over the years, MARC has "morphed" somewhat in an attempt to support library systems, but it is saddled with this legacy of AACR and the card that makes it hard for it to respond to today's needs.

I understand the question of whether RDA is supposed to be determining the markup of data elements. I think that a cataloging code could be a list of decisions (what's a title? what defines an edition?). However, RDA goes beyond that and determines transcription of data. A statement like: "record the number in arabic numerals followed by an appropriate term or terms to indicate the type of unit" goes beyond the decision and steps into the realm of data formatting. There seems to be an assumption in RDA that all of the data elements will be text strings, much like they are today. What we need instead is a recognition that our cataloging data will be part of a machine-actionable record, and that the creation of a catalog is a joint effort of catalogers and systems developers. The latter are entirely lacking from the RDA process.

Anonymous said...

As a person in the trenches, I just need someone to tell me what to record and how to record it so interoperability can happen. Currently, I'm stuck with a MARC system and the AACR2 rev ed rules for what to put in the fields. And still, I have hit problems where my computer record does NOT match someone else's record - Cataloger's judgement affects the matching algorithm?! I want the computer to do so much more with the data that I have and I keep seemingly hitting very hard, brick walls. If I step up (back? over?) to Dublin Core, it seems to be worse with too many loose ends failing to match. So where do I turn? RDA doesn't seem like it is going to fit the bill. MARC in MARCXML doesn't do a whole lot more except actually let me have a valid XML record that might have invalid MARC coding. What is a cataloger/metadataist supposed to do?

Anonymous said...

I thought the piece was well-written and thought-provoking. Do you see a place at all for alphabetical lists or left-justified browsing in future catalogues? At the moment I find these very useful for some searches, but that could be because the data beneath it has been created to support it and current OPACs are unsophisticated re: keyword search results.

Karen Coyle said...

Alison,

I do use the left-anchored heading browse in a catalog when a keyword search brings up too many records. I use it for its precision when I know the exact heading. My understanding, however, is that few catalog users do a heading browse when keyword searching is also available.

I love shelf browsing in the library, and I think it's a great way to get an education. I don't know of a catalog that allows you to start somewhere and easily browse the virtual shelf, but I'd like to try that. My suspicion is that it won't hold a candle to the real shelf browse experience.

That doesn't really answer your question, I know, it poses another one: what is the purpose of the heading browse? What need does it fulfill? and is it the best way to get users where they want to go?

Anonymous said...

A heading browse that comes to mind is the subject browse for Caesar, Julius to find books about the man, instead of the subject keyword search which will bring up all of the Shakespearean stuff (real-life reference desk example). Perhaps this lack of precision can be fixed by systems that let you narrow by call number (D's vs P's) or with the ability to distinguish how the words are tagged (MARC or otherwise), e.g. "Did you mean the person Julius Caesar or the work Julius Caesar?"

Anonymous said...

I thought this article was great, although I felt rather deflated after reading it. It identified a lot of things I've been thinking about wrt RDA. After so many years of using MARC and applying AACR2 I wonder if we haven't ultimately failed our users. We have difficulty keeping ahead of our print collections, as evidenced by the sizable backlogs in most libraries. How can we possibly apply the same thinking to resources in the 'digital age'? I know there is value in what we do but I'm having a hard time seeing the way forward. Library users don't understand what we're doing; in fact, many librarians don't either. Maybe we should leave metadata to those who create/use the resources through social tagging, etc. and focus our efforts on identifying relationships between resources that can aid in discovery?

Unknown said...

I found the article to be very well done and thought-provoking. I agree that we need to take a top-down approach *before* we delve into the rules. But--I find few models that we can use to help us define this approach.

I'd like to make another comment, but I want it to be clear that it is not directed personally, but just expresses a reaction I had after reading the article: I read as many articles as I can about these subjects, all written by very intelligent and highly-respected people in and out of the library profession. But, frankly, I am growing weary of the "big questions" being asked all the time without providing any possible solutions. Is it not time to at least *try* an answer? Are we all afraid to go down a road, spend all the time and expense to do so, only to find out we've made the "betamax" decision?

See, even I'm doing it with this post. No answers, just a whole bunch of questions. I feel like the information technology revolution is passing me by while I just try to think about these things.

I became a librarian to provide information to people for their research/personal/professional interests. I believe that technology is allowing for even the most obscure information to be available, and that with the right people in the "virtual" room, advances can actually be made to solve major problems. Now I am using archaic rules to describe the information researchers and others need to solve these problems, and those thinking about changing the rules are unwilling to break out of the past.

So: should I sit back and let the Internet search engine companies or other outside-library-organizations do the work/testing for me, or do I need to actually do something? Better yet: *can* I actually do something??

Unknown said...

I thought the piece was well-done and thought-provoking. Thank you and Diane for publishing the piece.

I agree we need a top down approach to the delivery of information *before* we talk about the rules. I am, however, interested in knowing about models we can consider to have this kind of conversation. I don't think it will be easy. Any ideas from what you've heard or read "out there?"

Karen Coyle said...

One thing that has occurred to me that we should have included in the article is the need to study user behavior rather than to assume we know what the user wants to do at the catalog. It should not be terribly difficult to learn what indexes are used, whether users opt for keyword or heading searching (when they have an option), how often they change a default to something else. And we also need to study users as they approach the catalog to find out where they start, what confuses them, etc. So working with a knowledge of the user rather than a knowledge of cataloging theory would be an interesting approach.

Unknown said...

Well, starting with the user certainly makes sense--but--shouldn't we start with the user NOT at the catalog, but just at the computer? Then see where he/she goes to find information and how they're finding it?

If all these studies are true that the majority of undergraduates are using Google to find things....

Anonymous said...

One reason I don't do heading browsing much in OPACs is I get a worm's eye view of the subject; I only see 20 or so headings, out of what can be hundreds of headings related to a particular subject, and I then have to click to see the items under a heading, one at a time. That's a slow and frustrating experience for most users, certainly when compared to Google-like keyword searches.

I've been experimenting with some alternate subject-oriented displays to make things a bit better. Basically, what I try to do is show *clusters* of related subject headings and items described by those headings together in the same browser view, and try to make it easy for people to navigate to related subjects that appear to be of interest. There are some demos and background material at

http://labs.library.upenn.edu/subjectmaps/

It's not the first system to do clustering, but a lot of the other systems I've seen organize their subject relationships purely in terms of facets or simple hierarchies. I'm trying to find and use a wider range of relationships. The main thing I'm adding at present are LCSH authority-record relationships, but there are other kinds of relationships that could be inferred or created as well. The white paper linked to from the page above has more details.

Anonymous said...

It is sad that it took this long to even realise that both AACR and MARC were essentially functions of the card catalog and that they are severely inadequate for the needs of today and tomorrow. It is even more unfortunate that many librarians (blinded by self-importance and denial) are happy with AACR and MARC and think they are still wonderful, and are not even aware of, or are unwilling to admit, this inadequacy and the urgency for change. And worse, they usually blame the user's lack of knowledge or training if the user doesn't understand the OPAC. Or they blame IT or system developers, but never librarians themselves. And librarians are so used to shelf-ready systems that are rigid and to applying meaningless rules, rather than being creative, innovative, imaginative, and flexible and creating real value in data and content (i.e. data enrichment). So instead of being rules-focused, libraries should genuinely be customer-focused. This means using any available technology (not just Z39.50, which no one but libraries use; if it was so wonderful others would use it, for sure) to get the best customised mix of technology to meet not only today's needs but also those of tomorrow.