Thursday, November 02, 2006


Much of the buzz around FRBR concerns its emphasis on relationships. There are the relationships between works, between works and expressions, etc. down the FRBR model through manifestations and items. These are the Group 1 entities in FRBR. Less is said about the Group 2 entities and their "Responsibility" relationships ("is created by Person" "is realized by Corporate Body"). These look a lot like the RDF "triples" that many developers are fond of as semantic organizing principles for data. They are also very similar to the relator codes that we have in the cataloging rules and in MARC21: Smith, Jane, ed.
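To make the resemblance concrete, here is a minimal Python sketch that turns a name plus a relator code into a triple-like statement. It is only an illustration: the `Triple` type, the `responsibility_triple` function, the work label, and the phrases I map the relator codes to are my own assumptions, not wording from FRBR or from any RDF vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the work (or expression) being described
    predicate: str  # the responsibility relationship
    obj: str        # the person or corporate body


# Hypothetical mapping from MARC relator codes to predicate phrases.
RELATOR_PREDICATES = {
    "aut": "is created by",
    "edt": "is edited by",
    "trl": "is translated by",
}


def responsibility_triple(work: str, name: str, relator_code: str) -> Triple:
    """Build a triple; fall back to a generic predicate for unknown codes."""
    predicate = RELATOR_PREDICATES.get(relator_code, "is associated with")
    return Triple(work, predicate, name)


# "Smith, Jane, ed." attached to a made-up work label.
print(responsibility_triple("An example work", "Smith, Jane", "edt"))
```

The fallback predicate is the interesting part: without a relator, all the statement can say is that the name has something to do with the work.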

I have often been frustrated that searches in library catalogs do not allow me to include (or exclude) roles, such as "editor" or "translator." I am annoyed when a search on "Nabokov, Vladimir Vladimirovich, 1899-1977" in the so-called "author" field brings up numerous editions of Lewis Carroll's Alice in Wonderland, translated into Russian by Mr. Nabokov. Yet when I look at the detailed records, in most he is listed simply as an added entry by his full name, with no relator code. In essence, the catalog has no way to distinguish between works he wrote (or co-wrote, thus the use of the added entry) and those he translated. Unfortunately, it is clear when you look at records in library catalogs that those role codes have not been assigned consistently.
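Here is a rough sketch, in Python, of the kind of role-aware search I have in mind, and of why it fails when the relator code is missing. Everything in it is an assumption for illustration: the `Entry` and `Record` classes, the `search` function, and the three sample records are made up and do not reflect any real catalog's data model.

```python
from dataclasses import dataclass
from typing import List, Optional

NABOKOV = "Nabokov, Vladimir Vladimirovich, 1899-1977"


@dataclass
class Entry:
    name: str
    relator: Optional[str] = None  # MARC relator code, or None when uncoded


@dataclass
class Record:
    title: str
    entries: List[Entry]


def search(records, name, include_roles=None, exclude_roles=()):
    """Return titles where `name` appears in an entry that passes the role filters."""
    hits = []
    for record in records:
        for entry in record.entries:
            if entry.name != name:
                continue
            if entry.relator in exclude_roles:
                continue
            if include_roles is not None and entry.relator not in include_roles:
                continue
            hits.append(record.title)
            break  # one hit per record is enough
    return hits


# Made-up sample records: one coded author entry, one coded translator
# entry, and one translator entry with no relator code at all.
records = [
    Record("Despair", [Entry(NABOKOV, "aut")]),
    Record("Alice in Wonderland (Russian), coded", [Entry(NABOKOV, "trl")]),
    Record("Alice in Wonderland (Russian), uncoded", [Entry(NABOKOV)]),
]

print(search(records, NABOKOV))                          # all three records
print(search(records, NABOKOV, exclude_roles={"trl"}))   # uncoded translation still appears
print(search(records, NABOKOV, include_roles={"aut"}))   # only the coded author entry
```

Excluding "trl" drops only the record where someone actually coded the role; the uncoded translation remains as a false hit. Requiring "aut" removes the false hit, but it would also hide any work he wrote whose entry was never coded.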

Thom Hickey did a study of relator codes and relator names ($4 v. $e), which he reported in his blog, and came up with the figures below. His interest was in the interaction between the code and the name. Since his study was done in the OCLC WorldCat catalog, I think it points out that these key roles are not being coded in our records, which essentially results in a lot of false hits for our users. If we can't get these simple relationships coded into our data today, what hope do we have for a relationship-oriented bibliographic view in the future?

Thom's list of top terms:

Relator codes:

prf (1,080,900)
cnd (203,921)
voc (78,921)
itr (77,058)
aut (72,700)
act (56,518)
arr (50,621)
edt (49,205)
trl (43,608)
ill (42,657)

and the top relator terms:

ed (629,083)
joint author (474,307)
ill (214,764)
tr (172,801)
comp (123,239)
printer (60,070)
photographer (45,115)
orient (40,064)
illus (38,201)
former owner (34,892)

From this it seems obvious to me that
  1. These codes are being used primarily in certain cataloging sub-cultures (music and archival works are my best guess)
  2. Some are obviously under-used, in particular "joint author" and "translator"

Here are the added entries for a record for Rainer Werner Fassbinder's film based on Nabokov's novel "Despair":

Bogarde, Dirk, 1921-1999
Ferréol, Andrea
Spengler, Volker
Märthesheimer, Peter, 1938-
Fassbinder, Rainer Werner, 1946-1982
Stoppard, Tom
Nabokov, Vladimir Vladimirovich, 1899-1977. Otchaianie
Löwitsch, Klaus, 1936-

All of these have the same coded relationship to the bibliographic work being described in the MARC21 record: personal added entry.

I very much like the idea of distinguishing roles and making those relationships part of the user experience. But if we aren't taking advantage of the ability we have today, I don't have much hope that we will code these relationships in the future.


Dorothea said...

I think I have a little more hope than you do. It's very hard to justify training cataloguers to do these things, never mind justifying actually DOING them, when there doesn't seem to be any payoff in improved user access.

The strides over the last year in OPAC design make me more hopeful than I have been in -- well, ever -- that we'll finally start making bibliographic data work harder. That, in turn, will make better relationship coding a worthwhile proposition for cataloguers.

Not to harp on Library 2.0, but it strikes me that this is an area where our patrons can help us -- to our benefit as well as theirs. I don't trust a patron with a relator code, but I do trust them to know a translator from an author!

Sherman Clarke said...

In a parallel conversation on the PCCLIST, catalogers are discussing undifferentiated names. One way to differentiate a name is to add a phrase such as "writer on art" or "physician" or "flutist." I sent a message to the list about non-library metadata schemas that use role or relator fields (e.g., VRA Core) but the conversation veered quickly back to the need for differentiated names. I think this is an OPAC problem, not a next-gen catalog issue. Witness how a couple of lines of Google summary can differentiate (pretty well) results on a common name. This doesn't "solve" your Nabokov as translator or author problem.

As to relators in bib records from the recent past, a normal part of authority processing these days strips $e's and $4's from main and added entries. Sigh. We'll want them back but we're getting rid of them now.