Tuesday, May 19, 2009

LCSH as linked data: beyond "dash-dash"

The SKOS version of LCSH developed by LC has made some choices in how LCSH would be presented in a linked-data format. One of these choices is that the complex headings (which are the vast majority of them) are treated as a single string:

Italy--History--1492-1559--Fiction

While this might fit appropriately as a SKOS vocabulary, in my opinion it does not work as linked data. I'm going to try to explain why, although it's quite complex. Part of that complexity is that LCSH is itself complex, primarily because there are many exceptions to any pattern that you might care to describe. (For more on this, I suggest Lois Mai Chan's Library of Congress Subject Headings, 4th edition, the chapter on geographic subject headings, pp. 67-89.)

Taking the heading above, as I mentioned in my previous post, the geographic term Italy is not in LCSH even though it can indeed be used as a subject heading. Instead, Italy is defined as a name heading in the LC name authorities file. In that file, and only in the name file, alternate forms of the name are included (altLabels, in SKOS terminology):
451 __ |a Repubblica italiana (1946- )
451 __ |a Italian Republic (1946- )
451 __ |a Wlochy
451 __ |a Regno d’Italia (1861-1946)
451 __ |a Iṭalyah
451 __ |a Italia
451 __ |a Italie
451 __ |a Italien
451 __ |a Italii︠a︡
451 __ |a Kgl. Italienische Regierung
451 __ |a Königliche Italienische Regierung

There are no altLabels in the LCSH entry for Italy--etc. And because the term Italy is buried in an undifferentiated string, there is no linked data way to say that the Italy in Italy--History--1492-1559--Fiction is the same as http://id.loc.gov/authorities/n79021783, which will presumably be the URI for the name.
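
As a rough illustration of what it would take to get past the undifferentiated string, here is a minimal Python sketch that splits a heading on the "--" delimiter and looks up the lead term in a name-authority table. The table NAME_AUTHORITY_URIS and the function names are hypothetical, made up for this example; the one URI in the table is the one presumed above. Splitting on "--" is also a simplification, given how many exceptions LCSH has.

```python
# Hypothetical lookup from name-authority label to URI. The single entry
# is the URI this post presumes for Italy; it is not taken from a real service.
NAME_AUTHORITY_URIS = {
    "Italy": "http://id.loc.gov/authorities/n79021783",
}


def split_heading(heading):
    """Split a complex heading string on the "--" delimiter."""
    return [part.strip() for part in heading.split("--")]


def link_lead_term(heading):
    """Return (lead term, URI or None, remaining subdivisions)."""
    parts = split_heading(heading)
    lead = parts[0]
    # None means the lead term was not found in the name-authority table.
    return lead, NAME_AUTHORITY_URIS.get(lead), parts[1:]


lead, uri, subdivisions = link_lead_term("Italy--History--1492-1559--Fiction")
print(lead, uri, subdivisions)
```

The point is only that once the lead term is separated out, it can carry its own identifier, which the single-string form does not allow.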

It is assumed in LC authorities that the altLabels for a name term that appears in a subject heading apply to both the name used as a name and the name used as a subject heading. In the card catalog, where the name alone would appear first in the alphabetical browse of the cards, it was only necessary to make references to that "head" of the list, which would, in our case, be Italy alone. This has caused great problems in online catalogs, where searching is by keyword rather than by linear alphabetical browsing. Some systems manage to get around this by doing a string compare between the same subfields in name headings and subject headings, and then transferring the altLabel forms to the related subject headings. For example:
$a Shakespeare, William, $d 1564-1616
$a Shakespeare, William, $d 1564-1616 $v Adaptations $v Periodicals
In this case, the $a and $d subfields represent the same authoritative entity. The rules say that they are, and must be, the same authoritative entity. If they don't match exactly then someone has done something wrong. They are both instances of a name identified as "n 78095332", which will presumably be given the URI http://id.loc.gov/authorities/n78095332. There is no question about that.
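
The string-compare workaround described above might look something like the following Python sketch. It is not how any particular system does it: the record layout, the function names, and the choice of $a and $d as the identifying subfields are assumptions for illustration, and the altLabel values are placeholders rather than real data from the authority file.

```python
import re


def parse_subfields(field):
    """Parse a '$a ... $d ...' string into a list of (code, value) pairs."""
    return [(code, value.strip())
            for code, value in re.findall(r"\$(\w)([^$]*)", field)]


def name_key(field, codes=("a", "d")):
    """Keep only the subfields assumed to identify the name itself."""
    return tuple(pair for pair in parse_subfields(field) if pair[0] in codes)


def transfer_alt_labels(name_record, subject_field):
    """Return the name's altLabels if the subject heading uses the same name."""
    if name_key(name_record["heading"]) == name_key(subject_field):
        return list(name_record["alt_labels"])
    return []


# Hypothetical name record; the alt_labels are placeholders, not real data.
name_record = {
    "heading": "$a Shakespeare, William, $d 1564-1616",
    "alt_labels": ["(placeholder alternate form 1)",
                   "(placeholder alternate form 2)"],
}
subject = "$a Shakespeare, William, $d 1564-1616 $v Adaptations $v Periodicals"

print(transfer_alt_labels(name_record, subject))
```

Because the comparison is on exact strings, it works only as long as the name portion of the subject heading matches the name heading character for character, which is what the rules require.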

There is also no question that when the name is used in a subject heading it has the full meaning that it is given in the name heading record, including alternate forms of the name and the many notes fields provided by the catalogers that created the authority record. That these don't appear in the LCSH file does not mean that it is not the case: it means only that the LCSH record assumes that the name record exists and provides that information, and that the information is applied to the name in the subject entry through the linear nature of the dictionary catalog.

We mustn't confuse the form with the meaning. That LCSH has a rather arrested form is unfortunate, but it was never intended to be used outside of the context of the full set of authorities that gives full treatment to those things that have "proper names." (cf. Chan, chapter 4)

If we wish for the LC authorities to be used in a linked data environment, then we have to make sure that the linking capabilities are there. Although I agree that each LCSH record has an identifier, and that identifier should be used, I don't agree that what is expressed in the LCSH record is a dumb, undifferentiated string. In this post I have addressed the relation to name headings, but there are other uses of controlled vocabularies within the subject headings that I haven't fully investigated yet.

1 comment:

Unknown said...

I wonder personally why the order of LCSH (keywords to the rest of the world) matters at all. Aren't modern computers able to search entire text blocks without getting hung up on word order? Why doesn't MARC just let us enter as many LCSH approved "keywords" in whatever order we desire?