Sunday, November 04, 2007

Our subject mess

Lately I've had occasion to work with a few different groups of people who are delving into library bibliographic data for the first time. Believe me, it is quite revealing to see that data through the eyes of these novices. They are novices only in this one area; they are generally quite savvy about computing and data. Each new revelation gives me a chance to regale them with an amusing story about "how it got that way." I can explain (note: explain, not justify) why we have no identifiers for key elements like authors and works. I can pretty much explain why we seem more concerned about the package than the content. I can reminisce about moments in the history of library systems development that happened before some members of these groups were born. But I get totally stuck when they point out the mess that is our subject access.

We have two classification systems, Dewey (DDC) and Library of Congress (LCC). That in itself is not a problem, and it's fairly easy to explain how they developed in different contexts, always making sure to explain that these systems classify the items in a library, not the world of thought.

What is hard is to try to explain what either of them has to do with the Library of Congress Subject Headings (LCSH). Many folks assume that LCSH is the entry vocabulary into LCC. Thus they expect that if there is a classification code in a record that stands for "vocal music, choruses," there will be a heading in the record that is "vocal music, choruses," and vice versa. They also assume that the two subject systems (classification and subject headings) have the same structure, which would mean that you can "drill down" from music to vocal music then to choruses in either or both. Nothing could be further from the truth. So it is quite confusing to them when they see a record with a call number that would ostensibly be about "vocal music, choruses" based on the classification, but instead the subject heading is "Cantatas, Secular -- Scores." And they are equally confused when the record has another subject heading ("Funeral music") but only the one classification number.

I can't explain this disconnect between the subject headings and the classification scheme, except to say: that's how it is.

Recently, I was browsing through my beloved copy of the DDC from 1899 that still has both its numeric and alphabetical tabs relating respectively to the classification and the "Relativ Subject Index." The RSI is indeed an index to the classification scheme, and it appears that Dewey originally intended it also as the means of access to the collection:
"Find the subject desired in its alphabetical place in the index. The number after it is its class number and refers to the place where the topic will be found, in numerical order of class numbers, on the shelves or in the subject catalog."
From this I can only presume that both the shelves and the subject catalog were in classification order, and that the alphabetical index was the index to that classification. I would guess, from what he says here, that the subject catalog was arranged like the shelf but also contained the verbal translation of what the decimal classification numbers meant.
"Under this class number will be found the resources of the library on the subject desired. Other subjects near the one sought may often be consulted with profit; e.g., Communism is the topic wanted and the index refers to 335.4, but 335, Socialism, and even the inclusive division 330, Political economy, also contain much on this subject. The reverse is equally true; the full material on socialism can only be had by looking at its divisions 335.3, Fourierism, 335.4, Communism, etc. The topics which are thus subdivided are plainly marked in the index by heavy faced type."
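To make Dewey's instructions concrete, here is a minimal Python sketch of the lookup he describes: find a topic in the alphabetical index, get its class number, then move to broader or narrower classes using nothing but the digits of the notation. The index entries and captions are a tiny hypothetical excerpt built only from the numbers in his example, not real DDC data, and the digit-trimming rule for finding broader classes is a simplifying assumption; the real schedules have exceptions.

```python
# A minimal sketch of the lookup Dewey describes. The index and captions are a
# tiny hypothetical excerpt built from the numbers in his example, not DDC data.

RELATIVE_INDEX = {  # alphabetical index: topic -> class number
    "political economy": "330",
    "socialism": "335",
    "fourierism": "335.3",
    "communism": "335.4",
}

# Captions: the verbal meaning of each class number.
CAPTIONS = {number: topic.capitalize() for topic, number in RELATIVE_INDEX.items()}


def significant_digits(number: str) -> str:
    """Drop the point and trailing zeros: '330' -> '33', '335.4' -> '3354'."""
    return number.replace(".", "").rstrip("0")


def to_notation(digits: str) -> str:
    """Rebuild class notation from digits: '33' -> '330', '3354' -> '335.4'."""
    base, rest = digits[:3].ljust(3, "0"), digits[3:]
    return f"{base}.{rest}" if rest else base


def look_up(topic: str) -> str:
    """Find the topic in the index and return its class number."""
    return RELATIVE_INDEX[topic.lower()]


def broader(number: str) -> list[str]:
    """Known classes containing `number`, nearest first (trim one digit at a time)."""
    digits = significant_digits(number)
    candidates = (to_notation(digits[:n]) for n in range(len(digits) - 1, 0, -1))
    return [c for c in candidates if c in CAPTIONS]


def narrower(number: str) -> list[str]:
    """Known classes whose notation extends `number` (its subdivisions)."""
    prefix = significant_digits(number)
    return sorted(n for n in CAPTIONS
                  if n != number and significant_digits(n).startswith(prefix))


if __name__ == "__main__":
    number = look_up("Communism")
    print(number, CAPTIONS[number])
    for n in broader(number):
        print("  broader:", n, CAPTIONS[n])
    for n in narrower("335"):
        print("  under Socialism:", n, CAPTIONS[n])
```

Looking up Communism yields 335.4; its broader classes are 335 (Socialism) and 330 (Political economy), and the subdivisions of 335 are 335.3 and 335.4. That is the two-way browsing Dewey recommends, and it works only because the index and the classification share one notation.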
My copy is #3933, originally owned by the Roger Williams Park Museum in Providence, Rhode Island. The current incarnation of the institution appears to be the Museum of Natural History and Planetarium. My copy has many penciled notes in the area of Zoology (DDC 590), which would fit the institution's natural history focus. (I don't see any evidence of a current library.) By 1900 the "dictionary catalog" would have taken root, so I don't know whether the library followed Dewey's instructions for creating a classified catalog. But I do wonder how we got from a single system, a classification with an alphabetical index to it, to one with an alphabetical subject index and two classification systems, in which the index and the classifications have essentially each gone their own way. This is obviously a gap in my education, which I will gladly rectify if you have suggestions for readings.

Meanwhile, no wonder users are confused.


Irvin said...

I think after a few years we forget how complicated and illogical cataloguing can appear to a novice. I remember being shocked the first time I looked at AACR by what a dog's breakfast the whole thing seemed to be and how impenetrable the language was. I think people lose sight of that when criticising the terminology in the RDA drafts.

In Australia most libraries use DDC as the classification scheme and LCSH as subject headings so the dislocation between the two is second nature to us.

I remember being told that classified catalogues were more popular in Britain and Europe than the US. One of my first library jobs was at the Royal Botanic Gardens in Sydney where they still had a classified (DDC) card catalogue. The botanists would usually go to it in preference to the new-fangled online version.

- Irvin

Nathan said...

"So it is quite confusing to them when they see a record with a call number that would ostensibly be about "vocal music, choruses" based on the classification, but instead the subject heading is "Cantatas, Secular -- Scores." And they are equally confused when the record has another subject heading ("Funeral music") but only the one classification number.

I can't explain this disconnect between the subject headings and the classification scheme, except to say: that's how it is."

Karen, while not totally excusing the messiness that is our classifications and subject headings, I humbly submit that it's because this world is so messy that our maps are so messy. :)

I think we need to acknowledge the realities of "conflicting views," taking bits and pieces from rationalism, empiricism, romanticism, etc. (and more traditional thought) in order to make maps of the world. It's a messy process.

Re: this above, this article recently hit me in the face:

As to your request for reading, it seems to me that this paper by Jonathan Rochkind might be a good starting point (at least the first few pages):

Karen Coyle said...

Nathan, the tinyurl didn't work -- re-create?

In the '70s I was very interested in classification. I wrote an article on faceted classification in '74 or so, and even went to the UK and talked to some of the folks there who were really at the forefront of classification theory. At the time I was totally steeped in Vickery, Lancaster, et al. So that's covered.

What I don't know is how LCSH got started, and what the original theory was behind it. Unlike Dewey, there doesn't seem to have been an evangelist for LCSH, perhaps because it was designed as a system just for LC?

The people who find our methods messy seem to be those who would like a systematic way to explore topics. That's not everyone. I myself am drawn to connections between topics and like to explore. Some people want to type in a keyword and get something. Those are two different search strategies (and not mutually exclusive). I find it odd that we don't provide more for the explorers. It doesn't have to be a hierarchical view of the world -- topic maps of a non-hierarchical nature might be just fine. But we've totally atomized the world of searching, and I think we are not serving all of our users with that view.

jrochkind said...

In a paper I wrote in library school, I quoted Dewey:

<< Dewey (1876) interestingly claimed that his classification system “was devised for cataloguing and indexing purposes,” but was “found on trial to be very valuable for numbering and arranging books and pamphlets on the shelves.” This second “accidental” use is indeed what the DDC had come to be maintained exclusively for, and despite the continued existence of the alphabetic ‘relative’ index to the classification... >>

Dewey, Melvil. (1876). Catalogues and Cataloguing: A Decimal Classification and Subject Index. In Public Libraries in the United States of America: Their History, Condition, and Management, Special Report, Part I. Department of the Interior, Bureau of Education. Washington DC: Government Printing Office. 623-643.


So, yeah. But yeah, it's a mess right now. All of our subject/topic/form/genre vocabularies were set up for a very particular environment, where the main way you could get access to things was through alphabetically or numerically arranged lists, or arrangements of physical objects on a shelf. Either way, it was assumed that users would be consulting _lists_, organized by alphabetization or numeric order.

Now we have a bunch of other ways we want to access things, and these vocabularies, optimized so well for that environment, are failing us. Facetted displays are one example. Attempting to let people browse or limit by form/genre is another (count the number of places form/genre info may appear in a MARC catalog, from different, incompatible vocabularies: LCSH, LCC, Dewey, GMD, SMD, MARC fixed field codes, probably a few more).

jrochkind said...

"The people who find our methods messy seem to be those who would like a systematic way to explore topics."

I would add that the whole premise of the facetted view of LCSH many people are experimenting with is this kind of systematic exploration. If LCSH is not suitable for systematic exploration---what does it mean that we are providing facetted display of it, what use are those facetted displays, what incorrect assumptions do they lead users to have about what they are looking at? And what would be the damage to a user's search/exploration of these incorrect assumptions?

Nathan said...


Not sure why it's not creating.



"The traditional future" by Brantley

Samuel said...

If I may write something about...

Reading the post and comments reminds me of a book I went through many years ago: Classification in the 1970s / Arthur Maltby. If my memory serves, some contributions in this book (I mean the first edition, dated 1972) discuss the relationships between shelf-based classifications and their later subject consistency and development into classes or disciplines... Well, it would not be difficult to draw some analogies, today, between dispersed/organized webpages and legitimate 'semantic efforts' of all kinds...

Alphabetic 'devices' have probably passed through many stages between oral and visual cultures since alphabetic entries began to replace earlier iconic representations, just as numerals seemed to get past linguistic barriers and to subsume concepts through the very use of (alpha)numeric 'classification schemes' and related arrangements... (On this point I have in mind, for example, the book: Orality and Literacy / Walter Ong, chapter 4)

Nowadays it seems easier to combine alphabetic, numerical or iconic elements in our everyday communications -- as in the case of Internet smileys (older and new ones) :o) -- but classification/indexing implies a philosophical attitude towards knowledge that is always worth revisiting... As Michael Gorman points out, critical thinking should stay on our future library instruction agendas (Our Enduring Values, chapter 7).

Best wishes!

Jerome said...

Lois May Chan has a brief history of the origins of LCSH up at the LC site. It was apparently the shift to a dictionary catalog with author/title/subject headings in 1898 that led to the creation of LCSH, and they took the existing ALA subject headings as a starting point, knowing that it would have to be expanded to handle the scope of LC's collections. But that just pushes back the question to 'why did ALA feel the need for a subject description scheme separate from a classification scheme?' Julia Pettee published a book in 1946, "Subject Headings: The History and Theory of the Alphabetical Subject Approach to Books," that might be enlightening. There was also an issue of Cataloging & Classification Quarterly (v. 29, nos. 1/2) devoted to LCSH (sort of a centennial celebration) which includes an article by Alva Stone on the history of LCSH.

It seems to me that the switch from classified catalogs to dictionary catalogs, combined with the effort to provide multiple subject access points for a text in order to improve users' ability to locate materials in public libraries in the U.S., is what really drove the division between subject headings and classification languages. So I think the origins of the split antedate LCSH by a few decades. As soon as you allowed multiple subject headings for a book in order to improve recall, you would inevitably find that the nice clean mapping between subject headings and a classification structure was broken.

It's possible the split even predates that point. I don't know enough about the early subject indexes into classified catalogs to know whether they allowed a subject index entry to point to multiple classifications. If so, the split between subject languages and classification systems can probably be traced back quite a ways.

Karen Coyle said...

Jerome, thanks. This was what I was imagining, but now you've given me a good starting place for following the history. Much appreciated.

Alain Pierrot said...

If I may add an external view, the subject “mess” can be “justified” (not only “explained”):

If books do make sense to readers, it is through different purposes and approaches, different considerations.

Imagine that DDC and LCC structured a collection in exactly the same homothetic way: it would merely represent a change of "vocabulary," exactly as you can represent a number in binary, decimal or hexadecimal base, without any added sense.

That two classifications are not homothetic reflects the fact that one book makes different senses for different readers (or for the same reader at different times). It is an enrichment, not a mess, to allow different non-homothetic classifications, provided that consistency within each classification is maintained.

Karen Coyle said...

Alain said: "If books do make sense to readers, it is through different purposes and approaches, different considerations."

Yes, but I don't think that's what we're providing. I don't mind that we have Dewey and LCC. Those are indeed different approaches, each legitimate. The problem is that we have no entry into either of them for the users. So they search using LCSH and are sent to the shelf, which is organized by Dewey or LCC, but there isn't a way to use LCSH to mine the richness of LCC or Dewey. There is nothing that would explain to you what the shelf order is. Some people have told me that standing in front of a shelf of books the order is obvious. I'm afraid I don't always find that to be the case. After all, in a small library you can find the computer books ("Excel for Dummies") on the same shelf as general culture ("Weimar culture: the outsider as insider"), followed shortly thereafter by "The big book of hoaxes" and "Ponzi schemes." The next shelf has books on religion. I'd be more comfortable if we had a displayable form of the meaning of the classification codes, and if users who wished to use that system for finding could do so, beginning with an entry vocabulary.

So I guess it's the disconnect between classification and subject headings that bothers me. I definitely would be thrilled if we had a multitude of well-functioning subject systems for people to choose from. I just don't think that we do.

Alain Pierrot said...

Sorry, I hadn't got the point!

It might be interesting to see whether tools such as Elastic Lists could provide useful connections.

Karen Coyle said...

Alain, no need to apologize. I took a quick look at Elastic Lists -- thanks for the link. It looks a lot like facets, and I think that both DDC and LCC have some faceting aspects to them. You would probably have to drill down some levels before you could apply facets. Well, it'll be something to keep a watch on.

Antony Gordon said...

This looks like a sad case of boats long missed. Ranganathan's work on facets caught on rather more in Europe than the US and indeed at one time the British National Bibliography went so far as to add facets to DDC numbers that were at the time seen as inadequate -- I think it was edition 14. BNB later abandoned that giving straight DDC numbers. For quite some time it also used an advanced cyclic indexing system invented by Derek Austin known as PRECIS but alas also later abandoned that in favour of LCSH! There was a tight link between PRECIS terms and DDC numbers.

Ranganathan's own Colon Classification was in its time the only fully faceted general classification and was also capable of what Ranganathan termed chain indexing, to provide a precise alphabetic link to the classified order. It's still, I believe, in wide use in India but never caught on elsewhere. The Universal Decimal Classification (UDC), originally based on an early edition of DDC, has diverged substantially, with extensive faceting, and is presumably capable to a reasonable extent of being chain-indexed. I guess the current edition of the Bliss Classification, which I believe is fully facetted, would be similarly capable.
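Chain indexing, roughly as it is usually described, turns each link in a class number's hierarchy into an alphabetical heading: the term for that link, qualified by its broader terms, pointing to the class number at that link. Here is a minimal Python sketch under stated assumptions: the chain is hand-built from Dewey's Communism example (330, 335, 335.4) rather than taken from a real Colon Classification number, and Ranganathan's rules for false and unsought links are not modelled.

```python
# A minimal sketch of chain indexing. The chain is a hypothetical, hand-built
# example (Dewey's Communism numbers), not a Colon Classification number, and
# Ranganathan's rules for false and unsought links are not modelled.

CHAIN = [  # links of one class number, broadest first: (class number, term)
    ("330", "Political economy"),
    ("335", "Socialism"),
    ("335.4", "Communism"),
]


def chain_index(chain: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """One entry per link: its term, then its broader terms, -> that link's number."""
    entries = []
    for i, (number, term) in enumerate(chain):
        # Qualify the term with every broader link, nearest first.
        qualifiers = [broader_term for _, broader_term in reversed(chain[:i])]
        entries.append((", ".join([term, *qualifiers]), number))
    # File the entries alphabetically, as in a printed or card index.
    return sorted(entries, key=lambda entry: entry[0].lower())


if __name__ == "__main__":
    for heading, number in chain_index(CHAIN):
        print(f"{heading}  {number}")
```

The three resulting entries ("Communism, Socialism, Political economy" to 335.4; "Political economy" to 330; "Socialism, Political economy" to 335) give an alphabetical way into every level of the classified order, which is the link between index and classification that the post finds missing between LCSH and LCC.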

PRECIS was probably ahead of its time in that the manipulation codes necessary could have been more easily applied with current technology.