Monday, October 17, 2011

Relativ index

Most of us, when we hear "Dewey Decimal Classification" (DDC), think about the numbers that go onto the backs of books and tell us where the book can be found on the library's shelves. The subject classification and its decimal notation were only part of Dewey's invention, however. The other part was the "Relativ Index." The Relativ Index was the entry vocabulary for the classification scheme. It was to be consulted by library users as the way to find topics in the library.
"The Index givs similar or sinonimus words, and the same words in different connections, any any intelijent person wil surely get the ryt number. A reader wishing to know sumthing of the tarif looks under T, and, at a glance, finds 337 as its number. This gyds him to shelvs, to all books and pamflets, to shelf catalog, to clast subject catalog on cards, to clast record of loans, and, in short, in simple numeric order, thruout the whole library to anything bearing on his subject." (Dewey, Edition 11, p. 10) (Yes, that is how he spelled things.)
The most recent version of DDC that I own is from 1922, so this example is the entry under "Leaves" in the Relativ Index of Edition 11:

Leaves   fertilizers          631.872
         shapes of  botany    581.4

In the schedules these classes are listed as:

631.872 : "Vegetable manures, Leaves" (coming right after "Vegetable manures, Muck").
581.4 : "Morfology   comparativ anatomy"

You can see that the index is not just a repeat of the names of the points in the classification but is a kind of subject thesaurus in its own right. It doesn't just point to the classification number; it gives some context ("fertilizers," "botany") to help the user decide which class number to select.
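To make that structure concrete, here is a minimal sketch in Python of a relative index as a lookup table in which each entry term maps to one or more (context, class number) pairs. Only the two "Leaves" entries come from the Edition 11 example; the names `relativ_index` and `look_up`, and the way the contexts are transcribed, are my own assumptions for illustration, not anything Dewey specified.

```python
# A minimal, hypothetical model of Dewey's Relativ Index: each entry term
# maps to a list of (context, class number) choices.
RelativIndex = dict[str, list[tuple[str, str]]]

# Only these two entries come from Edition 11; the context strings are
# loosely transcribed and the key is stored lowercased.
relativ_index: RelativIndex = {
    "leaves": [
        ("fertilizers", "631.872"),
        ("shapes of (botany)", "581.4"),
    ],
}


def look_up(index: RelativIndex, term: str) -> list[tuple[str, str]]:
    """Return every (context, class number) pair filed under a term.

    Matching ignores case and surrounding spaces; an unknown term
    returns an empty list.
    """
    return index.get(term.strip().lower(), [])


# Print each choice with its context, roughly as the printed index does.
for context, number in look_up(relativ_index, "Leaves"):
    print(f"Leaves, {context}: {number}")
```

The important part of the design is the list: one term, several contexts, and it is the user who picks the class number that fits what she is looking for.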

What I find odd today in libraries (mainly public libraries) is that we do not have an entry vocabulary for the Dewey classification. Libraries in the U.S. use the Library of Congress Subject Headings even when their classification scheme is Dewey. While LC subject headings will lead you to a catalog entry that has a classification number, they aren't an index to that classification scheme.

There are more oddities, actually.

One oddity is that we never explain these classification numbers to the users. Yes, I can go from the catalog to the shelf and find books that are near the one I am seeking, but in a small public library I can encounter a number of different topics on a single shelf, and in a large academic library I can wander whole aisles without seeing a change in the initial class number and have no idea whether I have exhausted my topic area as the digits three places past the decimal point change. Yet there is nothing, either at the shelf or anywhere else in the library, to tell me what those numbers mean, except usually at a very macro level. What I have before me are book spines and class numbers, and since I don't know what the class numbers mean I have to rely on the spine titles. So if I browse a shelf and see:

364.106 D26f   The first family
364.106 En36h  Havana nocturne
364.106 En36i  The Westies
364.106 En36p  Paddy whacked

... it may not be clear to me what topic I am looking at. At the very least I would like to be able to type "364.106" into an app on my phone and get a display something like:

300  Social sciences
    360  Social problems & social services
        364  Criminology

(That example is truncated because the divisions to the right of the decimal point are not available to me. Presumably the display would take me down to .106, which would then have something to do with gangs and/or organized crime and/or the Mafia, but I'm just guessing at that.)
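Here is a rough sketch of what such an app might do behind the scenes. It assumes, hypothetically, that the DDC hierarchy can be read straight off the digits (300, then 360, then 364, then 364.1, and so on) and that the app has a table of captions. Real DDC has notational exceptions that this ignores, and the names `CAPTIONS`, `ancestors`, and `explain` are purely illustrative. The caption table holds only the three captions shown above, so, like my example, the output stops at 364.

```python
# Captions copied from the example above; the table is deliberately incomplete.
CAPTIONS: dict[str, str] = {
    "300": "Social sciences",
    "360": "Social problems & social services",
    "364": "Criminology",
}


def ancestors(number: str) -> list[str]:
    """Return the notational ancestors of a DDC number, broadest first.

    Assumes a three-digit base number and that the hierarchy follows the
    digits: "364.106" -> 300, 360, 364, 364.1, 364.10, 364.106.
    """
    main, _, decimals = number.partition(".")
    levels = [main[0] + "00", main[:2] + "0", main]
    levels += [main + "." + decimals[:i] for i in range(1, len(decimals) + 1)]
    # Drop repeats, e.g. "300" would otherwise appear three times for "300".
    unique: list[str] = []
    for level in levels:
        if level not in unique:
            unique.append(level)
    return unique


def explain(number: str, captions: dict[str, str] = CAPTIONS) -> str:
    """Indent each level that has a caption; levels without one are skipped."""
    lines: list[str] = []
    depth = 0
    for level in ancestors(number):
        if level in captions:
            lines.append("    " * depth + f"{level}  {captions[level]}")
            depth += 1
    return "\n".join(lines)


print(explain("364.106"))
```

With a fuller caption table, the same function would keep going down through .1 and .106; the missing piece is the captions, not the logic.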

Even better, I'd like to point my phone camera at a book spine and get a similar read-out. Yes, I know that's not going to be simple.

Another oddity is that we put multiple subject headings on a bibliographic record, but only one classification number, reducing the role of classification to simply the ordering of books on the shelves. This means that there are subject headings on the records that would logically lead to class numbers other than the one that has been given.

Using my crime books as an example, the subject headings are clearly more diverse than the single classification code:

    Mafia -- United States -- History
    Mafia -- United States -- Biography
    Criminals -- United States -- Biography
    Organized crime -- United States -- Case studies

    Lansky, Meyer, 1902-
    Luciano, Lucky, 1897-1962
    Mafia -- Cuba -- Havana
    Cuba -- History -- 1933-1959
    Havana (Cuba) -- Social conditions -- 20th century

    Westies (Gang) -- History
    Gangs -- New York (State) -- New York -- History
    Irish American criminals -- New York (State) -- New York -- History
    Hell's Kitchen (New York, N.Y.)

    Organized crime -- United States -- History
    Irish American criminals -- United States -- History
    Gangsters -- United States -- History

This won't be a surprise to my readers, but this dual system is full of "gotchas" for users. If I look up "Irish American criminals" in the subject headings, I retrieve some items in 364.106, some in the 920 area (biographies, but many users won't know that), and some in fiction (under the author's last name). It's not that there is no rhyme or reason, but nothing explains to the library user why these items are split up, or why exploring this one topic should mean going to three entirely different places in the library. My guess is that to users the system seems quite arbitrary.

Things are a bit better in libraries that use Library of Congress Classification (LCC) along with LCSH, since the two seem to have been developed with some coordination. In his essay "The Peloponnesian War and the Future of Reference," Thomas Mann of the Library of Congress explains how LCSH and LCC work together:
"In order to find which areas of the bookstacks to browse, however, researchers need the subject headings in the library catalog to serve as the index to the class scheme. But the linkage between a subject heading and a classification number is usually dependent on the precoordination of multiple facets within the same string. For example, notice the specific linkages of the following precoordinated strings:

Greece–History–Peloponnesian War, 431-404 B.C.: DF229-DF230
Greece–History–19th century: DF803
Greece–History–Acarnanian Revolt, 1836: DF823.6
Greece–History–Civil War, 1944-1949: DF849.5"
This is the correlation that will appear in the LCSH documentation, but this is not what the user sees in the catalog. A search in LC's catalog for Greece-History-19th century brings up books with a variety of classification numbers, the first four being:
DF803 .H45
DF725 .A14
Again, the user is directed to different shelf locations from what seems to be a single subject heading, with no explanation of what these different locations mean.* It's got to be terribly confusing.

Compact notation is essential for the ordering of books on the shelf. But it seems truly odd that we order the books on the shelf but do not tell users what the order means. This can be seen as providing a delightful serendipity, but I presume that we could provide serendipity with less intellectual effort than has been dedicated to DDC and LCC, which are both enormously detailed and growing more so each year in an attempt to encompass the complexity of the published world. How much richer would the user's library experience be if she understood the relationship between the items on the shelf? Does it make sense to create detailed and complex relationships that then are not understood or used? What would a shelf system that was meaningful to library users look like? In a small library? In a large library? And, finally, can we use computing power to overcome the limitations that brought us to the situation we are in today in terms of organized subject access?

* Before someone explains to me that the first subject heading determines the class number... you know that, I know that, but millions of library users have no idea what the order of the subject headings means. Besides, library catalog users often don't see the full record with all of the subject headings. Even in the LC catalog subject headings are not included in the default display. We can't blame the users if they don't know what we don't help them know.

Monday, October 03, 2011

Organizing knowledge

At the LITA Forum on Saturday I stated that classification and knowledge organization seem to have fallen off the library profession's radar. (LITA2011 keynote.) We have spent considerable time and money modifying our cataloging rules (four times in about fifty years), but the discussion of how we organize information for our users has waned. I can illustrate what is, at least, my impression of this through some searches of Google Books using its nGram service.

"Library classification" peaks around 1960 and drops off rapidly. (The chart ends at 2000.)

[nGram chart: "Library classification"]

[nGram chart: "Faceted classification"]
"Faceted classification" has a meteoric rise around the 1960s but falls abruptly from 1970 to 1980. The rise may correspond to the activities of the Classification Research Group, based in the UK, whose main interest was faceted classification.

[nGram chart: "Decimal classification"]
The decimal classifications, most likely both Dewey and Universal, rise steadily until the mid-1960s, then begin a steep decline.

[nGram chart: "Keyword searching"]
Keyword searching comes along slowly in the 1960s and 70s, then takes off from 1980 to 2000. Today, as we know, it's basically the only kind of information retrieval being discussed.

"Knowledge organization" also has a steady rise through the 1970s and 80s, and seems to reach a peak that continues up to recent times.

This is hardly a scientific study, but it illustrates what my gut was telling me, which was that keyword searching has essentially replaced any kind of classed access. That does make me wonder what is being discussed under the rubric of "knowledge organization." Keyword indexing, per se, does not do any organization of knowledge; there are no classes or categories, no broader concepts or narrower concepts, no direction toward similar topics. It also has no facets based on the topic of the resource, only facets based on its descriptive properties (date of publication, format, domain).
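A small sketch in Python shows the difference. A keyword index is an inverted index: each word points to the records that contain it, and that is all it knows. The headings below are taken from the crime-book example above; the `NARROWER` table is a hypothetical concept link I invented for illustration, and it is exactly the kind of organized knowledge the keyword index itself cannot supply. The function names are illustrative, not any particular system's API.

```python
from collections import defaultdict

# Subject headings taken from the crime-book example above.
HEADINGS = [
    "Mafia -- United States -- History",
    "Organized crime -- United States -- Case studies",
    "Mafia -- Cuba -- Havana",
    "Gangs -- New York (State) -- New York -- History",
    "Organized crime -- United States -- History",
]

# HYPOTHETICAL narrower-term link, invented for illustration.
# A keyword index contains nothing like this.
NARROWER: dict[str, list[str]] = {
    "organized crime": ["mafia"],
}


def keyword_index(texts: list[str]) -> dict[str, set[int]]:
    """Build an inverted index: lowercased word -> positions of texts using it."""
    index: dict[str, set[int]] = defaultdict(set)
    for position, text in enumerate(texts):
        for word in text.lower().replace("--", " ").split():
            index[word].add(position)
    return dict(index)


def keyword_search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return positions of texts containing every word in the query (AND)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def expanded_search(index: dict[str, set[int]], query: str) -> set[int]:
    """Keyword search plus one level of narrower terms taken from NARROWER."""
    results = keyword_search(index, query)
    for term in NARROWER.get(query.lower(), []):
        results |= keyword_search(index, term)
    return results


idx = keyword_index(HEADINGS)
print(sorted(keyword_search(idx, "organized crime")))   # [1, 4]
print(sorted(expanded_search(idx, "organized crime")))  # [0, 1, 2, 4]
```

The plain keyword search never finds the Mafia headings when you ask for organized crime; only the extra, hand-built relationship does that.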

Keyword searching is not organized knowledge. Any topical organization takes place after retrieval by the searcher, who must look through the results and select those that are relevant. This in part explains why Wikipedia is the perfect complement to keyword searches: Wikipedia is organized knowledge. A keyword search can pull up a Wikipedia page that will provide context, disambiguation, and pointers to related topics. Increasingly, I find that I begin topical searches in Wikipedia, leaving Google to function as my "internet phone book" when I need to find a specific person, company, product or document.

It makes sense for us to ask now: is there any reason (other than shelf placement) to continue library classification practices? Keep your eyes on this space for more about that.

Added note: Richard Urban offers this nGram view comparing all of the library classification phrases with the term "Ontologies":
As @repoRat tweeted: "Karen Coyle makes air whoosh out of my lungs. Perhaps classification to be replaced by relationship metadata?" That's a distinct possibility, and we'd better get cracking on that! Many "ontologies" out there today are simple term lists, and few of them seem to have relationships that you can follow productively. What really excites me is the possibility of relationships that we haven't explored in the past, both between concepts and between resources: all of the "based on," "responds to," and "often appear together" relationships, and lots more that my brain isn't sharp enough to even imagine.