Monday, October 03, 2011

Organizing knowledge

At the LITA forum on Saturday I stated that classification and knowledge organization seem to have fallen off the library profession's radar. (LITA2011 keynote.) We have spent considerable time and money modifying our cataloging rules (four times in about fifty years), but discussion of how we organize information for our users has waned. I can illustrate my impression of this with some searches run against Google Books using its nGram service.

"Library classification" peaks around 1960 and drops off rapidly. (The chart ends at 2000.)
[Chart: "library classification"]

[Chart: "faceted classification"]
Faceted classification rises meteorically in the 1960s but falls abruptly from 1970 to 1980. The rise may correspond to the activities of the UK-based Classification Research Group, whose main interest was faceted classification.

[Chart: "decimal classification"]
The decimal classifications, most likely both Dewey and Universal, rise steadily until the mid-1960s and then begin a steep decline.

[Chart: "keyword searching"]
Keyword searching comes along slowly in the 1960s and 70s, then takes off from 1980 to 2000. Today, as we know, it is basically the only kind of information retrieval being discussed.

Knowledge organization also rises steadily through the 1970s and 80s and seems to reach a peak that it sustains up to recent times.

This is hardly a scientific study, but it illustrates what my gut was telling me: keyword searching has essentially replaced any kind of classed access. That does make me wonder what is being discussed under the rubric of "knowledge organization." Keyword indexing, per se, does not organize knowledge at all; there are no classes or categories, no broader or narrower concepts, no direction toward similar topics. It also has no facets, or at least none based on the topic of the resource; any facets come from its descriptive properties (date of publication, format, domain).
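A toy contrast may make the point concrete. The following Python sketch uses invented titles, invented class assignments, and a hypothetical three-level hierarchy (none of them drawn from DDC, UDC, or LCC). The keyword index only records which words occur in which titles, so a search for "dinosaurs" misses the Tyrannosaurus guide; a search on a class gathers that class together with every narrower class beneath it.

```python
# Hypothetical toy catalog: document ID -> title.
DOCS = {
    1: "Dinosaurs of North America",
    2: "Tyrannosaurus rex: a field guide",
    3: "Snakes and lizards",
}

# Hypothetical class assigned to each document by a cataloger.
CLASS_OF = {1: "dinosaurs", 2: "tyrannosaurs", 3: "squamates"}

# Hypothetical broader -> narrower class hierarchy.
NARROWER = {
    "reptiles": ["dinosaurs", "squamates"],
    "dinosaurs": ["tyrannosaurs"],
}


def build_keyword_index(docs):
    """Map each lowercased title word to the set of IDs containing it.

    No term is related to any other term: "dinosaurs" and
    "tyrannosaurus" are simply unrelated keys.
    """
    index = {}
    for doc_id, title in docs.items():
        for word in title.lower().replace(":", " ").split():
            index.setdefault(word, set()).add(doc_id)
    return index


def keyword_search(index, word):
    """Return IDs of documents whose titles contain exactly this word."""
    return index.get(word.lower(), set())


def subclasses(cls, hierarchy):
    """Return cls plus every narrower class below it, recursively."""
    found = {cls}
    for child in hierarchy.get(cls, []):
        found |= subclasses(child, hierarchy)
    return found


def classed_search(class_of, cls, hierarchy):
    """Return IDs of documents assigned to cls or any narrower class."""
    wanted = subclasses(cls, hierarchy)
    return {doc_id for doc_id, c in class_of.items() if c in wanted}


index = build_keyword_index(DOCS)
print(sorted(keyword_search(index, "dinosaurs")))               # [1]
print(sorted(classed_search(CLASS_OF, "dinosaurs", NARROWER)))  # [1, 2]
print(sorted(keyword_search(index, "reptiles")))                # []
print(sorted(classed_search(CLASS_OF, "reptiles", NARROWER)))   # [1, 2, 3]
```

The hierarchy here is data a cataloger supplies; nothing in the keyword index could infer it from the words alone.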

Keyword searching is not organized knowledge. Any topical organization happens after retrieval, and it is done by the searcher, who must look through the results and select the relevant ones. This partly explains why Wikipedia is the perfect complement to keyword searches: Wikipedia is organized knowledge. A keyword search can pull up a Wikipedia page that provides context, disambiguation, and pointers to related topics. Increasingly, I find that I begin topical searches in Wikipedia, leaving Google to function as my "internet phone book" when I need to find a specific person, company, product, or document.

It makes sense for us to ask now: is there any reason (other than shelf placement) to continue library classification practices? Keep your eyes on this space for more about that.

Added note: Richard Urban offers an nGram view comparing all of the library classification phrases with the term "ontologies."

As @repoRat tweeted: "Karen Coyle makes air whoosh out of my lungs. Perhaps classification to be replaced by relationship metadata?" That's a distinct possibility, and we'd better get cracking on it! Many "ontologies" out there today are simple term lists, and few of them seem to have relationships that you can follow productively. What really excites me is the possibility of relationships that we haven't explored in the past, both between concepts and between resources: "based on," "responds to," "often appear together," and lots more that my brain isn't sharp enough to even imagine.


Francisca Hernández said...

These same arguments would lead us to dispense with any kind of metadata. I find classification systems very useful for developing tools that link materials without relying on text strings (subject headings) in multilingual and multiscript environments such as Linked Open Data (numbers are the only characters used internationally), and for offering users other navigation mechanisms, such as grouping similar content. DDC and UDC are not very usable, but they can be very useful for processing and visualizing data.

Dorothea said...

So, I'll take a flyer at guessing where you're going with this, Karen.

Classification slots materials into one place and one place only, on the sole basis of "aboutness." Are we perhaps moving into a world of explicitly drawn and implicitly gathered relationship metadata, where "where something belongs" depends on the criteria by which one is browsing? And where "aboutness" is just one criterion among many possible criteria?

Feel free to laugh at me if I'm completely wrong. :)

Karen Coyle said...

Dorothea --

Great comment! Not necessarily where I was going next, but a logical step for sure. Here are some quick "comment-comments":

Yes, classification is about "aboutness" - but aboutness isn't absolute, it's contextual. Aboutness for a biologist is different to aboutness for a second-grader interested in dinosaurs. So one possibility is that we have an unlimited number of aboutnesses to fit the unlimited number of contexts. How that gets managed from the user's point of view is unclear to me, but the idea that there is one and only one organization of resources is obviously a non-starter (although it's a common approach).

The other thing is that we shouldn't be limited to "about." When Amazon says "you might also like...", that's not "about"; it's another relationship, not tightly defined, but still useful. I see linked data as able to support those other relationships, which may be built up over time from gathered data (and some old-fashioned social activity).

As a user you should be able to follow topical paths but also use the "other" relationships that exist. These latter may be more serendipitous, and may be most useful when you don't really know what you are looking for or where you want to go from here.

Believe me, this is not an area where I have answers -- it's what I want to explore because what we have today just doesn't seem right to me.

Richard Urban said...


If I understand this correctly, the premise of this post is that because we see these phrases used less frequently in Google Books, the library profession is no longer interested in the concepts behind them.

It seems like there is another explanation: that we are still interested in the concepts but are now using different terminology to discuss them (at least in the printed works covered by the nGram database). For example, look at the rise of "ontologies," "taxonomies," and other related terms. Granted, not all of these are about library materials (some are, in fact, about dinosaurs; I'll also grant that these may be broader terms than the ones you've used). But are they not classification systems under a different name, and things that many library professionals are very interested in?

Karen Coyle said...


I agree that we might simply have changed terminology, which is why I added "knowledge organization" to the mix and wondered what exactly it means. I do think that some of the semantic web work is heading in the direction of "organized information." However, I still see a large gap in the library profession in this area. There isn't an ontology movement taking place. I don't see anything happening in topical access that comes even remotely near the energy we put into resource description. There aren't active discussions exploring new ideas in topical organization. FRBR and RDA totally ignore this area, and FRSAD seems to be a structure for linking subject headings to descriptive cataloging, not for building an ontology.

I am looking forward to the addition of LC's classification to the list of LC authorities in RDF. Although LCSH was the first vocabulary described in RDF, it has little useful structure. I am hoping that sections of LCC will provide interesting playgrounds for linked data. There aren't any really rigorous classification systems in RDF that compare to it.

Paul Adasiak said...

Just the day before reading this, I discovered that "subject headings" seems to follow a similar pattern, except that it peaks in the early 1970s.

John Mark Ockerbloom said...

LCSH is a bit sparse, but I've found it a useful browsing structure when it's filled out some, and browsed through an interface that shows you context and examples. (Example.)

Wikipedia is a lot more comprehensive, both in the size of its concept space and in the relationships between its concepts. (I've posted previously about Wikipedia as a concept-oriented catalog.) But those relationships aren't so easy to take in and categorize at a glance, and there isn't as much connection as there could be between WP and library materials.

It's quite possible to better combine the strengths of catalogs and Wikipedia. Libraries could link out to Wikipedia and other resources from controlled vocabulary terms (as some are starting to do now), and link back to name or subject browsers from appropriate Wikipedia pages. They could also scan Wikipedia for new terms and relationships that could enhance LCSH and relevant bibliographic entries. (Establishing good workflows for this is nontrivial, but conceivable.)

Karen Coyle said...

JMO - great posts on concept-oriented catalog. I did have a moment of wondering if Wikipedia couldn't be our catalog interface... but connecting catalogs and WP seems entirely logical. I think I'll push some of this out to ngc4lib to see what it stirs up.

Diane Hillmann said...

Karen: I'm glad you started this conversation; it follows on from concerns I've had for a long time. I actually think LCC (and potentially Dewey, if so much of it weren't behind paywalls) could be really useful in this regard. LC has been doing interesting things with the newer classes (mostly in law), which knocked my socks off when I first saw them (the document here shows some of what they're doing). In terms of links and context, it's much more useful than LCSH, in my opinion. Particularly if we can get past the 'mark-it-and-park-it' habits of our collective book-past, I think we might find some really useful possibilities.

Karen Coyle said...

Diane, great stuff in that PDF about linking to LCC based on maps. BTW, I would really like to see more experimentation with FAST, but that, too, seems to have licensing issues. The non-topical facets of LCSH seem to have great potential for linking to a lot of other thesauri because they use the common concepts of time, place, and genre.

Allyson Carlyle said...

In re terminology in current use, I wonder if people are now using the term "controlled vocabulary" more, since it encompasses both library classification and subject vocabularies such as LCSH.

Also, isn't keyword searching replacing other types of search because it is usually the default (or only) search option? I don't recall whether any research has been done on this, but in my personal experience, users search using whatever the default is.

Following up on Diane, when someone (vendor?) figures out how to use DDC & LCC information in truly creative ways, I think we'll see more discussion & use. It will come, I hope. Of course, I've already been waiting about 20 years or so, but I haven't given up yet.