Monday, July 04, 2016

Catalogs and Content: an Interlude

This entire series is available as a single file on my web site.

"Editor's note. Providing subject access to information is one of the most important professional services of librarians; yet, it has been overshadowed in recent years by AACR2, MARC, and other developments in the bibliographic organization of information resources. Subject access deserves more attention, especially now that results are pouring in from studies of online catalog use in libraries."
American Libraries, Vol. 15, No. 2 (Feb., 1984), pp. 80-83
Having thought and written about the transition from card catalogs to online catalogs, I began to do some digging in the library literature, and struck gold. In 1984, Pauline Atherton Cochrane, one of the great thinkers in library land, organized a six-part "continuing education" series to bring librarians up to date on the thinking regarding the transition to new technology. (Dear ALA - please put these together into a downloadable PDF for open access. It could make a difference.) What is revealed here is both stunning and disheartening, as the quote above shows; in terms of catalog models, very little progress has been made, and we are still spending more time organizing atomistic bibliographic data while ignoring subject access.

The articles are primarily made up of statements by key library thinkers of the time, many of whom you will recognize. Some responses contradict each other, others fall into familiar grooves. Library of Congress is criticized for not moving faster into the future, much as it is today, and yet respondents admit that the general dependency on LC makes any kind of fast turn-around of changes difficult. Some of the desiderata have been achieved, but not the overhaul of subject access in the library catalog.

The Background

If you think that libraries moved from card catalogs to online catalogs in order to serve users better, think again. Like other organizations that had a data management function, libraries in the late 20th century were reaching the limits of what could be done with analog technology. In fact, as Cochrane points out, by the mid-point of that century libraries had given up on the basic catalog function of providing cross references from unused to used terminology, as well as from broader and narrower terms in the subject thesaurus. It simply wasn't possible to keep up with these, not to mention that although the Library of Congress and service organizations like OCLC provided ready-printed cards for bibliographic entries, they did not provide the related reference cards. What libraries did (and I remember this from my undergraduate years) is they placed near the card catalog copies of the "Red Book". This was the printed Library of Congress Subject Heading list, which by my time was in two huge volumes, and, yes, was bound in red. Note that this was the volume that was intended for cataloging librarians who were formulating subject headings for their collections. It was never intended for the end-users of the catalog. The notation ("x", "xx", "sa") was far from intuitive. In addition, for those users who managed to follow the references, it pointed them to the appropriate place in LCSH, but not necessarily in the catalog of the library in which they were searching. Thus a user could be sent to an entry that simply did not exist.

[Image: The "Red Book" today]
From my own experience, when we brought up the online catalog at the University of California, the larger libraries had for years had difficulty keeping the card catalog up to date. The main library at the University of California at Berkeley regularly ran from 100,000 to 150,000 cards behind in filing into the catalog, which filled two enormous halls. That meant that a book would be represented in the catalog about three months after it had been cataloged and shelved. For a research library, this was a disaster. And Berkeley was not unusual in this respect.

Computerization of the catalog was both a necessary practical solution and a kind of holy grail. At the time that these articles were written, only a few large libraries had an online catalog, and that catalog represented only a recent portion of the library's holdings. (Retrospective conversion of the older physical card catalog to machine-readable form came later, culminating in the 1990s.) Abstracting and indexing databases (DIALOG, PRECIS, and others) had preceded libraries in automating, and these gave librarians their first experience in searching computerized bibliographic data.

This was the state of things when Cochrane presented her 6-part "continuing education" series in American Libraries.

Subject Access

The series of articles was stimulated by an astonishingly prescient article by Marcia Bates in 1977. In that article she articulates both concerns and possibilities that, quite frankly, we should all take to heart today. In Lesson 3 of Cochrane's articles, Bates is quoted from 1977 saying:
"...with automation, we have the opportunity to introduce many access points to a given book. We can now use a subject approach... that allows the naive user, unconscious of and uninterested in the complexities of synonymy and vocabulary control, to blunder on to desired subjects, to be guided, without realizing it, by a redundant but carefully controlled subject access system." 
and
"And now is the time to change -- indeed, with MARC already so highly developed, past time. If we simply transfer the austerity-based LC subject heading approach to expensive computer systems, then we have used our computers merely to embalm the constraints that were imposed on library systems back before typewriters came into use!"

This emphasis on subject access was one of the stimuli for the AL lessons. In the early 1980s, studies done at OCLC and elsewhere showed that over 50% of the searches being done in the online catalogs of that day were subject searches, even those going against title indexes or mixed indexes. (See footnotes to Lesson 3.) Known item searching was assumed to be under control, but subject searching posed significant problems. Comments in the article include:
"...we have not yet built into our online systems much of the structure for subject access that is already present in subject cataloging. That structure is internal and known by the person analyzing the work; it needs to be external and known by the person seeking the work."
"Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?"
Interestingly, I don't see that any of these problems has been solved in today's systems.

As a quick review, here are some of the problems, some proposed solutions, and some hope for future technologies that are presented by the thinkers that contributed to the lessons.

Problems noted

Many problems were surfaced, some with fairly simple solutions, others that we still struggle with.
  • LCSH is awkward, if not nearly unusable, both for its vocabulary and for the lack of a true hierarchical organization
  • Online catalogs' use of LCSH lacks syndetic structure (see, see also, BT, NT). This is true not only for display but also for retrieval: a search on a broader term does not retrieve items with a narrower term (which would be logical to at least some users)
  • Libraries assign too few subject headings
  • For the first time, some users are not in the library while searching, so there are no intermediaries (e.g. reference librarians) available. (One of the flow diagrams has a failed search pointing to a box called "see librarian," something we would not think to include today.)
  • Lack of a professional theory of information seeking behavior that would inform systems design. ("Without a blueprint of how most people want to search, we will continue to force them to search the way we want to search." Lesson 5)
  • Information overload, aka overly large results, as well as too few results on specific searches
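
The retrieval gap in the second item above (a search on a broader term that misses items indexed under narrower terms) can be made concrete with a small sketch. The Python below is an illustration only: the headings, the NARROWER table, and the record identifiers are invented for the example and are not taken from LCSH or from the lessons. It shows one way a subject search could follow narrower-term links before retrieving records.

```python
# Hypothetical broader-term -> narrower-terms table (not real LCSH data).
NARROWER = {
    "Pets": ["Cats", "Dogs"],
    "Dogs": ["Terriers"],
}

# Hypothetical catalog index: subject heading -> record identifiers.
POSTINGS = {
    "Pets": ["rec1"],
    "Cats": ["rec2"],
    "Dogs": ["rec3"],
    "Terriers": ["rec4"],
}


def expand(term, narrower=NARROWER):
    """Return the term plus all of its narrower terms, at any depth."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the thesaurus
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return seen


def search(term, expand_narrower=True):
    """Return sorted record ids for a subject search on one heading."""
    terms = expand(term) if expand_narrower else [term]
    hits = set()
    for t in terms:
        hits.update(POSTINGS.get(t, []))
    return sorted(hits)


print(search("Dogs", expand_narrower=False))  # exact heading only
print(search("Dogs"))                         # heading plus narrower terms
```

With expansion turned off, a search on "Dogs" finds only the record filed under that exact heading, which is the behavior the respondents complained about. With expansion on, the record filed under "Terriers" is retrieved as well.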

Proposed solutions

Some proposed solutions were mundane (add more subject headings to records) while others would require great disruption to the library environment.
  • Add more subject headings to MARC records
  • Use keyword searching, including keywords anywhere in the record.
  • Add uncontrolled keywords to the records.
  • Make the subject authority file machine-readable and integrate it into online catalogs.
  • Forget LCSH, instead use non-library bibliographic files for subject searching, such as A&I databases.
  • Add subject terms from non-library sources to the library catalog, and/or do (what today we call) federated searching
  • LCSH must provide headings that are more specific as file sizes and retrieved sets grow (in the document, a retrieved set of 904 items was noted with an exclamation point)
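
To illustrate what integrating a machine-readable authority file into the catalog might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the SEE and SEE_ALSO tables, the headings, and the record counts are invented, and the lookup function is hypothetical rather than a description of any real system. It resolves an unused term to its authorized heading and offers only those related headings that actually have records in the local catalog, which avoids the Red Book problem of sending a user to an entry that does not exist.

```python
# Hypothetical "see" references: unused term -> authorized heading.
SEE = {"Cars": "Automobiles", "Motorcars": "Automobiles"}

# Hypothetical "see also" references between authorized headings.
SEE_ALSO = {"Automobiles": ["Trucks", "Motorcycles"]}

# Hypothetical local catalog: heading -> number of records held.
LOCAL_COUNTS = {"Automobiles": 12, "Trucks": 3}


def lookup(user_term):
    """Resolve a user's term and suggest related headings held locally."""
    # Follow a "see" reference if the term is not the authorized form.
    heading = SEE.get(user_term, user_term)
    # Keep only related headings that have at least one local record.
    related = [
        h for h in SEE_ALSO.get(heading, [])
        if LOCAL_COUNTS.get(h, 0) > 0
    ]
    return {
        "heading": heading,
        "count": LOCAL_COUNTS.get(heading, 0),
        "see_also": related,
    }


print(lookup("Cars"))
```

A user who types "Cars" is led to "Automobiles" without needing to know the authorized form, and "Motorcycles" is not suggested because this imaginary library holds nothing under that heading.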

Future thinking 

As is so often the case when looking to the future, some potential technologies were seen as solutions. Some of these are still seen as solutions today (cf. artificial intelligence), while others have been achieved (storage of full text).
  • Full text searching, natural language searches, and artificial intelligence will make subject headings and classification unnecessary
  • We will have access to back-of-the-book indexes and tables of contents for searching, as well as citation indexing
  • Multi-level systems will provide different interfaces for experts and novices
  • Systems will be available 24x7, and there will be a terminal in every dorm room
  • Systems will no longer need to use stopwords
  • Storage of entire documents will become possible

End of Interlude

Although systems have allowed us to store and search full text, to combine bibliographic data from different sources, and to deliver world-wide, 24x7, we have made almost no progress in the area of subject access. There is much more to be learned from these articles, and it would be instructive to do an in-depth comparison of them to where we are today. I greatly recommend reading them; each is only a few pages long.

----- The Lessons -----

*Modern Subject Access in the Online Age: Lesson 1
by Pauline Atherton Cochrane
Source: American Libraries, Vol. 15, No. 2 (Feb., 1984), pp. 80-83
Stable URL: http://www.jstor.org/stable/25626614

*Modern Subject Access in the Online Age: Lesson 2
by Pauline A. Cochrane
Source: American Libraries, Vol. 15, No. 3 (Mar., 1984), pp. 145-148, 150
Stable URL: http://www.jstor.org/stable/25626647

*Modern Subject Access in the Online Age: Lesson 3
Author(s): Pauline A. Cochrane, Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen and Stephen E. Wiberley, Jr.
Source: American Libraries, Vol. 15, No. 4 (Apr., 1984), pp. 250-252, 254-255
Stable URL: http://www.jstor.org/stable/25626708

*Modern Subject Access in the Online Age: Lesson 4
Author(s): Pauline A. Cochrane, Carol Mandel, William Mischo, Shirley Harper, Michael Buckland, Mary K. D. Pietris, Lucia J. Rather and Fred E. Croxton
Source: American Libraries, Vol. 15, No. 5 (May, 1984), pp. 336-339
Stable URL: http://www.jstor.org/stable/25626747

*Modern Subject Access in the Online Age: Lesson 5
Author(s): Pauline A. Cochrane, Charles Bourne, Tamas Doszkocs, Jeffrey C. Griffith, F. Wilfrid Lancaster, William R. Nugent and Barbara M. Preschel
Source: American Libraries, Vol. 15, No. 6 (Jun., 1984), pp. 438-441, 443
Stable URL: http://www.jstor.org/stable/25629231

*Modern Subject Access in the Online Age: Lesson 6
Author(s): Pauline A. Cochrane, Brian Aveney and Charles Hildreth
Source: American Libraries, Vol. 15, No. 7 (Jul. - Aug., 1984), pp. 527-529
Stable URL: http://www.jstor.org/stable/25629275

4 comments:

Ted Fons said...

Fascinating. That's a good find Karen. I'm developing a theory that we focused so heavily on building up the transactional side of library systems that we didn't focus on innovation in service to the user. With the advent of powerful mainframes and investment in higher education we got lucky and had the computing power to develop highly efficient back end systems for libraries. I suspect the focus on highly efficient data exchange and circulation was the path of less resistance than innovations in the convenience of the catalog searcher.

Or maybe our investment in backend and front end systems was equal, but we just lacked the disruptive ideas before we were inspired by the search engines in the 90s. Either way, I agree that the convenience of the searcher has not been the highest priority or we have just failed to achieve it. This is nice evidence that we are aware of it, but still struggling to achieve it.

Marylin Johnson Raisch said...

I think you mean, in your reading list, that you are reading the 2001 title: The Nature of 'A Work': Implications for the Organization of Knowledge by R. Smiraglia. Thanks. He was my library school professor back when Columbia U had a school. Great teacher!

Karen Coyle said...

Ted, I think you hit the nail on the head. My feeling is that transaction technology, including keyword retrieval, is the simplest and easiest thing we could do, and so we did it. Not only is it pure computing, without the need for user studies or ontological thinking, but it can be done without any change in cataloging. In a real sense, that was our charge at the time: take the MARC records "as is" and make a catalog out of them.

A key issue here is that technology split off from cataloging, so the technology of the catalog and the act of preparing the catalog through cataloging became separate functions. I'm still pondering how this happened; I lived through it and still I can't fathom how we missed this obvious flaw in our plans. I do recall that at the time we had conversations with cataloging departments, but at no time did anyone question the very basis of the catalog data as compared to the functionality of the online catalog. As I said in Part I, some catalogers criticized the fact that headings were no longer the organizing factors of the catalog, but I don't recall anyone questioning the validity of the continued creation of headings. Thinking back on it, it feels like one of those Twilight Zone episodes where there are alternate universes that don't mesh.

Adam Siegel said...

Great series. As for the disjunct between technology and technical services, library disinvestment in technical services staffing (downsizing, deskilling, and outsourcing the cataloging function) over the course of many decades is the main culprit.