Sunday, November 26, 2006

Authorities and Authors

I was reading through some chapters of Joanna Russ's book "How to Suppress Women's Writing," when I had some ideas about authority control. Russ's book is one that I re-read often. It speaks to more than just women's writing -- it is a general description of how the accomplishments of a non-dominant group in any society can be ignored or devalued. Russ mentions many women authors who never make the "top 100" list, or the "Anthology of [whatever] Literature." She also states that a majority of writers in the 19th century were women.

I immediately thought how interesting it would be to use a large database, such as LoC's authority file or WorldCat, to retrieve authors by gender or by country of origin. It then occurred to me that this is information that we do not include in authority records, even though it is probably available in a majority of cases. I also recall -- although I cannot place it -- a discussion about adding to the authority record for an author the names of all of the author's works. In this sense, the authority record would be more than a controlled form of the author's name; it would actually contain information about the author that would be of interest to catalog users. There is talk of adding links from author names in catalog records to their Wikipedia entries. Those entries are surely of more interest to users than the authority record, which is just a list of variant forms of the name. For example, look at the Wikipedia entry for Joanna Russ, and compare that to the authority record for the same person.

So imagine an "author" record, either related to or in place of the authority record. It could help users understand who the author is, and to place the author in a historical period (even if the authoritative form of the name doesn't include dates). If coded well, a database of author records could provide some interesting information for various areas of study.
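To make the idea concrete, here is a minimal sketch of what such an "author" record might hold and how it could be queried. Everything in it is an assumption for illustration: the class name AuthorRecord, its fields, the find_authors helper, and the two sample entries are hypothetical and do not correspond to any existing authority format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuthorRecord:
    """Hypothetical author record: the authority data plus facts about the person."""
    authorized_name: str
    variant_names: List[str] = field(default_factory=list)
    gender: Optional[str] = None
    country_of_origin: Optional[str] = None
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    works: List[str] = field(default_factory=list)

    def alive_in(self, year: int) -> bool:
        # Without a birth year we cannot place the author in a period.
        if self.birth_year is None:
            return False
        return self.birth_year <= year and (
            self.death_year is None or year <= self.death_year
        )


def find_authors(records, gender=None, country=None, alive_in=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if gender is not None and rec.gender != gender:
            continue
        if country is not None and rec.country_of_origin != country:
            continue
        if alive_in is not None and not rec.alive_in(alive_in):
            continue
        hits.append(rec)
    return hits


# Illustrative sample data only, not taken from a real authority file.
SAMPLE = [
    AuthorRecord(
        "Russ, Joanna, 1937-",
        gender="female",
        country_of_origin="US",
        birth_year=1937,
        works=["How to Suppress Women's Writing"],
    ),
    AuthorRecord(
        "Nabokov, Vladimir Vladimirovich, 1899-1977",
        gender="male",
        country_of_origin="RU",
        birth_year=1899,
        death_year=1977,
        works=["Despair"],
    ),
]

women = find_authors(SAMPLE, gender="female")
```

Because the dates are separate fields, a query such as find_authors(SAMPLE, alive_in=1900) works even when the authoritative form of the name carries no dates.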

Wednesday, November 15, 2006

Cataloging v. Metadata

The joke goes:
Metadata is cataloging done by men.
The point of the joke (yes, I know it isn't funny if you have to explain a joke) is that the term metadata was coined to make cataloging palatable to the computer community, and that they're really the same thing. We often use the terms interchangeably, but my recent forays into the world of RDA development have led me to look more closely at the meaning of the term cataloging, and I have concluded that metadata creation and cataloging are very different activities. I do consider a catalog record to be a form of metadata, but not all metadata, and not even all bibliographic metadata, is cataloging.

It's not just a matter of having rules or not having rules, however. Although it seems obvious to say this, the goal of cataloging is the creation of a catalog. Catalogers create entries for the catalog using rules that are designed not only to produce a certain coherent body of data, but also to enforce a particular set of functions (access via alphabetically displayed headings is one example) that are required to support the catalog. The catalog entries create the catalog.

Library cataloging, as a form of metadata, has traditionally had well-defined goals. The catalog records were defined in terms of the catalog's physical structure and functions. With the card-based cataloging rules, from the ALA rules through AACR2, each catalog entry was a precise, unchangeable unit of the catalog, a cell in a well-designed body. As the cataloger created the record, she could know exactly how headings would be used, exactly where they would be filed in the catalog, exactly what actions users would have to take to navigate to them.

This precision ended with the creation of the online catalog. Catalog entries that had been created for a linear card file were being accessed through keyword searches; displays no longer followed the "main entry" concept; the purposeful unit created by the cards in the catalog was destroyed. There was no longer a match between the catalog and the cataloging.

Which is why many of us are having a hard time with the development of our future cataloging rules, RDA. RDA doesn't define the catalog that it is creating catalog entries for, which brings into question how decisions are being made. What is the concept of a catalog in this highly mutable world of ours? I can't imagine that we can create cataloging rules until we define the catalog (or catalogs) that the rules pertain to. It may not be the highly structured system that we had with the card catalog, where access points were THE access points and every card had its one place in the ordered universe that was the catalog. Still, we need to define the catalog before we can expect to create the entries that go into it. We can start with the FRBR "find, identify, select, obtain," but we must close the enormous gap between those functions and actual catalog entries.

Sunday, November 05, 2006

FRBRoo (Object-Oriented)

The folks working on the Conceptual Reference Model for museums (CIDOC) have produced an analysis of FRBR as object-oriented, to match their own OO model. The FRBR final report states that FRBR is based on a relational model, but I have always thought that its hierarchical nature (at least that of Group 1 entities) lends itself better to an OO form. (I portrayed it as OO in my 2004 Library Hi Tech article.) This allows inheritance from the Work to the Expression, etc.

An OO model forces a certain rigor on the data, and in doing their analysis the CIDOC folks found that they needed to redefine some elements of FRBR, in particular the definition of the Work.
FRBRER was flawed with some logical inconsistencies, in particular with regard to its “Group 1 of entities,” those entities that account for the content of a catalogue record. (p.9)
Their problem with the FRBRer (entity-relationship) model's definition of Work seems to be the same one that has been bothering me recently, especially in the reading I have done on the difficulties of applying the FRBRer model to serials.
The Work entity such as defined in FRBRER seemed to cover various realities with distinct properties. While the main interpretation intended by the originators of FRBRER seems to have been that of a set of concepts regarded as commonly shared by a number of individual sets of signs (or “Expressions”), other interpretations were possible as well: that of the set of concepts expressed in one particular set of signs, independently of the materialisation of that set of signs; and that of the overall abstract content of a given publication. FRBROO retains the vague notion of “Work” as a superclass for the various possible ways of interpreting the FRBRER definitions: F46 Individual Work corresponds to the concepts associated to one complete set of signs (i.e., one individual instance of F20 Self-Contained Expression); F43 Publication Work comprises publishers’ intellectual contribution to a given publication; and F21 Complex Work is closer to what seems to have been the main interpretation intended in FRBRER. Additionally, a further subclass is declared for F1 Work: F48 Container Work, which provides a framework for conceptualising works that consist in gathering sets of signs, or fragments of sets of signs, of various origins (“aggregates”). (p. 9-10)

If I may paraphrase, the FRBRer Work includes both individual works of creative effort and publishers' containers for groups of works. The CIDOC solution is perhaps more complex than I had imagined, but the distinction between the creative output and what publishers (and editors) have chosen to place together in the same container is an important one for our user service goal, especially in the areas of journal publishing and music publishing. Their "container" is what I've been calling the "package" in discussions on the next gen catalog list.
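The subclassing described in the quoted passage can be sketched as a small class hierarchy. This is only my reading of that one passage, not the FRBRoo specification itself: the class names follow the quoted identifiers, while the fields (title, members, publisher, contents), the is_package helper, and the sample titles are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class F1Work:
    """The deliberately vague superclass retained by FRBRoo."""
    title: str


@dataclass
class F46IndividualWork(F1Work):
    """Concepts associated with one complete set of signs."""


@dataclass
class F21ComplexWork(F1Work):
    """Concepts shared by a number of expressions (closest to the FRBRer Work)."""
    members: List[F1Work] = field(default_factory=list)


@dataclass
class F43PublicationWork(F1Work):
    """The publisher's intellectual contribution to a given publication."""
    publisher: str = ""


@dataclass
class F48ContainerWork(F1Work):
    """A framework for gathering works of various origins (an aggregate)."""
    contents: List[F1Work] = field(default_factory=list)


def is_package(work: F1Work) -> bool:
    # Hypothetical helper: the "package" side of the model, as opposed to
    # the author's creative output.
    return isinstance(work, (F43PublicationWork, F48ContainerWork))


# Placeholder titles, not real publications.
article = F46IndividualWork("An article")
issue = F48ContainerWork("A journal issue", contents=[article])

creative_only = [w.title for w in (article, issue) if not is_package(w)]
```

Every subclass is still an F1Work, so software that only knows about "works" keeps functioning, while a catalog that cares about the distinction can separate the article from the issue that packages it.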

I will think about how this analysis might help us design a bibliographic record. The diagram on page 10 of this report implies that there are two forks to the description: the author's context and the publisher's context. It seems that today's cataloging rules (and perhaps RDA as well) conflate those two, and that when those contexts differ the rules emphasize the publisher's.
In other words, descriptive cataloging is describing the published Work, not the author's Work. If we see those as separate, would our catalog look more FRBR-like?

Thursday, November 02, 2006

Relators

Much of the buzz around FRBR concerns its emphasis on relationships. There are the relationships between works, between works and expressions, and so on down the FRBR model through manifestations and items. These are the Group 1 entities in FRBR. Less is said about the Group 2 entities and their "Responsibility" relationships ("is created by Person," "is realized by Corporate Body"). These look a lot like the RDF "triples" that many developers are fond of as semantic organizing principles for data. They are also very similar to the relator codes that we have in the cataloging rules and in MARC21: Smith, Jane, ed.

I have often been frustrated that searches in library catalogs do not allow me to include (or exclude) roles, such as "editor" or "translator." I am annoyed when a search on "Nabokov, Vladimir Vladimirovich, 1899-1977" in the so-called "author" field brings up numerous editions of Lewis Carroll's Alice in Wonderland, translated into Russian by Mr. Nabokov. Yet when I look at the detailed records, in most he is listed simply as an added entry by his full name, but with no relator code. In essence, the catalog has no way to distinguish between works he wrote (or co-wrote, thus the use of the added entry) and those he translated. Unfortunately, it is clear when you look at records in library catalogs that those role codes have not been assigned consistently.
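A small sketch shows why the missing codes matter. The record structure and the search_name function below are hypothetical, not the behavior of any real catalog system, and the two sample records are simplified stand-ins. The only real-world detail assumed is that "aut" and "trl" are the MARC relator codes for author and translator.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NABOKOV = "Nabokov, Vladimir Vladimirovich, 1899-1977"


@dataclass(frozen=True)
class NameEntry:
    name: str
    relator: Optional[str] = None  # MARC relator code; None means uncoded


@dataclass(frozen=True)
class BibRecord:
    title: str
    entries: Tuple[NameEntry, ...]


def search_name(records, name, exclude_roles=frozenset()):
    """Return titles with an entry for `name` whose role is not excluded."""
    hits = []
    for rec in records:
        for entry in rec.entries:
            # An uncoded entry (relator None) can never be excluded,
            # because the catalog has no idea what role it represents.
            if entry.name == name and entry.relator not in exclude_roles:
                hits.append(rec.title)
                break
    return hits


# The same two titles, once with roles coded and once without.
coded = [
    BibRecord("Despair", (NameEntry(NABOKOV, "aut"),)),
    BibRecord("Alice in Wonderland (Russian)", (NameEntry(NABOKOV, "trl"),)),
]
uncoded = [
    BibRecord("Despair", (NameEntry(NABOKOV),)),
    BibRecord("Alice in Wonderland (Russian)", (NameEntry(NABOKOV),)),
]

with_codes = search_name(coded, NABOKOV, exclude_roles={"trl"})
without_codes = search_name(uncoded, NABOKOV, exclude_roles={"trl"})
```

With coded entries the translation drops out of the result. With uncoded entries the same request returns both titles, which is exactly the false hit described above.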

Thom Hickey did a study of relator codes and relator names ($4 v. $e), which he reported in his blog, and came up with the figures below. His interest was in the interaction between the code and the name. Since his study was done in the OCLC WorldCat catalog, I think it points out that these key roles are not being coded in our records, which essentially results in a lot of false hits for our users. If we can't get these simple relationships coded into our data today, what hope do we have for a relationship-oriented bibliographic view in the future?

Thom's list of top terms:

Relator codes:

prf (1,080,900)
cnd (203,921)
voc (78,921)
itr (77,058)
aut (72,700)
act (56,518)
arr (50,621)
edt (49,205)
trl (43,608)
ill (42,657)

and the top relator terms:

ed (629,083)
joint author (474,307)
ill (214,764)
tr (172,801)
comp (123,239)
printer (60,070)
photographer (45,115)
orient (40,064)
illus (38,201)
former owner (34,892)

From this it seems obvious to me that:
  1. These codes are being used primarily in certain cataloging sub-cultures (music and archival works are my best guess)
  2. Some are obviously under-used, in particular "joint author" and "translator"
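The term list also shows the same role entered in more than one form ("ill" and "illus"). As a rough sketch, free-text terms can be folded onto relator codes before counting. The counts below are copied from Thom's list of terms. The mapping is my own assumption about which MARC relator code each term most plausibly means ("joint author" mapped to "aut" is an approximation), and "orient" is left unmapped because I do not know what it stands for.

```python
from collections import Counter

# Counts copied from the list of top relator terms above.
TERM_COUNTS = {
    "ed": 629083,
    "joint author": 474307,
    "ill": 214764,
    "tr": 172801,
    "comp": 123239,
    "printer": 60070,
    "photographer": 45115,
    "orient": 40064,
    "illus": 38201,
    "former owner": 34892,
}

# Assumed term-to-code mapping; not an official crosswalk.
TERM_TO_CODE = {
    "ed": "edt",
    "joint author": "aut",
    "ill": "ill",
    "illus": "ill",
    "tr": "trl",
    "comp": "com",
    "printer": "prt",
    "photographer": "pht",
    "former owner": "fmo",
}


def counts_by_code(term_counts, term_to_code):
    """Sum term counts per relator code; collect terms that have no mapping."""
    totals = Counter()
    unmapped = []
    for term, count in term_counts.items():
        # Normalize case and trailing punctuation ("Ed." becomes "ed").
        key = term.strip().rstrip(".").lower()
        code = term_to_code.get(key)
        if code is None:
            unmapped.append(term)
        else:
            totals[code] += count
    return totals, unmapped


totals, unmapped = counts_by_code(TERM_COUNTS, TERM_TO_CODE)
```

This folds only the $e terms. The $4 code counts are deliberately not added in, since a single heading may carry both a code and a term and would then be counted twice.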
Here are the added entries for a record for Rainer Werner Fassbinder's film based on Nabokov's novel "Despair":

Bogarde, Dirk, 1921-1999
Ferréol, Andrea
Spengler, Volker
Märthesheimer, Peter, 1938-
Fassbinder, Rainer Werner, 1946-1982
Stoppard, Tom
Nabokov, Vladimir Vladimirovich, 1899-1977. Otchai͡anie
Löwitsch, Klaus, 1936-

All of these have the same coded relationship to the bibliographic work being described in the MARC21 record: personal added entry.

I very much like the idea of distinguishing roles and making those relationships part of the user experience. But if we aren't taking advantage of the ability we have today, I don't have much hope that we will code these relationships in the future.


Wednesday, November 01, 2006

Cat-a-log(gue)

I was taken to task in the FRBR Blog for saying that FRBR is not about catalogs.
I don’t understand the statement that FRBR isn’t about catalogues. It’s Functional Requirements for Bibliographic Records, and when bibliographic records are shown to users, that’s called a catalogue.
I realized from this comment that I am using the wrong term when I say catalog. A catalog is defined as a list of items, coming from word roots that mean "list, to count up." Examples are a library catalog, a list of items offered for sale, or the catalog of a museum exhibit that lists the items in the exhibit. In the library, the catalog is an inventory of items owned or held by the library, and it was once equal to what users could access in the library. From the early or mid-1800s, the physical library catalog also served as the user interface to the library's holdings. (Prior to that time, catalogs were mainly used by librarians, not by members of the library's public.)

The as-yet-unnamed thing that I wish to define (and which I mistakenly called a catalog) serves these functions (please add on what I have forgotten):
  1. A list of items owned by the library. This list is at a macro level (e.g. serial titles but not the articles in the serial). Often that level is determined by the purchase unit, since this list interacts with the library's acquisitions function.
  2. Serial issues received. This is usually found in a separate module called a serials check-in system (which replaced the old Kardex).
  3. Licensed resources. These may be listed in the catalog, but they may instead (or also) be found in a database used by an OpenURL resolver or in an ERM system (which is not accessible to users). In some cases, these are listed on a web site managed by the library.
  4. Journal article indexes. These used to be hard-copy reference books. They are now often electronic databases. User interface to these varies.
  5. Items available via ILL. This could be a union catalog of libraries in a borrowing unit. It also is a function that interacts with OCLC's ILL system. This latter usually isn't visible in the user view of the library.
  6. Links and connections from information systems not hosted by the library, such as the ability to link from an article in a licensed database to the full text of the article from another source; or a link from an Internet search engine to library-managed resources through a browser plug-in or a web service.
  7. Location and circulation status information, plus the ability for users to place holds on items or to request delivery of items.
  8. Interaction with institutional services such as courseware.
  9. One or more user interfaces. Many of these services above will have a separate interface just for that service, but there are also meta-interfaces that will combine services.
This is a first stab. I'll post this on the futurelib wiki where it can be easily modified.