Wednesday, July 25, 2012

Authorities and entities

In my previous post, I talked about the three database scenarios proposed by the JSC for RDA. These can be considered to be somewhat schematic because, of course, real databases are often modified for purposes of efficiency in searching and display, as well as to facilitate update. But the conceptual structures provided in the JSC document are useful ways to think about our data future.

There is one problem that I see, however, and that is the transition from authority control to entities. Because we have authority records for some of the same things that are entities in the entity-relationship model of FRBR, there seems to be a widespread assumption that an authority record is the same as an entity record. In fact, IFLA has developed "authority data" models for names and for subjects that are intended to somehow mix with the FRBR model to create a more complete view of the bibliographic description.

This may be a wholly misguided activity, for the reason that authority control and entities (in the entity-relationship sense) are not at all the same thing.

Library authority control, and the record that carries it, has as its sole purpose to provide the preferred heading representing the "thing" being identified (person, corporate body, subject heading). The record also provides non-preferred forms of the name or subject that a catalog user might include in a query for that thing. The rest of the information contained in the record is solely in support of the process of selecting the appropriate string, including documentation of the resources used by the cataloger in making that decision. In knowledge organization thinking, this would be considered a controlled list of terms.

To understand what an entity is, one might use the WEMI entities as examples. An entity is indeed about some defined "thing," and it contains a description of that thing that fulfills one or more intended uses of the metadata. In the WEMI case, we can cite the four FRBR user tasks of find, identify, select, obtain. So if Work is an entity and contains all of the relevant bibliographic information about that Work, then Person is an entity and should contain all of the relevant information about that person. One such piece of information could be a preferred form of the person's name for display in a particular community's bibliographic data, although I could also make the argument that library applications could continue to make use of brief records that support cataloging and display of controlled text strings if that is the only function that is required. In fact, in the VIAF union database of authority data, the data is treated as a controlled list of terms, not unlike a list of terms for languages or musical instruments.

What would be a Person entity? It could, of course, be as much or as little as you would like, but it would be a description of the Person for your purposes. It is this element of description that I think is key, and we could think of it in terms of the FRBR user tasks:

find - would help users find the Person using any form of the name, but also using other criteria like: 19th century French generals; Swedish mystery writers; translators of the Aeneid.

identify - would give users information to help disambiguate between Persons retrieved. This assumes that there would be some amount of biographical information as well as categorization that let users know who precisely this Person entity represents.

select - this is where this would differ from traditional FRBR, which seems to assume that one is already looking for bibliographic materials at this step. I suppose that here one might select between Charles Dodgson and Lewis Carroll, whose biographical information is similar but whose area of activity is entirely different.

obtain - this step would lead one to the library's held works by and/or about that Person, but it could also lead to further information, like web pages, entries in an online database, etc.

If you are wondering what a Person entity might look like, it might look like a mashup between an entry in WorldCat Identities and Wikipedia. I suggest a mashup because Identities is limited to data already in bibliographic and authority records and therefore has little in the way of general biographical information. The latter is available, sometimes abundantly, in Wikipedia, and of course a link to that Wikipedia entry would be a logical addition to a library record for a Person entity.
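To make the contrast concrete, here is a rough sketch in Python of the two kinds of record. It is only an illustration under my own assumptions: the class names, field names, identifiers, and the find_persons function are all hypothetical and do not come from FRBR, RDA, or any existing system. The point is that the authority record holds strings chosen for headings, while the Person entity holds a description that can answer a "find" query such as "Swedish mystery writers."

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuthorityRecord:
    """A term-list entry: it exists to select and document a heading string."""
    preferred_heading: str
    variant_headings: List[str] = field(default_factory=list)
    source_notes: List[str] = field(default_factory=list)  # cataloger's evidence


@dataclass
class PersonEntity:
    """A description of a person; the preferred name is only one attribute."""
    identifier: str                      # hypothetical local identifier
    preferred_name: str
    other_names: List[str] = field(default_factory=list)
    nationality: Optional[str] = None
    occupations: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)  # e.g. a Wikipedia entry


def find_persons(persons, nationality=None, occupation=None):
    """Support the 'find' task using descriptive criteria, not just names."""
    return [
        p for p in persons
        if (nationality is None or p.nationality == nationality)
        and (occupation is None or occupation in p.occupations)
    ]


# Sample data for illustration only.
persons = [
    PersonEntity("person-1", "Mankell, Henning",
                 nationality="Swedish", occupations=["mystery writer"]),
    PersonEntity("person-2", "Carroll, Lewis",
                 other_names=["Dodgson, Charles Lutwidge"],
                 nationality="English",
                 occupations=["writer", "mathematician"]),
]

swedish_mystery_writers = find_persons(
    persons, nationality="Swedish", occupation="mystery writer")
print([p.preferred_name for p in swedish_mystery_writers])
```

Nothing in AuthorityRecord could answer that last query, because it carries no description beyond the name strings themselves.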

What this thinking leads me to conclude is:

1) the library authority file is a term list, not a set of entities, and therefore is not the Person entity implied in FRBR
2) having person entities in our files could be a great service for our users, and it might be possible to create them to take the place of the simple term lists that our authority records now represent
3) the FRBR user tasks may need to be modified or reinterpreted to be focused less on seeking a particular document and more on seeking a particular person (agent) or subject

6 comments:

jrochkind said...

"has as its sole purpose to provide the preferred heading representing the 'thing' being identified"

True, but in order to provide the preferred heading for a 'thing' being identified, you have to establish the list of what those things _are_, right?

After all, the _reason_ we want to establish 'preferred heading' for "things" in traditional cataloging is precisely to bring together all the documents 'attached' to that thing (created by, about, manifestations of).

I do not agree that "the library authority file is a term list, not a set of entities, and therefore is not the Person entity implied in FRBR" and do not think you've provided an argument for that.

I'd say the authority file IS a set of entities, it just doesn't have much to say _about_ those entities _except_ for 'preferred term', 'alternate terms', and just a bit of supplementary info that you rightly point out is "solely in support of the process of selection of the appropriate string"

It is a list of entity instances, for instance Persons. It just doesn't have much to say about them, it doesn't have much descriptive metadata about those entities. I agree that it doesn't have much description, and more description would allow better support of users' tasks. But it's still a set of entities (or entity instances), just one that doesn't have much to say about the instances in its set beyond that they exist, and have some preferred terms and non-preferred terms.

Karen Coyle said...

Jonathan, what I think you are missing is the fact that even the Person name headings are designed to bring together the documents, not the persons. The library catalog does not intend to provide information about persons except as they relate to documents. That may be exactly what the library catalog intends to do, but I still maintain that the person as entity is not represented, only the person name as a controlled vocabulary. So I disagree with your assessment. And I think that my example of WorldCat Identities is a good one.

Ricar2 said...

Authority, entity or whatever-its-name data must be the cornerstone of the future metadata schema. New library search systems must be designed around them; OPACs must be closer to Wikipedia-like capabilities and less Google-like. Some libraries seem to be approaching that view. See GND (http://d-nb.info/gnd/2022746-2) or ULAN at Getty (http://www.getty.edu/vow/ULANFullDisplay?find=&role=&nation=&subjectid=500018666) or this in my home country (http://www.larramendi.es/francisco_sanchez/i18n/consulta_aut/registro.cmd?id=3154)

Karen Coyle said...

Ricar2,

Exactly. And my assessment is that linked data provides an even better and more reliable method for authorities than what we use in libraries today. The current library name or subject authority combines two functions in one string: identification and display. When the display changes, you lose the connection to the identity, because changes to display also change the identifier. The URIs that you cite each show an identifier that will reliably remain the same even if display forms change. Within those complex records there is a preferred form of the name (and there could be preferred forms in different languages), yet there is also additional information about the person beyond the preferred name form. To me this illustrates what I have called "library authority records vs. entities" although perhaps that wasn't the ideal way to word it. It is a difference between creating a record solely for the name selection purpose vs. creating a record that is intended to provide a description of the person, much like we provide descriptions of bibliographic items.
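A small Python sketch may help show why combining identification and display in one string is fragile. Everything in it is hypothetical: the heading, the example.org URI, the document ids, and the documents_for helper are made up for illustration and do not describe how any particular library system stores its links.

```python
def documents_for(index, key):
    """Return the documents filed under a key, or an empty list."""
    return index.get(key, [])


# Case 1: the heading string is both the identifier and the display form.
heading = "Smith, John, 1900-"                      # hypothetical heading
by_heading = {heading: ["doc-1", "doc-2"]}

revised_heading = "Smith, John, 1900-1980"          # the display form is updated
lost = documents_for(by_heading, revised_heading)   # old links no longer match

# Case 2: a URI identifies the person; the heading is only a label on it.
person_uri = "http://example.org/person/42"         # hypothetical URI
by_uri = {person_uri: ["doc-1", "doc-2"]}
labels = {person_uri: heading}

labels[person_uri] = revised_heading                # only the label changes
kept = documents_for(by_uri, person_uri)

print(lost)
print(kept)
```

In the first case every linked document has to be updated whenever the heading changes; in the second the links are untouched and only the label is revised.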

Ricar2 said...

The problem with entities for me is jumping down the FRBR model. I mean, everything is pretty clear with the entity person or corporate body. The entity "work" and its attributes are rather well-defined in the model. But I find it hard trying to put together the attributes of expression and manifestation. In my view, FRBR (and RDA with it) enters a mud puddle from the work entity down, and makes the task of describing and connecting entities absurdly complicated.

Karen Coyle said...

Ricar -

I have SERIOUS problems with the group 1 entities in FRBR. I should modify that by saying that I can support the cataloging concepts that WEMI (Work/Expr/etc.) represent, but I do not think that they should be assumed to be a DATA model. (FRBR refers to itself as a CONCEPTUAL model, not a data model.) Others also have a problem with this, and I started a page for some of these issues on the Futurelib wiki. You'll find lots of links there to articles, discussion, blog posts, although it is far from complete.