Sunday, October 28, 2007

Bibliographic ER

No, I'm not sending libraries to the emergency room, although there are days when I feel like we're at that point. The ER in the title refers to Entity-Relationship, a way to look at data that emphasizes the general viewpoint that there are things, and those things exist in relation to each other.

In one sense, this is what we have done for over a century with our library data. The bibliographic records that we create have in them many relationships: Person authored Book; Publishing House published Book; Book is in Series; Book has Topics. Those relationships are implicit in our records, but the data isn't formatted in an entity-relationship model. Our records, instead, talk about the relationships but don't make it easy to give the various entities their own existence. So we create a record that contains:

Book title.
Place, publisher, date
Subject A
Subject B

The record represents all of the information about the book, but there is no record that represents all of the information about the author, or all of the information about the publisher, etc. Instead, those "entities" are buried in bibliographic records scattered throughout the file.

An E-R model would give each of these entities an identity on which you could hang information about the entity.
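As a rough illustration only, here is a minimal Python sketch of that idea. The class names (`Entity`, `Relationship`, `Catalog`), the `view_from` helper, and the sample labels are all hypothetical, invented for this example; they are not taken from FRBR, RDA, or the OpenLibrary code.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Anything with its own identity: a person, a book, a publisher, a topic."""
    id: str
    kind: str
    label: str
    info: dict = field(default_factory=dict)  # a place to "hang" more information


@dataclass(frozen=True)
class Relationship:
    """A directed link between two entities, stored by their ids."""
    subject: str
    predicate: str
    obj: str


class Catalog:
    """Hypothetical container for entities and their relationships."""

    def __init__(self):
        self.entities = {}
        self.relationships = []

    def add(self, id, kind, label):
        entity = Entity(id, kind, label)
        self.entities[id] = entity
        return entity

    def relate(self, subject, predicate, obj):
        self.relationships.append(Relationship(subject.id, predicate, obj.id))

    def view_from(self, entity):
        """Return every relationship that touches the given entity, as labels."""
        return [
            (self.entities[r.subject].label, r.predicate, self.entities[r.obj].label)
            for r in self.relationships
            if entity.id in (r.subject, r.obj)
        ]


catalog = Catalog()
book = catalog.add("b1", "book", "Book title")
author = catalog.add("p1", "person", "Person A")
illustrator = catalog.add("p2", "person", "Person B")
publisher = catalog.add("o1", "publisher", "Publishing House")
topic = catalog.add("t1", "topic", "Subject A")

catalog.relate(author, "authored", book)
catalog.relate(illustrator, "illustrated", book)
catalog.relate(publisher, "published", book)
catalog.relate(book, "has topic", topic)

# Because the illustrator is an entity, information can be attached to it directly.
illustrator.info["bio"] = "Placeholder biography text."

print(catalog.view_from(illustrator))
print(catalog.view_from(book))
```

The same stored relationships answer both questions: "what does the catalog know about this book?" and "what does the catalog know about this person?" Nothing has to be copied into a second record to get the second view.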

OK, I can't draw worth beans. But basically the idea is that authors, subjects, publishers, topics, all become entries in their own right. This means that you can add information to the author record or the series record, because they have their own place in the design. It also makes it easy to look at your data from many different points of view, while still retaining all of the richness of the relationships. So from the point of view of the person who is the illustrator in the book above, the bibliographic world may look like this:

This type of model is expressed in FRBR, but the E-R aspect of FRBR does not seem to be incorporated into RDA as it stands today. Instead, RDA appears to be aimed at creating the same flat structure that we have in library data today.

If you take a look at the OpenLibrary you will see that books get a page that is about the book, and authors get a separate page that is about the author. This is very simple, but it is also very important. It means that the catalog is no longer just a list of books with authors but can become a rich source of information about authors. You can add bios for authors, link to web sites about the author, launch a discussion group about a favorite author. Because the author is an entity, not just a data element in a record about the book, it becomes a potentially active part of your information system.

In the future, I hope that we can give life to many more entities in the OpenLibrary, and also that we can give them meaningful relationships between each other. This would mean taking a semantic web approach to library data. I don't have a clear picture of where we'll end up, but I'm glad that folks there are interested in experimenting. If you've already thought this through or have ideas in this direction, please step forward. I'd love to hear from you.


Bruce said...

No particular thoughts, except that I like where you're going.

The one complication that I think you've come across, and that we've come across in the bibo ontology work, is that contributions are often more complex than simple edits/authors/illustrates relations. So we model contributions as separate entities/resources/objects (depending on your frame of reference), so that we can represent more detailed information like order (a kind of proxy for relative role), etc.

But that doesn't preclude also doing the more direct relations.

Lorcan Dempsey said...

Although central to the FRBR notion is that 'book' itself is best treated as several different entities: work etc.

The FRBR entities in this case do map reasonably well onto common ways in which people use 'book' in conversation. Except maybe for 'expression' which is a work in progress.

I think Moby Dick is the best book ever (work).

Hi, we have been recommended to buy the Norton Moby Dick with the blue cover (manifestation).

I left my copy of Moby Dick at home (item).

You have probably looked at the Indecs work which has a strong family resemblance to FRBR but comes up with a different model. It is interesting to compare the two approaches against the background of the motivating requirements.

Anonymous said...

I took a look at one of the Open Library's author pages. This is really exciting stuff. But the library community wouldn't have to reinvent the wheel. Our MARC authority records could be expanded (actually transformed) to fulfill this role or function as an entity record. And, of course, they wouldn't have to remain in a MARC framework. There are already note fields in MARC authority records that contain some of this information.

Karen Coyle said...

Bruce, I'll look again at bibo, in particular at the "contributions" section. Thanks.

As for the "book" part, well, let's say that my diagram was very crude so it mustn't be taken too literally. I actually want to allow any identifiable bit of created content to be an entity. So it can be a book, a work (in the FRBR sense), a chapter, an article, a serial, a translation ... whatever. Because we will have the ability to form relationships, it doesn't matter at what level the content is entered into the database. This is an idea I'm just beginning to play with, and I'll try to articulate it further in another post.

As for authority records, I actually think we have more information about authors in the bibliographic file than in authority records, and my reason for saying that is WorldCat Identities, which mines WorldCat for information about people and gets a wealth of really great stuff, very little of which comes from the authority records (although those are included).

Kent Fitch said...

Hi Karen

The AustLit project was an attempt to model these relationships for the primary purpose of supporting scholarship. The original data modeling documents describe how that process evolved, and it is described more formally here and here.

The Indecs and Harmony projects were great inspirations for this work, which still supports a successful service.

Anonymous said...

Some notes:

1. I see for example AACR2 25.1A, note 1, indicating 'work', for the purposes of that chapter, so as to include 'collections and compilations catalogued as a unit'.

2. Within a web environment there could be 'visible' and 'invisible' entities (or objects, files, plugins, markups) to appear as part of meaningful surfaces for the end user -- as texts, illustrations, comments, links, etc. So in this case, the sense of a 'work' would be extended or at least made sensible for these possibilities of entity articulation.

3. An entity-relationship appraisal may also be amenable to multi-level symbolic articulations, transpositions, appearances, so that the traditional 'access points' or 'descriptive elements' would become symbolic keys in non-isotopic sceneries (that is, not simply keywords from a system but entries with their own history and cognitive potential as such, pointing to diverse kinds of records of data)

4. We suppose there will be a longstanding need for 'epistemological anchoring' so as to build multiple 'data profiles' enabling users and developers to specify when and how a record/database/work was captured, modified, transposed, updated, linked, etc. But not only that: symbolic features would also be an object of attention, whether recorded/resignified as user logs or system knowledge bases, leading to non-anticipated relationships.

For example: suppose I would like to know in which books Ava Gardner was studied as an actress, or as being some movie character, or as a comprehensive biographic entity. Her name, originally recorded as a plain 'entity:name-person', will lose these rich possibilities to be rediscovered by users; after all, records formerly imported from other libraries probably did not build in this sophisticated feature. But an Ava Gardner expert or fan can simply add these "relationships", thereby enriching the original record with upper-symbolic layers...

Note that we are not talking about simple 'annotations'; we are thinking of effective ways to 'let things go' about entity-relationship potentials so as to be (re)discovered and 'institutionalized' after some beta offer throughout the whole project community, so as to prove itself desirable/sustainable and eventually enhancing the very concept of virtual open library...

This implies asking ourselves: in what ways can people enhance established entity-relationship frameworks so as to bring them into the context of what could be happening at their desktops?

Best wishes!