Saturday, December 27, 2008

FRBR and Group 2 & 3 Oddities

You've probably realized by now that I cycle back to FRBR frequently, each time discovering something new. New to me, at least. Perhaps because I am not a cataloger, I seem to have missed some key concepts in earlier readings. This might help explain some misunderstandings between me and more catalog-savvy folks.

This time I was thinking about the way that the entities are used with the subject relationship. But before I get to that, there's always the publisher to torment me.

Creators and Publishers in FRBR and RDA

The Group 2 entities have what are called "responsibility relationships" with the Group 1 entities. The diagram (Figure 3.2, p. 14) shows the two Group 2 (G2) entities, person and corporate body, related to the Group 1 entities in the following way:
Work is created by ... G2
Expression is realized by ... G2
Manifestation is produced by ... G2
Item is owned by ... G2
(I find it odd that FRBR limits the Group 1 to Group 2 relationships to only four, one per Group 1 entity, but that is how it is written. It makes me wonder what one does with, say, an illustrator of a particular expression of a book. Surely the addition of illustrations doesn't make it a new work?)
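
To make the constraint concrete, here is a minimal sketch in Python. It is my own illustration, not anything defined in FRBR, and the names (`Group1`, `Agent`, `link`) are hypothetical. It models the four responsibility relationships as a lookup keyed by the type of Group 1 entity:

```python
from dataclasses import dataclass
from enum import Enum


class Group1(Enum):
    WORK = "work"
    EXPRESSION = "expression"
    MANIFESTATION = "manifestation"
    ITEM = "item"


# The four FRBR responsibility relationships: exactly one per Group 1 entity.
RESPONSIBILITY = {
    Group1.WORK: "created by",
    Group1.EXPRESSION: "realized by",
    Group1.MANIFESTATION: "produced by",
    Group1.ITEM: "owned by",
}


@dataclass(frozen=True)
class Agent:
    """A Group 2 entity: a person or a corporate body."""
    name: str
    kind: str  # "person" or "corporate body"


@dataclass(frozen=True)
class ResponsibilityLink:
    source: Group1
    relationship: str
    agent: Agent


def link(source: Group1, agent: Agent) -> ResponsibilityLink:
    # There is no relationship to choose: the Group 1 entity type alone
    # determines which of the four relationships applies.
    return ResponsibilityLink(source, RESPONSIBILITY[source], agent)


# A hypothetical illustrator of one expression can only be "realized by";
# the model has no separate place to say "illustrated by".
illustrator = Agent("Hypothetical Illustrator", "person")
print(link(Group1.EXPRESSION, illustrator).relationship)  # realized by
```

Because the relationship is derived from the entity type, any finer distinction (illustrator versus translator, say) would have to be recorded somewhere outside these four relationships.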

In section 4 of FRBR, the Group 2 entities are not included in the lists of attributes of the Group 1 entities. In other words, when you read the list of attributes of a work, there is no mention of creator, and the list of attributes of an item does not include owner.

I was therefore surprised to find among the attributes of a manifestation:
4.4.5 Publisher/Distributor
The publisher/distributor of the manifestation is the individual, group, or organization named in the manifestation as being responsible for the publication, distribution, issuing, or release of the manifestation. A manifestation may be associated with one or more publishers or distributors.
Since Group 2 entities are not listed as attributes in the Group 1 attribute lists, this pretty clearly states that publisher is not a person or corporate body entity.
Yet, the section on relationships between Group 1 and Group 2 entities says:
5.2.2 Relationships to Persons and Corporate Bodies
The entities in the second group (person and corporate body) are linked to the first group by four relationship types: the “created by” relationship that links both person and corporate body to work; the “realized by” relationship that links the same two entities to expression; the “produced by” relationship that links them to manifestation; and the “owned by” relationship that links them to item.
Essentially, this apparent inconsistency between the definitions of the entities and the attribute list for the manifestation has to do with the practice of transcribing data from the manifestation:
At first glance certain of the attributes defined in the model may appear to duplicate objects of interest that have been separately defined in the model as entities and linked to the entity in question through relationships. For example, the manifestation attribute “statement of responsibility” may appear to parallel the entities person and corporate body and the “responsibility” relationships that link those entities with the work and/or expression embodied in the manifestation. However, the attribute defined as “statement of responsibility” pertains directly to the labeling information appearing in the manifestation itself, as distinct from the relationship between the work contained in the manifestation and the person and/or corporate body responsible for the creation or realization of the work. (Section 4.1)
What this points out is that while FRBR supposedly puts forth an entity-relation model, in fact it is no more ER than our current bibliographic model with its mixture of transcribed data, cataloger supplied data, and controlled headings.
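
A minimal Python sketch of the distinction, with hypothetical class names and a made-up publisher. The transcribed attribute and the relationship sit side by side on the manifestation, and nothing in the model connects one to the other:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CorporateBody:
    """A Group 2 entity; in practice it would carry an authority identifier."""
    preferred_name: str


@dataclass
class Manifestation:
    title: str
    # FRBR 4.4.5: publisher/distributor as an attribute, i.e. the name as
    # transcribed from the manifestation itself.
    publisher_distributor: Optional[str] = None
    # FRBR 5.2.2: the "produced by" relationship to Group 2 entities.
    produced_by: List[CorporateBody] = field(default_factory=list)


press = CorporateBody("Example Press")  # hypothetical publisher

# Two manifestations whose title pages print the publisher's name differently.
m1 = Manifestation("First book", "Example Press", [press])
m2 = Manifestation("Second book", "Example Press, Inc.", [press])

# Comparing the transcribed strings says these are different publishers...
print(m1.publisher_distributor == m2.publisher_distributor)  # False
# ...while comparing the linked entities says they are the same one.
print(m1.produced_by[0] == m2.produced_by[0])  # True
```

Only the entity link supports the kind of matching an ER model is for; the transcribed string is display data, and in this sketch keeping the two in step is left entirely to whoever creates the record.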

Then Comes Group 3

This is easier to explain, because it is very simple: The Group 3 entities (concept, object, event, place) can ONLY be used as subjects, e.g.:
For the purposes of this study places are treated as entities only to the extent that they are the subject of a work (e.g., the subject of a map or atlas, or of a travel guide, etc.). (section 3.2.10)
This eliminates any thought of using place as in "place of publication." Not to mention that each of these has a very limited attribute list; in fact, they each have exactly one attribute:
term for the concept/object/event/place
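
In a minimal Python sketch (hypothetical names throughout, my reading rather than an official encoding), that restriction means the same city can be an entity when it is the subject of a travel guide but only a string when it is where the book was published:

```python
from dataclasses import dataclass, field
from typing import List, Union


# Group 3 entities. Each has exactly one attribute: the term.
@dataclass(frozen=True)
class Concept:
    term: str


@dataclass(frozen=True)
class ObjectEntity:  # named to avoid shadowing Python's built-in `object`
    term: str


@dataclass(frozen=True)
class Event:
    term: str


@dataclass(frozen=True)
class Place:
    term: str


Group3 = Union[Concept, ObjectEntity, Event, Place]


@dataclass
class Work:
    title: str
    # The "has as subject" relationship is the only place Group 3 entities
    # appear. (FRBR also allows Group 1 and 2 entities as subjects; they are
    # omitted here to keep the sketch short.)
    subjects: List[Group3] = field(default_factory=list)


@dataclass
class Manifestation:
    # Place of publication has no entity to point to, so it stays as text.
    place_of_publication: str


guide = Work("A travel guide", subjects=[Place("Paris")])
edition = Manifestation(place_of_publication="Paris")

print(Place("Paris") in guide.subjects)                 # True
print(isinstance(edition.place_of_publication, Place))  # False
```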

The Upshot

The upshot is that FRBR does not give us a true entity-relation model for our bibliographic data. This is frustrating for those of us trying to move library data in an ER direction, and it means that to achieve the ER model we will have to go beyond what exists today in FRBR, and beyond the version of FRBR that has been realized in RDA. I've kind of known this, but it's discouraging to have it confirmed in the FRBR document itself. Even more frustrating that it's been there the whole time and I missed it.

I've looked again at FRBR in RDF and the Scholarly Works Application Profile, and both make some interesting extensions to the FRBR concepts, taking them further along the ER road. It seems to me that the DC/RDA work will also need to deviate from FRBR in order to achieve its goals. The big question is: how far can we go and still be compatible with library data?


Erik Hetzner said...

A few responses:

1. You write: "It makes me wonder what one does with, say, an illustrator of a particular expression of a book."

Doesn't an illustrator help to "realize" an expression?

2. I think that the group 1-group 2 relationships are vague enough to cover many actual relationships, which I presume was the intention. "Realized by", for instance, can include: editor, illustrator, author of a foreword, translator, annotator, etc. Of course in the real world you want to capture this additional information in some way.

But to my mind an advantage of FRBR is that collapsing all these possible relationships (editor, illustrator, etc.) into one group, "realized by", helps to clarify thinking about library catalogs.
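
One way to keep both the broad category and the real-world detail is to record the specific role and derive the FRBR relationship from it. A minimal Python sketch, assuming a hypothetical role table built from the roles listed above (plus "author" mapped to "created by", which is my assumption) and made-up agent names:

```python
# Hypothetical role vocabulary: each specific role is kept as recorded, but
# also rolls up to one of the broad FRBR responsibility relationships.
ROLE_TO_RELATIONSHIP = {
    "author": "created by",
    "editor": "realized by",
    "illustrator": "realized by",
    "author of foreword": "realized by",
    "translator": "realized by",
    "annotator": "realized by",
}


def agents_for(contributions, relationship):
    """Return the agents whose specific role rolls up to `relationship`.

    `contributions` is a list of (agent name, specific role) pairs.
    """
    return [
        agent
        for agent, role in contributions
        if ROLE_TO_RELATIONSHIP[role] == relationship
    ]


contributions = [
    ("A. Writer", "author"),
    ("B. Artist", "illustrator"),
    ("C. Linguist", "translator"),
]

print(agents_for(contributions, "realized by"))  # ['B. Artist', 'C. Linguist']
print(agents_for(contributions, "created by"))   # ['A. Writer']
```

A query against the broad relationship gets the simplified FRBR view, while the specific role is still there for display.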

3. The "publisher" is an attribute, but it doesn't mean that publisher cannot also be a "produced by" relationship between a manifestation and a corporate body.

4. Good point on the place of publication relationship.

5. I'm not sure what you mean when you say that "FRBR does not give us a true entity-relation model for our bibliographic data". It certainly seems that the model is incomplete as an ER model. But it also seems to me that FRBR defines a good starting point, a minimal program for an ER model for bibliographic data, one to be supplemented by local needs.

Anonymous said...

Dumb question: why is E-R the right sort of model for library data?

Karen Coyle said...

egh: 1 & 2: there is little guidance for the difference between "creation" and "realization." I think this is a weakness in the model since it will lead to confusion. 3: using the FRBR model & RDA you create a publisher as an entity only when it is an access point -- that is, only as an added entry, not as part of the publication statement. And, of course, in general one does not make access points for publishers. 5: I'm coming to the conclusion that FRBR does NOT model bibliographic data, just a particular view called "cataloging." As we know with the difference between MARC21 and AACR, there's a lot of data needed that is not included in the cataloging rules. I don't think this is a good thing.

Jodi, the reason for wanting an E-R model is that it will make library data more compatible with Web services. If this were 30 years ago or 30 years from now, we'd be desiring a different model. Unfortunately, there is no "universal" model for data of any kind, so my general advice is "mark up in as much detail as economically viable, and work for consistency in content," which makes future transformations plausible. FORTUNATELY, we have fairly consistent data in MARC format, which is why I am optimistic about our ability to make a transformation to a new format.

Anonymous said...

Interesting post! These are just my semi-random opinions and interpretations, but...

There is a sentence in FRBR where they note that "the attributes defined for the study were derived from a logical analysis of the data that are typically reflected in bibliographic records," which I read to mean that they may not have picked up on things that are not explicit in the current rules or that might be desirable, but are not in current records. I find that helpful to keep in mind when looking at both FRBR and RDA. There is a tendency for things that are not well-modeled in our current rules not to be well-modeled in FRBR and RDA either. For example, much of video cataloging is not explicit in the rules; it is just general practice and thus subject to all sorts of inconsistencies and local variations. This sort of thing often is not picked up on in these models.

In the orthodox interpretation of FRBR (so far as I understand it), illustrators are considered to "contribute to the realization" of a work. So in RDA if you had a text with supplemental illustrations, the writer would create the work and the illustrator would help realize the expression. If you had a work in which the illustrations were determined to be primary and the text supplemental then you would have the artist who created the work and the author of supplemental text who contributed to the realization of the expression.

For books that are a mixture of text and illustrations or images, AACR2 provides guidance for determining which predominates and that person is considered the main entry/creator. Whether RDA carried this over or not I can't remember (One of my big beefs with RDA is that in a lot of cases they carried over the results of applying AACR2 rules without carrying over the general principles that led to those conclusions. This does not seem to me to lead to more principle-based rules and is not helpful in dealing with new and unusual situations). In AACR2 this is inevitably a binary decision (sensible in a card era) and there is no way to account for the situation in which the text and artwork are so intertwined and interconnected that neither stands alone. I'm not sure what RDA does, and all this is off the top of my head without looking anything up, so something might be off, but I think the general drift is accurate.

RDA at least seems to be trying to make a rigid correspondence between particular roles and particular FRBR entities. The A/V cataloging community, and in particular those who work with moving images, are not happy with the rigid modeling of one "role" to one Group 1 entity, but that is RDA’s approach. RDA also seems to want to identify "creators" with works and “contributors” with expressions. They thus arrive at the conclusion that people perceived as making more minor contributions (e.g. costume designers) to a moving image are contributing to some expression. This is, if nothing else, inefficient data modeling since all expressions will have the same costume designer. As Greta de Groat once pointed out, it's not like they're going to make another expression of Casablanca with a different costume designer.
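
The inefficiency can be counted. A minimal Python sketch with a hypothetical film, hypothetical expression labels, and a made-up costume designer (none of this reflects how RDA records are actually encoded) compares attaching the role to every expression with attaching it once to the work:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Contribution = Tuple[str, str]  # (agent name, role)


@dataclass
class Expression:
    label: str
    contributors: List[Contribution] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creators: List[Contribution] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)


def times_recorded(work: Work, agent: str) -> int:
    """Count the statements naming `agent` on the work and all its expressions."""
    count = sum(1 for name, _ in work.creators if name == agent)
    for expr in work.expressions:
        count += sum(1 for name, _ in expr.contributors if name == agent)
    return count


designer = ("Hypothetical Designer", "costume designer")
labels = ["original", "dubbed", "subtitled"]  # hypothetical expressions

# Role pinned to the expression level: one statement per expression.
per_expression = Work(
    "A film", expressions=[Expression(label, [designer]) for label in labels]
)
# Role recorded on the work: one statement in total.
per_work = Work(
    "A film", creators=[designer], expressions=[Expression(label) for label in labels]
)

print(times_recorded(per_expression, "Hypothetical Designer"))  # 3
print(times_recorded(per_work, "Hypothetical Designer"))        # 1
```

Every new expression repeats the same statement in the first layout, which is exactly the redundancy that a work-level contribution avoids.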

In terms of collaborative works, especially things like moving images where the individual contributions can't be as neatly separated as an author’s and illustrator’s contributions might be, the distinction between the creation of the work and the initial realization of the work is very fuzzy and it seems to a lot of moving image catalogers that the two things are unavoidably intertwined. Although someone like the costume designer for most films might be considered to have a more minor role, and as such be a contributor, it seems to me that they are contributing to the creation of the work itself and not to any particular expression.

FRBR does seem to have carried over the traditional practice of just transcribing the publisher's name. I am not sure publisher can't be a type of producer in RDA/FRBR terms so maybe the model can accommodate both views. I do think there is a cost-benefit question here which I can't answer (and probably someone should investigate), but I would say that it would be a non-trivial undertaking to try to control publisher names and there probably would be pushback from those whose focus is minimizing the cost of cataloging.

It can be hard to determine who is the publisher (of course, this applies to transcribing, too). Not everyone seems to follow the same principles when determining publisher. Many videos have an abundance of corporate bodies on their containers and it often is hard to determine which is the publisher/distributor. I was taught to look for logos and names on the spine. I think this is because I was taught by a music cataloger and that is the practice for sound recordings. Some years ago, when I taught a local video cataloging workshop, I was surprised to find out that most of the participants got publisher information from the copyright statement on the packaging.

Publishers merge, break up, change names and so forth and it often can be difficult to determine the date at which some change happened. Of course, these difficulties apply to controlling names for corporate bodies across the board, but if catalogers took on publishers, too, that would increase the workload significantly. This is not necessarily a bad thing if the benefit to users outweighs the cost, but I can see a case for sticking with transcription.

Anonymous said...

I realized I was looking at this all wrong. I think producer is supposed to be a broad term that encompasses publisher, distributor, manufacturer, etc., just as creator encompasses the more specific author, artist, etc. The transcribed publication statement relates to the producer in the same way that the transcribed statement of responsibility relates to creator and contributor.