Coyle's InFormation: ER models


Sunday, January 17, 2016

Sub-types in FRBR

One of the issues that plagues FRBR is the rigidity of the definitions of work, expression, and manifestation, and the "one size fits all" nature of these categories. We've seen comments (see from p. 22) from folks in the non-book community that the definitions of these entities are overly "bookish" and that some non-book materials may need a different definition of some of them. One solution to this problem would be to move from the entity-relation model, which does tend to be strict and inflexible, to an object-oriented model. In an object-oriented (OO) model one creates general types with more specific subtypes; this allows the model both to extend as needed and to accommodate specifics that apply to only some members of the overall type or class. Subtypes inherit the characteristics of the super-type, whereas there is no possibility of inheritance in the E-R model. By allowing inheritance, you avoid both redundancy in your data and the rigidity of E-R and the relational model that it supports.

This may sound radical, but the fact is that FRBR does define some subtypes. They don't appear in the three high-level diagrams, so it isn't surprising that many people aren't aware of them. They are present, however, in the attributes. Here is the list of attributes for FRBR work:

title of the work
form of work
date of the work
other distinguishing characteristic
intended termination
intended audience
context for the work
medium of performance (musical work)
numeric designation (musical work)
key (musical work)
coordinates (cartographic work)
equinox (cartographic work)

In the list above, the attributes marked with a type in parentheses belong to subtypes of work. There are two: musical work and cartographic work. I would also suggest that "intended termination" could be considered an attribute of a subtype, "continuing resource", but this is subtle and possibly debatable.

Other subtypes in FRBR are:

Expression: serial, musical notation, recorded sound, cartographic object, remote sensing image, graphic or projected image
Manifestation: printed book, hand-printed book, serial, sound recording, image, microform, visual projection, electronic resource, remote access electronic resource

These are the subtypes that are present in FRBR today, but because sub-typing probably was not fully explored, there are likely to be others.

Object-oriented design was a response to the need to be able to extend a data model without breaking what is there. Adding a subtype should not interfere with the top-level type nor with other subtypes. It's a tricky act of design, but when executed well it allows you to satisfy the special needs that arise in the community while maintaining compatibility of the data.

Since we seem to respond well to pictures, let me provide this idea in pictures, keeping in mind that these are simple examples just to get the idea across.

The above picture models what is in FRBR today, although using the inheritance capability of OO rather than the E-R model where inheritance is not possible. Both musical work and cartographic work have all of the attributes of work, plus their own special attributes.
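For readers who think in code rather than boxes, here is a minimal Python sketch of the same idea, assuming dataclasses stand in for OO classes. The class names are hypothetical; the attribute names are abridged from the FRBR work attribute list above. Each subtype automatically carries every attribute of Work and adds only its own.

```python
from dataclasses import dataclass, fields


@dataclass
class Work:
    """Super-type: attributes every work has (abridged from the FRBR list)."""
    title: str
    form: str = ""
    date: str = ""
    intended_audience: str = ""


@dataclass
class MusicalWork(Work):
    """Subtype: inherits all Work attributes and adds music-only ones."""
    medium_of_performance: str = ""
    numeric_designation: str = ""
    key: str = ""


@dataclass
class CartographicWork(Work):
    """Subtype: inherits all Work attributes and adds map-only ones."""
    coordinates: str = ""
    equinox: str = ""


def attribute_names(cls):
    # Lists every attribute a class carries, inherited ones included.
    return [f.name for f in fields(cls)]


symphony = MusicalWork(title="Symphony no. 5", key="C minor", numeric_designation="op. 67")
print(isinstance(symphony, Work))        # True: a musical work is still a work
print(attribute_names(CartographicWork))
# ['title', 'form', 'date', 'intended_audience', 'coordinates', 'equinox']
```

Nothing about music leaks into Work or CartographicWork; the specialized attributes live only in their own "box."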

If it becomes necessary to add other attributes that are specific to a single type, then another sub-type is added. This new subtype does not interfere with any code that is making use of the elements of the super-type "work". It also does not alter what the music and maps librarians must be concerned with, since they are in their own "boxes." As an example, the audio-visual community did an analysis of BIBFRAME and concluded, among other things, that the placement of duration, sound content and color content in the BIBFRAME Instance entity would not serve their needs; instead, they need those elements at the work level.*
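A rough Python sketch of that kind of extension follows. The MovingImageWork class and its field names are hypothetical, loosely echoing the A-V elements mentioned above; the point is only that a function written against the super-type keeps working unchanged when a new subtype appears.

```python
from dataclasses import dataclass


@dataclass
class Work:
    title: str
    date: str = ""


@dataclass
class MusicalWork(Work):
    key: str = ""


# Hypothetical subtype added later for audio-visual needs;
# neither Work nor MusicalWork is edited to make room for it.
@dataclass
class MovingImageWork(Work):
    duration: str = ""
    sound_content: str = ""
    color_content: str = ""


def citation_label(work: Work) -> str:
    # Uses only super-type attributes, so it handles any subtype.
    return f"{work.title} ({work.date})" if work.date else work.title


catalog = [
    MusicalWork(title="Goldberg Variations", key="G major"),
    MovingImageWork(title="Blade Runner", date="1982", sound_content="sound", color_content="color"),
]
for w in catalog:
    print(citation_label(w))
```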

This just shows work, and I don't know how/if it could or should be applied to the entire WEMI thread. It's possible that an analysis of this nature would lead to a different view of the bibliographic entities. However, using types and sub-types, or classes and sub-classes (which would be the common solution in RDF) would be far superior to the E-R model of FRBR. If you've read my writings on FRBR you may know that I consider FRBR to be locked into an out-of-date technology, one that was already on the wane by 1990. Object-oriented modeling, which has long replaced E-R modeling, is now being eclipsed by RDF, but there would be no harm in making the step to OO, at least in our thinking, so that we can break out of what I think is a model so rigid that it is doomed to fail.
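In RDF, the class/sub-class version of this would normally use rdfs:subClassOf. The sketch below deliberately avoids any RDF library and simply mimics, in plain Python, the RDFS rule that an instance of a subclass is also an instance of its superclass. The "ex:" names are hypothetical.

```python
# Triples as (subject, predicate, object); the "ex:" names are hypothetical.
triples = {
    ("ex:MusicalWork", "rdfs:subClassOf", "ex:Work"),
    ("ex:CartographicWork", "rdfs:subClassOf", "ex:Work"),
    ("ex:goldberg", "rdf:type", "ex:MusicalWork"),
}


def infer_types(graph):
    """Repeat the RDFS subclass rule until nothing new is added:
    if ?x rdf:type ?C and ?C rdfs:subClassOf ?D, then ?x rdf:type ?D."""
    inferred = set(graph)
    while True:
        new = {
            (x, "rdf:type", d)
            for (x, p, c) in inferred if p == "rdf:type"
            for (c2, q, d) in inferred if q == "rdfs:subClassOf" and c2 == c
        } - inferred
        if not new:
            return inferred
        inferred |= new


print(("ex:goldberg", "rdf:type", "ex:Work") in infer_types(triples))  # True
```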

*This is an over-simplification of what the A-V community suggested, modified for my purposes here. However, what they do suggest would be served by a more flexible inheritance model than the model currently used in BIBFRAME.

Tuesday, January 12, 2016

Floor wax, or dessert topping?

As promised, my book, FRBR: Before and After, is now available in PDF as open access with a CC-BY license. I'd like to set up some way to turn this into a discussion, so if you have a favorite place to hang out, make a suggestion.

Also, the talk I gave at SWIB2015 is now viewable on YouTube: Mistakes Have Been Made. That is a much shorter (30 minutes) and less subdued explanation of what I see as the problems with FRBR. If that grabs you, some chapters of the book will give you more detail, and the bibliography should keep anyone busy for a good long time.

Let me be clear that I am not criticizing FRBR as a conceptual model of the bibliographic universe. If this view helps catalogers clarify their approach to cataloging problems, then I'm all for it. If it helps delineate areas that the cataloging rules must cover, then I'm all for it. What I object to is the implication that this mental model = a data format. Oddly, both the original FRBR document and the recent IFLA model bringing all of the FRs together are very ambiguous on this. I've been told, in no uncertain terms, by one of the authors of the latter document that it is not a data model, it's a conceptual model. Yet the document itself says:

The intention is to produce a model definition document that presents the model concisely and clearly, principally with formatted tables and diagrams, so that the definitions can be readily transferred to the IFLA FRBR namespace for use with linked open data applications.

And we have the statement by Barbara Tillett (one of the developers of FRBR) that FRBR is a conceptual model only:

"FRBR is not a data model. FRBR is not a metadata scheme. FRBR is not a system design structure. It is a conceptual model of the bibliographic universe." Barbara Tillett. FRBR and Cataloging for the Future. 2005

This feels like a variation on the old Saturday Night Live routine: "It's a floor wax. No! It's a dessert topping!" The joke being that it cannot be both. And that's how I feel about FRBR -- it's either a conceptual model, or a data model. And if it's a data model, it's an entity-relation model suitable for, say, relational databases. Or, as David C. Hay says in his 2006 book "Data Model Patterns: A Metadata Map":

Suppose you are one of those old-fashioned people who still models with entity classes and relationships.

It's not that entities and relations are useless, it's just that this particular style of data modeling, and the technology that it feeds into, has been superseded at least twice since the FRBR task group was formed: by object-oriented design, and by semantic web design. If FRBR is a conceptual model, this doesn't matter. If it's a data model -- if it is intended to be made actionable in some 21st century technology -- then a whole new analysis is going to be needed. Step one, though, is getting clear which it is: floor wax, or dessert topping.

Wednesday, February 19, 2014

FRBR goals: entities, relations, and a core level record

The FRBR study was motivated by a 1990 international seminar on cataloging held in Stockholm. The charge to the study group was approved by the IFLA Standing Committee of the Section of Cataloguing in 1992. That document, called the Terms of Reference for a Study of the Functional Requirements for Bibliographic Records, stated:

Today the expectations and constraints facing bibliographic control are more pressing than ever. All libraries, including national bibliographic agencies, are operating under increasing budgetary constraints and increasing pressures to reduce cataloging costs through minimal level cataloging. [1]

Or, as Olivia Madison, the chair of the FRBR Study Group from 1991-1993 and 1995-1997, put it:

The Stockholm Seminar addressed the general question: "Can cataloging be considerably simplified?" [2]

The Standing Committee decided that consultants with particular skills in the area of cataloging were needed in order to approach the task, and three (later four) consultants were engaged. The primary charge to the consultants was:

1. Determine the full range of functions of the bibliographic record and then state the primary uses of the record as a whole.

This is at the very least a daunting task. However, the Terms of Reference gave the consultants some guidance about how to go about their work. The remaining tasks for the consultants were:

2. Develop a framework that identifies and clearly defines the full range of entities (e.g., work, texts, subjects, editions and authors) that are the subject of interest to users of a bibliographic record and the types of relationships (e.g. part/whole, derivative, and chronological) that may exist between those entities.
3. For each of the entities in the framework, identify and define the functions (e.g., to describe, to identify, to differentiate, to relate) that the bibliographic record is expected to perform.
4. Identify the key attributes (e.g., title, date, and size) of each entity or relationship that are required for each specific function to be performed. Attribute requirements should relate specifically to the media or format of the bibliographic item where applicable.

The notions of entities, relationships, and attributes don't appear in traditional cataloging theory; they come instead from the world of database design, and in particular relational database design. Because these concepts were expected to be unfamiliar to members of the committee and perhaps also the consultants, the Terms of Reference provides definitions, using as its source the 1984 book Data Analysis: the Key to Data Base Design, by Richard C. Perkinson. (Note, some of this is re-iterated in the FRBR final report, in the section on methodology, where four books are cited as sources of information on entity-relation methodology.)

Those were the tasks for the consultants, the selected experts who would do the analysis and present the report to the Study Group. The Study Group itself had this task:

5. For the National Libraries: for bibliographic records created by the national bibliographic agencies, recommend a basic level of functionality that relates specifically to the entities identified in the framework the functions that are relevant to each.

It appears to be this last charge that directly addressed the needs expressed in the Stockholm seminar: the need for a core level record that would help cataloging agencies reduce their costs while still serving users. I read the charges to the consultants as mainly providing a working methodology that would allow the consultants to focus their energies on what amounts to a general rethinking of cataloging theory and practice.

The Terms of Reference is a rather bare bones statement of what needs to be done, and it says little about the why of the study. According to Tillett's 1994 report [3], some of the concerns that came out of Stockholm were:

"the mounting costs of cataloging," the proliferation of new media, "exploding bibliographic universe," the need to economize in cataloging, and "the continuing pressures to adapt cataloguing practices and codes to the machine environment."

The FRBR document states the motivation as:

"The purpose of formulating recommendations for a basic level national bibliographic record was to address the need identified at the Stockholm Seminar for a core level standard that would allow national bibliographic agencies to reduce their cataloguing costs through the creation, as necessary, of less-than-full-level records, but at the same time ensure that all records produced by national bibliographic agencies met essential user needs." [4] p.2

At this point, it is worth asking: did the FRBR study indeed result in a "core level standard" that would reduce cataloging costs for national bibliographic agencies? It definitely did define a core level standard, although that aspect of the FRBR report is not often discussed. Chapter 7 of the FRBR document, BASIC REQUIREMENTS FOR NATIONAL BIBLIOGRAPHIC RECORDS, lists the "basic level of functionality" for library catalogs:

Find all manifestations embodying:

the works for which a given person or corporate body is responsible

the various expressions of a given work

works on a given subject

works in a given series

Find a particular manifestation:

when the name(s) of the person(s) and/or corporate body(ies) responsible for the work(s) embodied in the manifestation is (are) known

when the title of the manifestation is known

when the manifestation identifier is known

Identify a work
Identify an expression of a work
Identify a manifestation
Select a work
Select an expression
Select a manifestation
Obtain a manifestation

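This of course looks quite similar to the "objects" of the catalog set out over a century ago by Charles Ammi Cutter: to enable a person to find a book when its author, title, or subject is known; to show what the library has by a given author, on a given subject, or in a given kind of literature; and to assist in the choice of a book as to its edition or its character.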

Section 7.3 of the chapter lists the descriptive and organizing elements (headings) that should make up a core bibliographic record. This chapter should be a key element of the FRBR study results, yet it isn't often mentioned in discussions of FRBR, which tend to focus on the ten (or eleven, if you add family) entities and their primary relationships to each other (is realization of, manifests, etc.), and the four user tasks (find, identify, select, obtain).

While most people can hold forth on the FRBR entities, few can discourse on this outcome of the report, which is a basic level national bibliographic record. Admittedly, the report itself does not emphasize this information. The elements of the basic level record use the terminology of ISBD, not of FRBR, which makes it difficult to see the direct connection with the rest of the report. (I haven't had the fortitude to work through the appendix comparing FRBR attributes with ISBD, GARE and GSARE but I assume that a matching was done. However, this does make the recommended core record hard to read in the context of FRBR.) For example, there are core descriptive elements relating to uniform titles ("addition to uniform title - numeric designation (music)") yet uniform titles are not mentioned among the FRBR attributes and the term "uniform title" is not included in the index.

It is not clear to me what has happened to the goal to provide a solution for cash-strapped cataloging agencies. The E-R model, which in my reading was offered as a methodology to support the analysis that needed to be done, has become what people think of as FRBR. If the FRBR Review Group, which is currently maintaining the results of the Study Group's work, does have activities that are aimed at helping national libraries do their work more effectively while saving them cataloging time, they aren't nearly as visible as the work being done to create definitions of bibliographic data that follow entity-relation modeling. In any case, I, for one, was actually surprised to discover Chapter 7 in my copy of the FRBR Study Group report.

[1] Terms of Reference for a Study of the Functional Requirements for Bibliographic Records. (1992). In: Le Bœuf, P. (Ed.). (2005). Functional Requirements for Bibliographic Records (FRBR): Hype or Cure-All? Binghamton, NY: Haworth Information Press.
[2] Madison, O. M. A. (2005). The origins of the IFLA study on Functional Requirements for Bibliographic Records. In: Le Bœuf, P. (Ed.), Functional Requirements for Bibliographic Records (FRBR): Hype or Cure-All? Binghamton, NY: Haworth Information Press.
[3] Tillett, B. B. (1994). IFLA Study on the Functional Requirements of Bibliographic Records: Theoretical and Practical Foundations (April), 1–5.
[4] IFLA Study Group on the Functional Requirements for Bibliographic Records. (2009). Functional Requirements for Bibliographic Records. Retrieved from http://archive.ifla.org/VII/s13/frbr/frbr_2008.pdf

Friday, February 14, 2014

FRBR as a conceptual model

(I have been working on a very long and very detailed analysis of FRBR, probably more than anyone wants to know. But some parts of that analysis might be generally helpful in understanding FRBR, so I'm going to "leak" those ideas out through this blog.)

The FRBR document, in its section on Methodology, gives the reasoning behind the use of entity-relation modeling technique:

The methodology used in this study is based on an entity analysis technique that is used in the development of conceptual models for relational database systems. Although the study is not intended to serve directly as a basis for the design of bibliographic databases, the technique was chosen as the basis for the methodology because it provides a structured approach to the analysis of data requirements that facilitates the processes of definition and delineation that were set out in the terms of reference for the study.

E-R modeling is a multi-step technique that begins with a high-level conceptual analysis of the data universe that is being considered. Quoting the FRBR document again:

The first step in the entity analysis technique is to isolate the key objects that are of interest to users of information in a particular domain. These objects of interest or entities are defined at as high a level as possible. That is to say that the analysis first focuses attention not on individual data but on the "things" the data describe. Each of the entities defined for the model, therefore, serves as the focal point for a cluster of data. An entity diagram for a personnel information system, for example, would likely identify "employee" as one entity that would be of interest to the users of such a system.

This is a very good description of conceptual modeling. So it is either puzzling or disturbing that most readings of FRBR do not recognize this difference between a conceptual model and either a record format or a logical model. In part this is because few have done a close reading of the FRBR document, and unfortunately it is easy to view the diagrams there as statements of data structure rather than high level concepts about bibliographic data. (It's not surprising that people get their information about FRBR from the diagrams, rather than the text. There are three very simple diagrams in the document, and 142 pages of text. Yet even if a picture is worth a thousand words, those three are not equal to the text.)

One of the main assumptions about FRBR is that the entities listed there should be directly translated into records in any bibliographic data design that intends to implement FRBR. For example, there is much criticism of BIBFRAME for presenting a two-entity bibliographic model instead of the four entities of FRBR. This reflects the mistaken idea that each Group 1 entity must be a record in whatever future bibliographic formats are developed. Because these are entities in a conceptual model, there is absolutely no direct transfer from conceptual entities to data records. How best to create a record format that carries the concepts is something that would be arrived at after a further and more detailed technical analysis. In fact, a record format might not even look like a direct descendant of the E-R model, since the E-R modeling technique has a bias toward the structure of relational database management systems, not records, and the FRBR Study Group was not intending its work to be translated directly to a database design.
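To make this concrete, here is a small Python sketch, with hypothetical field names and values, in which the four Group 1 entities are carried in just two record types: one for work-level data and one for instance-level data. It is loosely in the spirit of a two-entity layout but is not BIBFRAME's actual vocabulary; it only shows that how conceptual entities map to records is a design decision, not something the conceptual model dictates.

```python
# Four conceptual entities (FRBR Group 1); all values are hypothetical.
conceptual = {
    "work": {"title": "Moby Dick"},
    "expression": {"language": "eng", "content_form": "text"},
    "manifestation": {"publisher": "Example Press", "date": "2001"},
    "item": {"barcode": "0001"},
}


def to_two_records(entities):
    """Carry four conceptual entities in two record types (hypothetical layout)."""
    # Expression-level attributes travel inside the work-level record.
    work_record = {"id": "w1", **entities["work"], **entities["expression"]}
    # Item data travels inside the instance-level record, linked back to the work.
    instance_record = {
        "id": "i1",
        "instance_of": work_record["id"],
        **entities["manifestation"],
        "items": [entities["item"]],
    }
    return [work_record, instance_record]


records = to_two_records(conceptual)
for r in records:
    print(r)
```

A four-record layout, or a single flat record, would be equally valid ways to carry the same concepts.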

There are innumerable ways that one could implement a data design that fulfills the conceptual view of FRBR. In E-R modeling there are subsequent steps that build on the conceptual design to develop it into an actionable data design. These steps are actually more detailed and imposing than the conceptual design, which is often used to bridge the knowledge gap between operational staff and the technical staff who must create a working system. The step after the conceptual model is usually the logical design step, which completes the list of attributes and defines the types of data values that will be stored in the database tables (text, date, currency) and the cardinality of each data element (mandatory, optional, repeatable, etc.). It then normalizes the data to remove any duplication of data within the entire database. It also resolves relationships between data tables so that one-to-many and many-to-many relationships are correctly implemented for the applications that will make use of them. Although this is couched in terms of database design, an equally rigorous step would be needed to move from a conceptual view to a design for a format that could be used in library systems and for data exchange.
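A fragment of such a logical design can be sketched with Python's built-in sqlite3 module. The tables, columns, and sample rows are hypothetical and drastically abridged. The sketch shows declared column types, NOT NULL standing in for a mandatory element, a foreign key for the one-to-many link between work and expression, and a junction table that resolves the many-to-many link between manifestations and the expressions they embody. Each title is stored once, which is what normalization is after.

```python
import sqlite3

# Hypothetical, abridged logical design.
schema = """
CREATE TABLE work (
    work_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL                                    -- mandatory element
);
CREATE TABLE expression (
    expression_id INTEGER PRIMARY KEY,
    work_id       INTEGER NOT NULL REFERENCES work(work_id), -- one work, many expressions
    language      TEXT
);
CREATE TABLE manifestation (
    manifestation_id INTEGER PRIMARY KEY,
    publisher        TEXT,
    pub_date         TEXT
);
CREATE TABLE manifestation_expression (                      -- resolves many-to-many
    manifestation_id INTEGER REFERENCES manifestation(manifestation_id),
    expression_id    INTEGER REFERENCES expression(expression_id),
    PRIMARY KEY (manifestation_id, expression_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(schema)
conn.executemany("INSERT INTO work VALUES (?, ?)", [(1, "Hamlet"), (2, "Macbeth")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", [(10, 1, "eng"), (11, 2, "eng")])
conn.execute("INSERT INTO manifestation VALUES (100, 'Example Press', '1999')")
# One manifestation (a two-play volume) embodies two expressions.
conn.executemany("INSERT INTO manifestation_expression VALUES (?, ?)", [(100, 10), (100, 11)])

rows = conn.execute("""
    SELECT w.title
    FROM manifestation_expression AS me
    JOIN expression AS e ON e.expression_id = me.expression_id
    JOIN work AS w ON w.work_id = e.work_id
    WHERE me.manifestation_id = 100
    ORDER BY w.title
""").fetchall()
print(rows)  # [('Hamlet',), ('Macbeth',)]
```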

As an illustration, here is a logical design for the bibliographic system MusicBrainz that stores information about recorded music. It has many of the same concepts as FRBR (works, performers, variant expressions), and must resolve the complex relationships between albums, songs, and performances (not unlike what a music library catalog must do):

With perhaps some difference in details you could say that this implements the concepts of FRBR. Still, this is a database design, and not a record format. For many databases, there is no single record that represents all of the stored data. Business databases are generally a combination of data from numerous departments and processes, and they can often output many different data combinations as needed.
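As a rough illustration of the point that a database need not contain any single "record," here is a Python sketch with hypothetical, normalized tables held in dictionaries. Two functions assemble two different outputs from the same stored facts: one shaped like a citation, one shaped like an inventory report.

```python
# Normalized, hypothetical tables: each fact is stored once, keyed by id.
works = {1: {"title": "Moby Dick", "creator": "Melville, Herman"}}
manifestations = {100: {"work": 1, "publisher": "Example Press", "date": "2001"}}
items = {5001: {"manifestation": 100, "barcode": "39001000123"}}


def citation_view(m_id):
    """One output combination: what a citation needs."""
    m = manifestations[m_id]
    w = works[m["work"]]
    return f'{w["creator"]}. {w["title"]}. {m["publisher"]}, {m["date"]}.'


def inventory_view():
    """Another combination: barcode plus title, for an inventory report."""
    return [
        (item["barcode"], works[manifestations[item["manifestation"]]["work"]]["title"])
        for item in items.values()
    ]


print(citation_view(100))  # Melville, Herman. Moby Dick. Example Press, 2001.
print(inventory_view())    # [('39001000123', 'Moby Dick')]
```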

It does say something about the state of technology awareness in the library profession that once a presumably successful conceptual model was developed there was no second step to make that model operational. What was the ultimate goal of FRBR, and did it fulfill that goal? Look for another post soon on that topic.

Wednesday, November 07, 2007

Hierarchy v. Relationships

The use of hierarchy as an organizing principle keeps coming up. I think we are attracted to hierarchy because of its neatness, even though in fact the real world is organized more like fuzzy sets. Fuzzy sets are hard to comprehend, nearly impossible to draw, and can't be slotted neatly into an application.

When people talk about FRBR, they are often focussed on the Group 1 entities, and those are seen as hierarchical. They tend to be shown as:
- Work
-- Expression
--- Manifestation
---- Item

as if we'll fit all of our intellectual works into such a neat hierarchy. T'ain't so. Of all of the relationships that are talked about in FRBR (I almost said "expressed" but that term has now been given a new meaning in this discussion) I think these are the least interesting. And they become even less interesting when we move beyond the traditional inventory control function of the library catalog and begin to see ourselves as navigating in a knowledge universe. But first let me tackle the Group 1 entities.

There are complaints (or remarks, depending on the context) that we don't have an agreed-upon definition of Work, and that the division between Work and Expression is unclear. They are unclear because in real life there isn't a neat hierarchy that just needs to be modeled. What is a Work is entirely contextual -- when I'm looking for an article, the article is a Work. When I'm subscribing to a journal, the journal is a work. When I'm on iTunes a song is a work, when I'm in the music store the album is a work. A Work is the content I am seeking at that time. In the imaginary universe where I get to create my bibliographic system, a Work will be defined as: anything you wish to talk about, point to, address. So a book-length text is a work, an article in a journal is a work, a journal is a work, a book chapter is a work -- all at the same time and in the same system. For one person the book Wizard of Oz and the movie Wizard of Oz will be a single Work. To a film buff, the director's cut of Blade Runner and the original release are distinct works. To be a Work, it just has to be definable and have a way to name it; that is, it has to have an identifier. But anything can be a Work. As a matter of fact, I probably won't use the term Work at all in my universe.
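One way to picture "anything with an identifier" is the Python sketch below, in which a journal, an issue, and an article are all first-class resources connected only by part-of links, with no fixed level assigned to any of them. The Resource class, the registry, and the "ex:" identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Anything that can be named and pointed to; it has no fixed level."""
    identifier: str
    label: str
    part_of: list[str] = field(default_factory=list)  # identifiers of containing resources


registry: dict[str, Resource] = {}


def register(resource: Resource) -> Resource:
    registry[resource.identifier] = resource
    return resource


# Hypothetical identifiers: each of these can be cited, linked to, or sought.
register(Resource("ex:journal", "A Journal"))
register(Resource("ex:issue-3", "Vol. 12, no. 3", part_of=["ex:journal"]))
register(Resource("ex:article", "An Article", part_of=["ex:issue-3"]))


def containers(identifier: str) -> list[str]:
    """Follow part_of links upward (assumes the links contain no cycles)."""
    found = []
    for parent in registry[identifier].part_of:
        found.append(parent)
        found.extend(containers(parent))
    return found


print(containers("ex:article"))  # ['ex:issue-3', 'ex:journal']
```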

As for Expressions, there will be very obvious Expressions of Works, and there will be fuzzier Expressions. There will be Expressions that express more than one Work. Expression is a relationship, not a subset. If you don't have to organize your bibliographic universe in a hierarchical way, then the need to slot each Expression under a Work goes away, although the relationship can remain.
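The difference between a subset and a relationship shows up quickly in code. In this hypothetical Python sketch, a hierarchy modeled as "each expression has one parent work" silently loses information when an expression relates to two works, while a set of "expresses" statements holds both.

```python
# Hierarchy: each expression may point to exactly one parent work.
tree = {"expr:anthology-2010": "work:hamlet"}
tree["expr:anthology-2010"] = "work:macbeth"  # replaces the first link; Hamlet is lost
print(tree)  # {'expr:anthology-2010': 'work:macbeth'}

# Relationship: a set of (expression, "expresses", work) statements.
expresses = {
    ("expr:anthology-2010", "expresses", "work:hamlet"),
    ("expr:anthology-2010", "expresses", "work:macbeth"),
}


def works_expressed(expr):
    """All works a given expression is related to, in sorted order."""
    return sorted(w for (e, _, w) in expresses if e == expr)


print(works_expressed("expr:anthology-2010"))  # ['work:hamlet', 'work:macbeth']
```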

I'm less sure about Manifestation and Item, even though these are the most concrete of the Group 1 entities. Are they a legitimate focus of a Knowledge Management system, or are they about managing physical objects? When I think about some of the uses of bibliographic data, for instance as citations in a text publication, Manifestation seems to be mainly about locating -- so if I've quoted a passage from a book, I need to cite the manifestation and the page because that's the only way that someone else can find that exact quote. When I include a URL in a document that links to a particular digital manifestation, I am giving the user a direct link to the location. Manifestations and Items will be of interest in some instances, say to rare book collectors, but I'm not at all sure that those instances justify the emphasis they have been given. And if the purpose is primarily inventory control, then I think those relationships will be managed to the extent that they matter to the library. For example, a public library may not much care which manifestation of the book Moby Dick is on its shelves, although its inventory system will need to know the barcode, and its acquisitions system will need to store how much the library paid for it and which vendor supplied it.

The truly interesting relationships in FRBR are those between and among these entities, and those are ones that I have not seen explored. These are the relationships between things: thing1 is a translation of thing2; thing3 is an abridgment of thing4; thing5 extends thing6 in this certain way; thing7 cites thing1; thing8 continues thing3. This is where we get real value, where we provide various interesting paths through which seekers can navigate. This is what we don't provide explicitly in our catalogs today, although a human user may be able to intuit some of these relationships among the works we present.
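Those typed relationships are easy to store and, more importantly, to navigate. Here is a minimal Python sketch using the thing1 ... thing8 statements from the paragraph above: every link is indexed in both directions, and a breadth-first search finds the shortest chain of relationships connecting two things. The "(reverse)" labels and the function names are hypothetical.

```python
from collections import defaultdict, deque

# The relationships named in the paragraph above, as (subject, relationship, object).
statements = [
    ("thing1", "translation of", "thing2"),
    ("thing3", "abridgment of", "thing4"),
    ("thing5", "extends", "thing6"),
    ("thing7", "cites", "thing1"),
    ("thing8", "continues", "thing3"),
]

# Index every link in both directions so navigation can start from either end.
links = defaultdict(list)
for subj, rel, obj in statements:
    links[subj].append((rel, obj))
    links[obj].append((f"(reverse) {rel}", subj))


def find_path(start, goal):
    """Breadth-first search for the shortest chain of relationships from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for rel, nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(node, rel, nxt)]))
    return None  # no chain of relationships connects the two


print(find_path("thing7", "thing2"))
# [('thing7', 'cites', 'thing1'), ('thing1', 'translation of', 'thing2')]
```

Each path is itself a small explanation a seeker could follow: this cites that, which is a translation of something else.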

We have so narrowly defined bibliographic control in libraries that it doesn't really include the relationships between intellectual products, except to the degree that we might make a note that one thing is a translation of another thing. But we see those relationships as "extra" or "secondary," and yet they are the very essence of knowledge creation. It astonishes me that we have focused so completely on the physical items that we have essentially missed what would make our catalogs intelligent.