Friday, February 14, 2014

FRBR as a conceptual model

(I have been working on a very long and very detailed analysis of FRBR, probably more than anyone wants to know. But some parts of that analysis might be generally helpful in understanding FRBR, so I'm going to "leak" those ideas out through this blog.)

The FRBR document, in its section on Methodology, gives the reasoning behind the use of the entity-relationship (E-R) modeling technique:
The methodology used in this study is based on an entity analysis technique that is used in the development of conceptual models for relational database systems. Although the study is not intended to serve directly as a basis for the design of bibliographic databases, the technique was chosen as the basis for the methodology because it provides a structured approach to the analysis of data requirements that facilitates the processes of definition and delineation that were set out in the terms of reference for the study.
E-R modeling is a multi-step technique that begins with a high-level conceptual analysis of the data universe that is being considered. Quoting the FRBR document again:
The first step in the entity analysis technique is to isolate the key objects that are of interest to users of information in a particular domain. These objects of interest or entities are defined at as high a level as possible. That is to say that the analysis first focuses attention not on individual data but on the "things" the data describe. Each of the entities defined for the model, therefore, serves as the focal point for a cluster of data. An entity diagram for a personnel information system, for example, would likely identify "employee" as one entity that would be of interest to the users of such a system.

This is a very good description of conceptual modeling. So it is either puzzling or disturbing that most readings of FRBR do not recognize the difference between a conceptual model and either a record format or a logical model. In part this is because few have done a close reading of the FRBR document, and unfortunately it is easy to view the diagrams there as statements of data structure rather than high-level concepts about bibliographic data. (It's not surprising that people get their information about FRBR from the diagrams, rather than the text. There are three very simple diagrams in the document, and 142 pages of text. Yet even if a picture is worth a thousand words, those three are not equal to the text.)

One of the main assumptions about FRBR is that the entities listed there should be directly translated into records in any bibliographic data design that intends to implement FRBR. For example, there is much criticism of BIBFRAME for presenting a two-entity bibliographic model instead of the four entities of FRBR. This reflects the mistaken idea that each Group 1 entity must be a record in whatever future bibliographic formats are developed. Because these are entities in a conceptual model, there is absolutely no direct transfer from them to data records. How best to create a record format that carries the concepts is something that would be arrived at after a further and more detailed technical analysis. In fact, the development of a record format might not seem to be a direct descendant of the E-R model, since the E-R modeling technique has a bias toward the structure of relational database management systems, not records, and the FRBR Study Group was not intending its work to be translated directly to a database design.
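To make the point above concrete, here is a minimal sketch in Python. It takes one set of conceptual facts, grouped by the four FRBR Group 1 entities, and packages it into two different record layouts. Everything in it is an assumption made for illustration: the attribute names, the sample values, and the "content"/"carrier" grouping are hypothetical, and the two-record layout is not the actual BIBFRAME vocabulary.

```python
# Conceptual facts about one resource, keyed by FRBR Group 1 entity.
# Attribute names and values are made up for illustration.
CONCEPTUAL = {
    "work": {"title": "Hamlet"},
    "expression": {"language": "English"},
    "manifestation": {"publisher": "Example Press", "year": "2014"},
    "item": {"barcode": "0001"},
}


def as_four_records(model):
    """Package the facts as one record per conceptual entity."""
    return [{"type": name, **attrs} for name, attrs in model.items()]


def as_two_records(model):
    """Package the same facts as two records (hypothetical grouping)."""
    content = {"type": "content", **model["work"], **model["expression"]}
    carrier = {"type": "carrier", **model["manifestation"], **model["item"]}
    return [content, carrier]


def facts(records):
    """Collect every attribute/value pair carried by a set of records."""
    return {(key, value)
            for record in records
            for key, value in record.items()
            if key != "type"}
```

Calling `facts()` on either layout yields the same set of attribute/value pairs. In this toy case the number of records is a packaging decision, and the conceptual content is unchanged by it.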

There are innumerable ways that one could implement a data design that fulfills the conceptual view of FRBR. In E-R modeling there are subsequent steps that build on the conceptual design to develop it into an actionable data design. These steps are actually more detailed and imposing than the conceptual design, which is often used to bridge the knowledge gap between operational staff and the technical staff who must create a working system. The step after the conceptual model is usually the logical design step that completes the list of attributes, and defines the types of data values that will be stored in the database tables (text, date, currency) and the cardinality of each data element (mandatory, optional, repeatable, etc.). It then normalizes the data to remove any duplication of data within the entire database. It also resolves relationships between data tables so that one-to-many and many-to-many relationships are correctly implemented for the applications that will make use of them. Although this is couched in terms of database design, an equally rigorous step would be needed to move from a conceptual view to a design for a format that could be used in library systems and for data exchange.
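As a rough sketch of what the logical design step adds, the Python below uses the standard sqlite3 module to define a tiny schema. The table and column names are hypothetical and cover only a fragment of what a real design would need; nothing here comes from the FRBR document. It shows data types, mandatory versus optional elements, a one-to-many relationship, and a junction table that resolves a many-to-many relationship.

```python
import sqlite3

# Hypothetical logical design; all names are illustrative assumptions.
SCHEMA = """
CREATE TABLE work (
    work_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL               -- mandatory element
);
CREATE TABLE expression (
    expression_id INTEGER PRIMARY KEY,
    work_id  INTEGER NOT NULL REFERENCES work(work_id),  -- one work, many expressions
    language TEXT NOT NULL,
    created  TEXT                       -- optional date, stored as text
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Junction table resolving the many-to-many work/creator relationship
CREATE TABLE work_creator (
    work_id   INTEGER NOT NULL REFERENCES work(work_id),
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    PRIMARY KEY (work_id, person_id)
);
"""


def build_demo():
    """Create an in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Normalization: the title and the name are each stored exactly once.
    conn.execute("INSERT INTO work (work_id, title) VALUES (1, 'Hamlet')")
    conn.execute(
        "INSERT INTO person (person_id, name) VALUES (1, 'William Shakespeare')"
    )
    conn.executemany(
        "INSERT INTO expression (work_id, language) VALUES (?, ?)",
        [(1, "English"), (1, "German")],
    )
    conn.execute("INSERT INTO work_creator (work_id, person_id) VALUES (1, 1)")
    return conn


def languages_for(conn, title):
    """Follow the one-to-many link from a work to its expressions."""
    rows = conn.execute(
        "SELECT e.language FROM expression e "
        "JOIN work w ON w.work_id = e.work_id "
        "WHERE w.title = ? ORDER BY e.language",
        (title,),
    )
    return [row[0] for row in rows]


def creators_for(conn, title):
    """Follow the many-to-many link through the junction table."""
    rows = conn.execute(
        "SELECT p.name FROM person p "
        "JOIN work_creator wc ON wc.person_id = p.person_id "
        "JOIN work w ON w.work_id = wc.work_id "
        "WHERE w.title = ? ORDER BY p.name",
        (title,),
    )
    return [row[0] for row in rows]
```

None of these decisions (which elements are mandatory, where the junction table sits, how dates are typed) appear in a conceptual model. They are made at the logical design stage, and a record format for exchange would need its own comparable set of decisions.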

As an illustration, here is a logical design for the bibliographic system MusicBrainz that stores information about recorded music. It has many of the same concepts as FRBR (works, performers, variant expressions), and must resolve the complex relationships between albums, songs, and performances (not unlike what a music library catalog must do):

With perhaps some difference in details you could say that this implements the concepts of FRBR. Still, this is a database design, and not a record format. For many databases, there is no single record that represents all of the stored data. Business databases are generally a combination of data from numerous departments and processes, and they can often output many different data combinations as needed.

It does say something about the state of technology awareness in the library profession that once a presumably successful conceptual model was developed there was no second step to make that model operational. What was the ultimate goal of FRBR, and did it fulfill that goal? Look for another post soon on that topic.


Unknown said...

In fact, it is a very good explanation of the nature of the FRBR study and its objectives. This means that the FRBR model entities do not lend themselves directly to being bibliographic records. But there must be a database design as a necessary step to construct a bib record.
Arabic Union Catalog Center
Riyadh, Saudi Arabia

Karen Coyle said...

Yes, thank you, we do need technical designs that allow us to create and exchange data, and to store and search the data. Beyond that, though, there are so many questions. Do we need a single record format that is used by all libraries? (Is that a reasonable goal?) What are our goals for sharing? Who will manage the shared bibliographic metadata? How will we fund such an endeavor? Where do the library vendors fit into all of this?

My main question, though, is: can we support a period of experimentation until we come to a new agreement about bibliographic data? I consider this important because I do not think that we can work it all out ahead of time with a traditional standards process, because that will take too long.

Alison said...

Thanks for this Karen. I recently took a mechanics of metadata course (Library Juice Academy) where the instructor walked us through from the conceptual model level on down and it was really difficult for me to stop thinking about "records" when diagramming possible entities etc. I kept leaping forward to what I thought would then be in a record in the database itself! It was a really good lesson learned.

Hibernator said...

Thanks for this Karen. My development team recently developed an application based on FRBR in order to facilitate cataloguing. Based on the FRBR study, we designed an entity-relationship model and we have been working with it for two years, and everything seems to be fine. We are now trying to adapt to RDA and currently we have a Linked Data prototype based on vocabulary. We would be so happy to share our ideas with everybody. You can check this link if you are interested. Thanks again for your time.