Wednesday, May 17, 2017

Two FRBRs, Many Relationships

There is tension in the library community between those who favor remaining with the MARC21 standard for bibliographic records, and others who are promoting a small number of RDF-based solutions. This is the perceived conflict, but in fact both camps are looking at the wrong end of the problem - that is, they are looking at the technology solution without having identified the underlying requirements that a solution must address. I contend that the key element that must be taken into account is the role of FRBR on cataloging and catalogs.

Some background:  FRBR is stated to be a mental model of the bibliographic universe, although it also has inherent in it an adherence to a particular technology: entity-relation analysis for relational database design. This is stated fairly clearly in the  introduction to the FRBR report, which says:

The methodology used in this study is based on an entity analysis technique that is used in the development of conceptual models for relational database systems. Although the study is not intended to serve directly as a basis for the design of bibliographic databases, the technique was chosen as the basis for the methodology because it provides a structured approach to the analysis of data requirements that facilitates the processes of definition and delineation that were set out in the terms of reference for the study. 

The use of an entity-relation model was what led to the now ubiquitous diagrams that show separate entities for works, expressions, manifestations and items. This is often read as a proposed structure for bibliographic data, where a single work description is linked to multiple expression descriptions, each of which in turn link to one or more manifestation descriptions. Other entities like the primary creator link to the appropriate bibliographic entity rather than to a bibliographic description as a whole. In relational database terms, this would create an efficiency in which each work is described only once regardless of the number of expressions or manifestations in the database rather than having information about the work in multiple bibliographic descriptions. This is seen by some as a potential efficiency also for the cataloging workflow as information about a work does not need to be recreated in the description of each manifestation of the work.

Two FRBRs


What this means is that we have (at least) two FRBR's: the mental model of the bibliographic universe, which I'll refer to as FRBR-MM; and the bibliographic data model based on an entity-relation structure, which I'll refer to as FRBR-DM. These are not clearly separated in the FRBR final report and there is some ambiguity in statements from members of the FRBR working group about whether both models are intended outcomes of the report. Confusion arises in many discussions of FRBR when we do not distinguish which of these functions is being addressed.

FRBR-Mental Model


FRBR-MM is the thinking behind the RDA cataloging rules, and the conceptual entities define the structure of the RDA documentation and workflow. It instructs catalogers to analyze each item they catalog as being an item or manifestation that carries the expression of a creative work. There is no specific data model associated with the RDA rules, which is why it is possible to use the mental model to produce cataloging that is entered into the form provided by the MARC21 record; a structure that approximates the catalog entry described in AACR2.

In FRBR-MM, some entities can be implicit rather than explicit. FRBR-MM does not require that a cataloguer produce a separate and visible work entity. In the RDA cataloging coded in MARC, the primary creator and the subjects are associated with the overall bibliographic description without there being a separate work identity. Even when there is a work title created, the creator and subjects are directly associated with the bibliographic description of the manifestation or item. This doesn't mean that the cataloguer has not thought about the work and the expression in their bibliographic analysis, but the rules do not require those to be called out separately in the description. In the mental model you can view FRBR as providing a checklist of key aspects of the bibliographic description that must be addressed.

The FRBR report defines bibliographic relationships more strongly than previous cataloging rules. For her PhD work, Barbara Tillett (a principal on both the FRBR and RDA work groups) painstakingly viewed thousands of bibliographic records to tease out the types of bibliographic relationships that were noted. Most of these were implicit in free-form cataloguer-supplied notes and in added entries in the catalog records. Previous cataloging rules said little about bibliographic relationships, while RDA, using the work of Tillett which was furthered in the FRBR final report, has five chapters on bibliographic relationships. In the FRBR-MM encoded in MARC21,  these continue to be cataloguer notes ("Adapted from …"), subject headings ("--adaptations"), and added entry fields. These notes and headings are human-readable but do not provide machine-actionable links between bibliographic descriptions. This means that you cannot have a system function that retrieves all of the adaptations of a work, nor are systems likely to provide searches based on relationship type, as these are buried in text. Also, whether relationships are between works or expressions or manifestations is not explicit in the recorded data. In essence, FRBR-MM in MARC21 ignores the separate description of the FRBR-defined Group 1 entities (WEMI), flattening the record into a single bibliographic description that is very similar to that produced with AACR2.

FRBR-Data Model


FRBR-DM adheres to the model of separate identified entities and the relationships between them. These are seen in the diagrams provided in the FRBR report, and in the section on bibliographic relationships from that report. The first thing that needs to be said is that the FRBR report based its model on an analysis that is used for database design. There is no analysis provided for a record design. This is significant because databases and records used for information exchange can have significantly different structures. In a database there could be one work description linked to any number of expressions, but when exchanging information about a single  manifestation presumably the expression and work entities would need to be included. That probably means that if you have more than one manifestation for a work being transmitted, that work information is included for each manifestation, and each bibliographic description is neatly contained in a single package. The FRBR report does not define an actual database design nor a record exchange format, even though the entities and relations in the report could provide a first step in determining those technologies.

FRBR-DM uses the same mental model as FRBR-MM, but adds considerable functionality that comes from the entity-relationship model. FRBR-DM implements the concepts in FRBR in a way that FRBR-MM does not. It defines separate entities for work, expression, manifestation and item, where MARC21 has only a single entity. FRBR-DM also defines relationships that can be created between specific entities. Without actual entities some relationships between the entities may be implicit in the catalog data, but only in a very vague way. A main entry author field in a MARC21 record has no explicit relationship to the work concept inherent in the bibliographic description, but many people's mental model would associate the title and the author as being a kind of statement about the work being described. Added entries may describe related works but they do not link to those works.

The FRBR-DM model was not imposed on the RDA rules, which were intended to be neutral as to the data formats that would carry the bibliographic description. However, RDA was designed to support the FRBR-DM by allowing for individual entity descriptions with their own identifiers and for there to be identified relationships between those entities. FRBR-DM proposes the creation of a work entity that can be shared throughout the bibliographic universe where that work is referenced. The same is true for all of the FRBR entities. Because each entity has an identified existence, it is possible to create relationships between entities; the same relationships that are defined in the FRBR report, and more if desired. FRBR-DM, however, is not supported by the MARC21 model because MARC21 does not have a structure that would permit the creation of separately identified entities for the FRBR entities. FRBR-DM does have an expression as a data model in the RDA Registry. In the registry, RDA is defined as an RDF vocabulary in parallel with the named elements in the RDA rule set, with each element associated with the FRBR entity that defines it in the RDA text. This expression, however, so far has only one experimental system implementation in RIMMF. As far as I know, no libraries are yet using this as a cataloging system.

The replacement proposed by the Library of Congress for the MARC21 record, BIBFRAME, makes use of entities and relations similar to those defined in FRBR, but does not follow FRBR to the letter. The extent to which it was informed by FRBR is unclear but FRBR was in existence when BIBFRAME was developed. Many of the entities defined by FRBR are obvious, however, and would be arrived at by any independent analysis of bibliographic data: persons, corporate bodies, physical descriptions, subjects. How BIBFRAME fits into the FRBR-MM or the FRBR-DM isn't clear to me and I won't attempt to find a place for it in this current analysis. I will say that using an entity-relation model and promoting relationships between those entities is a mainstream approach to data, and would most likely be the model in any modern bibliographic data design.


MARC v RDF? 


The decision we are facing in terms of bibliographic data is often couched in terms of "MARC vs. RDF", however, that is not the actual question that underlies that decision. Instead, the question should be couched as: entities and relations, or not? if you want to share entities like works and persons, and if you want to create actual relationships between bibliographic entities, something other than MARC21 is required. What that "something" is should be an open question, but it will not be a "unit record" like MARC21.

For those who embrace the entity-relation model, the perceived "rush to RDF" is not entirely illogical; RDF is the current technology that supports entity-relation models. RDF is supported by a growing number of open source tools, including database management and indexing. It is a World Wide Web Consortium (W3C) standard, and is quickly becoming a mainstream technology used by communities like banking, medicine, and academic and government data providers. It also has its down sides: there is no obvious support in the current version of RDF for units of data that could be called "records" - RDF only recognizes open graphs; RDF is bad at retaining the order of data elements, something that bibliographic data often relies upon. These "faults" and others are well known to the W3C groups that continue to develop the standard and some are currently under development as additions to the standard.

At the same time, leaping directly to a particular solution is bad form. Data development usually begins with a gathering of use cases and requirements, and technology is developed to meet the gathered requirements. If it is desired to take advantage of some or all of the entity-relation capabilities of FRBR, the decision about the appropriate replacement for MARC21 should be based on a needs analysis. I recall seeing some use cases in the early BIBFRAME work, but I also recall that they seemed inadequate. What needs to be addressed is the extent to which we expect library catalogs to make use of bibliographic relationships, and whether those relationships must be bound to specific entities.

What we could gain by developing use cases would be a shared set of expectations that could be weighed against proposed solutions. Some of the aspects of what catalogers like about MARC may feed into those requirements, as well what we wish for in the design of the future catalog. Once the set of requirements is reasonably complete, we have a set of criteria against which to measure whether the technology development is meeting the needs of everyone involved with library data.

Conclusion: It's the Relationships


The disruptive aspect of FRBR is not primarily that it creates a multi-level bibliographic model between works, expressions, manifestations, and items. The disruption is in the definition of relationships between and among those entities that requires those entities to be separately identified. Even the desire to share separately work and expression descriptions can most likely be done by identifying the pertinent data elements within a unit record. But the bibliographic relationships defined in FRBR and RDA, if they are to be actionable, require a new data structure.

The relationships are included in RDA but are not implemented in RDA in MARC21, basically because they cannot be implemented in a "unit record" data format. The key question is whether those relationships (or others) are intended to be included in future library catalogs. If they are, then a data format other than MARC21 must be developed. That data format may or may not implement FRBR-defined bibliographic relationships; FRBR was a first attempt to redefine a long-standing bibliographic model and should be considered the first, not the last, word in bibliographic relationships.

If we couch the question in terms of bibliographic relationships, not warring data formats, we begin to have a way to go beyond emotional attachments and do a reasoned analysis of our needs.