Monday, August 06, 2018

FRBR as a Data Model


(I've been railing against FRBR since it was first introduced. It still confuses me some. I put out these ideas for discussion. If you disagree, please add your thoughts to this post.)

I recently spoke at a library conference in Oslo, where I went through my criticisms of our cataloging models and how they are not suited to the problems we need to solve today. I had my usual strong criticisms of FRBR and the IFLA LRM. However, when I finished speaking I was asked why I am so critical of those models, which means that I did not explain myself well. I am going to try again here, as clearly and succinctly as I can.

Conflation of Conceptual Models with Data Models


FRBR's main impact was that it provided a mental model of the bibliographic universe that reflects a conceptual view of the elements of descriptive cataloging. You will find nothing in FRBR that could not be found in standard library cataloging of the 1990s, which is when the FRBR model was developed. What FRBR adds to our understanding of bibliographic information is names and definitions for key concepts that had been implied but not fully articulated in library catalog data. If it had stopped there we would have had an interesting mental model that allows us to speak more precisely about catalogs and cataloging.

Unfortunately, the use of diagrams that appear to define actual data models and the listing of entities and their attributes have led the library world down the wrong path, that of reading FRBR as the definition of a physical data model. Compounding this, the LRM goes down that path even further by claiming to be a structural model of bibliographic data, which implies that it is the structure for library catalog data. I maintain that the FRBR conceptual model should not be assumed to also be a model for bibliographic data in a machine-readable form. The main reason for this has to do with the functionality that library catalogs currently provide (and what functions they may provide in the future). This is especially true in relation to what FRBR refers to as its Group 1 entities: work, expression, manifestation, and item.

The model defined in the FRBR document presents an idealized view that does not reflect the functionality of bibliographic data in library catalogs nor likely system design. This is particularly obvious in the struggle to fit the reality of aggregate works into the Group 1 "structure," but it is true even for simple published resources. The remainder of this document attempts to explain the differences between the ideal and the real.

The Catalog vs the Universe


One of the unspoken assumptions in the FRBR document is that it poses its problems and solutions in the context of the larger bibliographic universe, not in terms of a library catalog. The idea of gathering all of the manifestations of an expression and all of the expressions of a work is not shown as contingent on the holdings of any particular library. Similarly, bibliographic relationships are presented as having an existence without addressing how those relationships would be handled when the related works are not available in a data set. This may be due to the fact that the FRBR working group was made up solely of representatives of large research libraries whose individual catalogs cover a significant swath of the bibliographic world. It may also have arisen from the fact that the FRBR working group was formed to address the exchange of data between national libraries, and thus was intended as a universal model. Note that no systems designers were involved in the development of FRBR to address issues that would come up in catalogs of various extents or types.

The questions asked and answered by the working group were therefore not of the nature of "how would this work in a catalog?" and were more of the type "what is the nature of bibliographic data?". The latter is a perfectly legitimate question for a study of the nature of bibliographic data, but that study cannot be assumed to answer the first question.

Functionality


Although the F in FRBR stands for "functional," FRBR does little to address the functionality of the library catalog. The user tasks find, identify, select and obtain (and now explore, added in the LRM) are not explained in terms of how the data aids those tasks; the FRBR document only lists which data elements are essential to each task. Computer system design, including the design of data structures, needs to go at least a step further in its definition of functions. That means defining not only which data elements are relevant, but also the specific use each data element is put to in an actual machine interaction with the user and with services. A systems developer has to take into account precisely what needs to be done with the FRBR entities in all of the system functions, from input to search and display.

(Note: I'm going to try to cover this better and to give examples in an upcoming post.)

Analysis that is aimed at creating a bibliographic data format for a library catalog would take into account that providing user-facing information about work and expression is context-dependent based on the holdings of the individual library and on the needs of its users. It would also take into account the optional use of work and expression information in search and display, and possibly give alternate views to support different choices in catalog creation and deployment. Essentially, analysis for a catalog would take system functionality into account.
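As a rough illustration of what "context-dependent" could mean in practice, here is a minimal Python sketch. Everything in it is hypothetical: the record fields (work_key, title, edition), the placeholder data, and the rule that a work-level grouping is displayed only when this particular library holds more than one manifestation of the work. It is one possible design choice, not something drawn from FRBR, the LRM, or any actual system.

```python
from collections import defaultdict

# Hypothetical holdings of a single library; field names and values are invented.
holdings = [
    {"work_key": "w1", "title": "Sample Novel", "edition": "Edition A"},
    {"work_key": "w1", "title": "Sample Novel", "edition": "Edition B"},
    {"work_key": "w2", "title": "Standalone Report", "edition": "Edition C"},
]

def display_lines(records):
    """Show a work-level grouping only where this library holds
    more than one manifestation of the work (an assumed rule)."""
    by_work = defaultdict(list)
    for rec in records:
        by_work[rec["work_key"]].append(rec)

    lines = []
    for group in by_work.values():
        if len(group) == 1:
            # Single work-expression-manifestation grouping: plain entry.
            rec = group[0]
            lines.append(f'{rec["title"]} ({rec["edition"]})')
        else:
            # Several manifestations held: collocate them under the work.
            lines.append(f'{group[0]["title"]} [{len(group)} editions held]')
            for rec in group:
                lines.append(f'  - {rec["edition"]}')
    return lines

for line in display_lines(holdings):
    print(line)
```

A library holding only one edition of "Sample Novel" would get a plain entry from the same data, which is the sense in which the display depends on the holdings rather than on the bibliographic universe.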

A number of facts about the nature of computer-based catalogs have to be acknowledged: that users are no longer performing “find” in an alphabetical list of headings, but are performing keyword searches; that collocation based on work-ness is not a primary function of catalog displays; that a significant proportion of a bibliographic database consists of items with a single work-expression-manifestation grouping; and finally that there is an inconsistent application of work and expression information in today's data.

In spite of nearly forty years of using library systems whose default search function is a single box in which users are asked to input query terms that will be searched as keywords taken from a combination of creator, title, and subject fields in the bibliographic record, the LRM doubles down on the status of textual headings as primary elements, aka: Nomen. Unfortunately it doesn't address the search function in any reasonable fashion, which is to say it doesn't give an indication of the role of Nomen in the find function. In fact, here is the sum total of what the LRM says about search:

"To facilitate this task [find], the information system seeks to enable effective searching by offering appropriate search elements or functionality."


That's all. As I said in my talk in Oslo, this is up there with the jokes about bad corporate mission statements, like: "We endeavor to enhance future value through intelligent paradigm implementation." First, no information system sets out to offer ineffective searching. Yet the phrase "effective searching" is meaningless in itself; without a definition of what is effective this is just a platitude. The same is true for "appropriate search elements": no one would suggest that a system should use inappropriate search elements, but defining appropriate search is not at all a simple task. In fact, I contend that one of the primary problems with today's library systems is that we specifically lack a definition of appropriate, effective search. This is rendered especially difficult because the data that we enter into our library systems is data that was designed for an entirely different technology: the physical card catalog, organized as headings in alphabetical order.
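To make the single search box concrete, here is a minimal Python sketch of keyword search over creator, title, and subject fields. It is a toy under stated assumptions: the records, the field names, the tokenizer, and the rule that all query terms must match are invented for illustration and do not describe any particular library system.

```python
import re
from collections import defaultdict

# Hypothetical bibliographic records; data and field names are invented.
records = {
    1: {"creator": "Example, Ann", "title": "Whales of the Pacific", "subject": "Marine mammals"},
    2: {"creator": "Whale, Tom", "title": "A History of Cataloging", "subject": "Library science"},
}

SEARCHED_FIELDS = ("creator", "title", "subject")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(recs):
    """Map each keyword to the ids of records containing it in any searched field."""
    index = defaultdict(set)
    for rec_id, rec in recs.items():
        for field in SEARCHED_FIELDS:
            for token in tokenize(rec.get(field, "")):
                index[token].add(rec_id)
    return index

def search(index, query):
    """Return ids of records matching every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

index = build_index(records)
print(sorted(search(index, "tom whale")))
print(sorted(search(index, "whales pacific")))
```

Note that the query "tom whale" finds the record whose heading reads "Whale, Tom": once the heading is broken into keywords, its form and order as a heading play no part in the find function.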

One Record to Rule Them All


Our actual experience regarding physical structures for bibliographic data should be sufficient proof that there is not one single solution. Although libraries today are consolidating around the MARC21 record format, primarily for economic reasons, there have been numerous physical formats in use that mostly adhere to the international standard of ISBD. In this same way, there can be multiple physical formats that adhere to the conceptual model expressed in the FRBR and LRM documents. We know this is the case by looking at the current bibliographic data, which includes varieties of MARC, ISBD, BIBFRAME, and others. Another option for surfacing information about works in catalogs could follow what OCLC seems to be developing, which is the creation of works through a clustering of single-entity records. In that model, a work is a cluster of expressions, and an expression is a cluster of manifestations. This model has the advantage that it does not require the cataloger to make decisions about work and expression statements before it is known if the resource will be the progenitor of a bibliographic family, or will stand alone. It also does not require the cataloger to have knowledge of the bibliographic universe beyond their own catalog.
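The clustering idea can be sketched in a few lines of Python. This is an illustration only and not OCLC's actual method: the record fields, the placeholder data, and the matching keys (creator plus uniform title for the work, with language added for the expression) are all simplifying assumptions.

```python
from collections import defaultdict

# Hypothetical single-entity (manifestation-level) records; all values are invented.
records = [
    {"id": "m1", "creator": "Author A", "uniform_title": "Sample Novel", "language": "eng"},
    {"id": "m2", "creator": "Author A", "uniform_title": "Sample Novel", "language": "eng"},
    {"id": "m3", "creator": "Author A", "uniform_title": "Sample Novel", "language": "fre"},
    {"id": "m4", "creator": "Author B", "uniform_title": "Standalone Report", "language": "eng"},
]

def work_key(rec):
    """Assumed work-level match key: normalized creator and uniform title."""
    return (rec["creator"].lower(), rec["uniform_title"].lower())

def expression_key(rec):
    """Assumed expression-level match key: the work key plus language."""
    return work_key(rec) + (rec["language"],)

def cluster(recs):
    """Build work -> expression -> [manifestation ids] purely by clustering."""
    works = defaultdict(lambda: defaultdict(list))
    for rec in recs:
        works[work_key(rec)][expression_key(rec)].append(rec["id"])
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {w: dict(exprs) for w, exprs in works.items()}

clusters = cluster(records)
for work, expressions in clusters.items():
    print(work)
    for expression, manifestations in expressions.items():
        print("  ", expression, manifestations)
```

In this sketch the cataloger records nothing about work or expression up front. A record that matches nothing else, like m4, simply ends up as a cluster of one.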

The key element of all of these, and countless other, solutions is that they can be faithful to the mental model of FRBR while also being functional and efficient as systems. We should also expect that the systems solutions to this problem space will not stay the same over time, since technology is in constant evolution.

Summary


I have identified here two quite fundamental areas where FRBR's analysis differs from the needs of system development: 1) the difference between conceptual and physical models and 2) the difference between the (theoretical) bibliographic universe and the functional library catalog. Neither of these is a criticism of FRBR as such, but they do serve as warnings about a widely held assumption in the library world today, which is that of mistaking the FRBR entity model for a data and catalog design model. This is evident in the outcry over the design of the BIBFRAME model, which uses a two-tiered bibliographic view and not the three tiers of FRBR. The irony of that complaint is that at the very same time as those outcries, catalogers are using FRBR concepts (as embodied in RDA) while cataloging into the one-tiered data model of MARC, which includes all of the entities of FRBR in a single data record. While cataloging into MARC records may not be the best version of bibliographic data storage that we could come up with, we must acknowledge that there are many possible technology solutions that could allow the exercise of bibliographic control while making use of the concepts addressed in FRBR/LRM. Those solutions must be based at least as much on user needs in actual catalogs as on bibliographic theory.
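The point that one conceptual model can sit behind several physical layouts can be shown with a small Python sketch. All of it is assumed for illustration: the field names, the assignment of each field to a FRBR entity, and the three layouts, which only loosely echo a single MARC-like record, a BIBFRAME-like work/instance split, and a three-part FRBR-like split. None of this reproduces the actual formats.

```python
# A hypothetical flat description; field names and values are invented.
flat = {
    "creator": "Author A",
    "uniform_title": "Sample Novel",
    "language": "eng",
    "title_proper": "Sample Novel: A New Edition",
    "publisher": "Press One",
    "date": "2018",
}

# Assumed conceptual assignment of each field to a FRBR Group 1 entity.
FIELD_ENTITY = {
    "creator": "work",
    "uniform_title": "work",
    "language": "expression",
    "title_proper": "manifestation",
    "publisher": "manifestation",
    "date": "manifestation",
}

# Three assumed physical layouts: conceptual entity -> storage tier.
ONE_TIER = {"work": "record", "expression": "record", "manifestation": "record"}
TWO_TIER = {"work": "work", "expression": "work", "manifestation": "instance"}
THREE_TIER = {"work": "work", "expression": "expression", "manifestation": "manifestation"}

def split(record, tiers):
    """Distribute the fields of one description across the tiers of a layout."""
    out = {}
    for field, value in record.items():
        tier = tiers[FIELD_ENTITY[field]]
        out.setdefault(tier, {})[field] = value
    return out

for name, layout in (("one tier", ONE_TIER), ("two tiers", TWO_TIER), ("three tiers", THREE_TIER)):
    print(name, split(flat, layout))
```

The conceptual assignment in FIELD_ENTITY is identical in all three cases; only the storage layout changes, and that layout is a systems decision the conceptual model does not make.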

As a theory, FRBR posits an ideal bibliographic environment which is not the same as the one that is embodied in any library catalog. The diagrams in the FRBR and LRM documents show the structure of the mental model, but not library catalog data. Because the FRBR document does not address implementation of the model in a catalog, there is no test of how such a model does or does not reflect actual system design. The extrapolation from mental model to physical model is not provided in FRBR or the LRM, as neither addresses system functions and design, not even at a macro level.

I have to wonder if FRBR/LRM shouldn't be considered a model for bibliography rather than library catalogs. Bibliography was once a common art in the world of letters but that has faded greatly over the last half century. Bibliography is not the same as catalog creation, but one could argue that libraries and librarians are the logical successors to the bibliographers of the past, and that a “universal bibliography” created under the auspices of libraries would provide an ideal context for the entries in the library catalog. This could allow users to view the offerings of a single library as a subset of a well-described world of resources, most of which can be accessed in other libraries and archives.


3 comments:

Unknown said...

Thank you for this great critical stance. We are currently developing a new "physical" data model at RERO, and we do not know if we should consider the FRBR entities ("work" and especially "expression"). For that, we are carrying out a brief* analysis, based on the initial FRBR objective, which was to develop a framework:
1. That facilitates the exchange of data between libraries
2. That meets the end user's needs
3. That has a minimal description level, in order to reduce cataloguing costs.

So the analysis basically evaluates whether having the expression helps with import/export (point 1), with designing the search interface (point 2, which you refer to as the "functional library catalog") and with cataloguing documents (point 3).

Preliminary findings show that having the expression is not an advantage in any of the aforementioned cases. Your input, or anyone else's, is very welcome at this stage!

Nicolas

* This short analysis is not done with a scientific methodology. We do it pragmatically and cannot spend much time on it.

Lukas Koster said...

Your remark about using FRBR for bibliographies rather than for catalogues is interesting. I recently talked to a book historian/researcher, who said that he would love to see the FRBR model in our collection metadata, so he would be able to distinguish between all different versions and editions of specific works. Which also demonstrates your point that a library catalogue does not always present all expressions of a specific work, because the library usually does not have all these expressions.

Karen Coyle said...

Nicolas,

You might want to look at the work of Shoichi Taniguchi, such as:

Taniguchi, Shoichi. “A Conceptual Model Giving Primacy to Text-Level Bibliographic Entity in Cataloging: A Detailed Discussion,” Journal of Documentation 58, no. 4 (2003): 363−82.

(There are others in my bibliography: http://kcoyle.net/beforeAndAfter/bibliography978-0-8389-1364-2.pdf)

His analysis is limited to textual works, but he argues that the user is generally seeking at the level of the expression, not the work, in particular due to translations. His argument is pretty convincing.

And I should have pointed to my book on FRBR, which is open access here:

http://kcoyle.net/beforeAndAfter/index.html

Chapters 7 and 8 might be of particular relevance to your work.