Coyle's InFormation: 04/01/2014

Friday, April 25, 2014

Works, Expressions, and the Bibliographic Universe

In 1545, Conrad Gessner set about to create a bibliography of everything ever published, the Bibliotheca Universalis. He nearly succeeded. In 2004, Google set about to create a digital universal library by digitizing millions of books from library shelves. Shortly thereafter the Internet Archive began a project, the Open Library, to create "One web page for every book ever written." Meanwhile, OCLC's WorldCat has grown to over 300 million records from 72,000 libraries. All of these exemplify the concept of a universal bibliography or database that makes up what I am calling, for now, the bibliographic universe.

The other side of this coin is the catalog of an individual library (or library consortium). This bibliographic data set is expressly designed to be local, not universal. It does not include materials not available to the library user, and what it does include has been selected with that library's constituency in mind.

The FRBR study is a conceptual view of this "bibliographic universe"; that is, an abstract view of intellectual resources and their relations to each other. To say that Magic Mountain is a translation of Der Zauberberg is to make a statement that is unrelated to any actual example of that text, much less the holdings within any individual library. We can conclude that FRBR addresses bibliographic data universally because it claims to be agnostic to any particular usage of the bibliographic concept, and to be able to define any and all bibliographic resources and their relationships.

The problem that arises, as I see it, is where this universal view meets the individual library. Do you express these relationships between items in your library, or do you express the relationships in the bibliographic universe? If your library has Magic Mountain in English and Spanish, but not in German, should you organize your presentation of data around the original German version, even if your users will not be able to read or understand the title in that language? What if you only have the English version - should this be displayed as a manifestation of a translation of the original German text? Is there a purpose to creating bibliographic information about relationships between works, even if the library does not hold those works?

A key question that we need to ask is: what is the purpose of providing these bibliographic relationships? The answer that I believe I would get from catalogers today is that the purpose of this information is to provide an organized view of the library to the user. While in a large library, works and expressions may be organizing principles, in a small library with often only one version of the resource, the addition of information about the work and the expression could be more disorganization than organization, because it doesn't give the user a more organized view of the library's holdings but adds information that may be confusing. This doesn't mean that the information about works and expressions isn't useful - the work and expression could be used to provide interesting information to the user, but in this case they do not provide a useful organizing function in the catalog.

As far as I can tell, neither FRBR nor the cataloging rules (past and present) clearly differentiate among organizing a library, organizing a larger context within which the library operates, and describing the bibliographic universe. I'll accept that organizing the bibliographic universe is probably out of scope in today's world, but where does one draw the line between the individual library and the larger bibliographic context that might be useful to your library users?

This is not a new problem. AACR2 introduced the idea of the work by adding the uniform title to the catalog record. The uniform title turned out to be less uniformly used than perhaps was intended because it was not applied by catalogers to all records where it was applicable. In a library with only one edition (or a small number of editions) of a work, the uniform title was deemed to be either unnecessary or not worth the time of the cataloger. This worked fine in individual libraries but caused problems for sharing and in union catalogs. It also makes it more difficult to move from today's cataloging records to a FRBR-ized catalog, since the essential clues about works are not provided consistently.
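
The effect of an inconsistently applied uniform title can be sketched in a few lines. This is only an illustration, not how any real system clusters records: the `uniform_title` field, the simplified value "Zauberberg", and the three sample records are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical, simplified records. A real uniform title would carry
# more detail; "Zauberberg" stands in for the work-level key here.
records = [
    {"title": "The Magic Mountain", "uniform_title": "Zauberberg"},
    {"title": "Der Zauberberg", "uniform_title": "Zauberberg"},
    # A library with a single edition skipped the uniform title.
    {"title": "La montaña mágica", "uniform_title": None},
]


def cluster_by_work(recs):
    """Group records by uniform title, falling back to the transcribed title."""
    clusters = defaultdict(list)
    for rec in recs:
        key = rec["uniform_title"] or rec["title"]
        clusters[key].append(rec["title"])
    return dict(clusters)


clusters = cluster_by_work(records)
```

In a union catalog built this way, the record without a uniform title lands in a cluster of its own, even though it describes the same work as the other two.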

FRBR is a conceptual model, but it isn't clear to me what context it is modeling: a library catalog, the conceptual collective of all or most library catalogs, or the bibliographic universe. The original task was to model the essential things of a library bibliographic record to respond to a set of user tasks. However, at the 30,000-foot level at which FRBR operated, the questions about how one would serve users in a particular library are left open. The FRBR user tasks are a look at the existing concept of a library catalog and what one mythical "user" does when approaching it. It also is a look from the point of view of a large library catalog: no one on the study group was from a small or even medium-sized library. FRBR is very much a top-down look at the bibliographic world. If we look at the library bibliographic world from the bottom up -- looking from the point of view either of individual users or of individual libraries -- then we would need to see the FRBR concepts as possibilities, not requirements; possibilities to be used as appropriate for one's particular situation.

We know that a given library serves its particular users, not the "universe of users." The best service for a library's users is to allow the library to make choices that are appropriate for those users. For that reason, requiring libraries to present, in their catalogs, data that has the bibliographic universe as its context is going to be detrimental to library service. At the same time, it would be ideal to have a true catalog of the bibliographic universe available from which a library could draw information or could create links as a way to expand its catalog information for users who need more. For example, the user who looks up the Chicago Manual of Style should be able to learn if the copy the library holds is the latest version. The user looking for "Harry Potter" and seeing that the library has copies in English and Spanish should be able to ask if the book was translated into Vietnamese (yes) or Telugu (no).
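
As a rough sketch of that split, a local catalog could answer from its own holdings first and consult a universal source only when the user asks for more. Everything here is hypothetical: the two dictionaries, the function names, and the sample values, which simply follow the Harry Potter example in the paragraph above.

```python
# Hypothetical "universe" data: work -> languages it has been published in.
# Sample values only, taken from the example in the text.
UNIVERSE = {
    "Harry Potter": {"English", "Spanish", "Vietnamese"},
}

# What this one library actually holds.
LOCAL_HOLDINGS = {
    "Harry Potter": {"English", "Spanish"},
}


def held_locally(work, language):
    """True if this library holds the work in the given language."""
    return language in LOCAL_HOLDINGS.get(work, set())


def exists_in_universe(work, language):
    """True if the universal source records the work in the given language."""
    return language in UNIVERSE.get(work, set())


def answer(work, language):
    """Answer from the local catalog first, then from the wider universe."""
    if held_locally(work, language):
        return "held here"
    if exists_in_universe(work, language):
        return "exists, not held here"
    return "no known translation"
```

The local catalog stays local; the universal data is consulted only as an extension, which is the bridge the post argues still needs to be designed.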

It would be naive to say that we have no use for a bibliotheca universalis. However, a bibliotheca universalis is not a library catalog. It would also be naive to say that every library has the same needs regarding bibliographic data. What we seem to be lacking is the way to bridge the gap between the 30,000-foot FRBR bibliographic view and the needs of the individual library. I think we have the technology to do this today, and some of the possible answers can be found in general databases like WorldCat or DBpedia. It's the connection between these that needs to be designed.

Tuesday, April 01, 2014

FRBR group 1: the gang of four

(This is a very delayed follow-on to my earlier post on FRBR groups 2 and 3. It's not that I haven't been thinking about it... and I hope soon to be able to post my talk from FSR2014 on FRBR, as well.)

Parts vs. views

Each of the three FRBR groups is defined briefly in the introduction to section 3 of the FRBR document. The second and third groups have fairly concrete definitions:

group 2 "...those responsible for the intellectual or artistic content, the physical production and dissemination, or the custodianship of the entities in the first group"

group 3 "...an additional set of entities that serve as the subjects of works"

The definition of Group 1 is more complex and considerably less clear:

"The entities in the first group (as depicted in Figure 3.1) represent the different aspects of user interests in the products of intellectual or artistic endeavour." [FRBR, p. 13]

Where groups 2 and 3 are made up of similar but independent things (which is a common definition of a class of things), group 1 consists of aspects of a single thing ("intellectual or artistic endeavour"). The term "aspects" can be defined as either parts of something or points of view about something. The difference between "parts" and "points of view" is important. Parts could be defined as simple, observable facts, such as the parts of a particular automobile (motor, chassis, wheels). These are characteristics of the thing itself, independent of the observer. Points of view, of course, vary for each viewer and perhaps each viewing. This would fit with the FRBR document's statement on the work:

"The concept of what constitutes a work and where the line of demarcation lies between one work and another may in fact be viewed differently from one culture to another. Consequently the bibliographic conventions established by various cultures or national groups may differ in terms of the criteria they use for determining the boundaries between one work and another." [FRBR p. 17]

That the FRBR document states that the entities are aspects of user interests rather than aspects of an intellectual endeavor implies that the entities of group 1 are not parts of the endeavor, but constructions in the minds of users. From the remainder of the FRBR document, in particular the areas where the attributes are defined for each entity, it is clear that the FRBR Study Group chose to define the bibliographic description of intellectual endeavors as a single point of view. For each entity, the Study Group has provided a set of elements that are each defined for only that one entity, with no concession made for different points of view or interests. This is, however, in spite of the statement above that communities may have different views.

To reinforce the view of group 1 as parts of a whole, there exist dependencies between the group 1 entities such that, with the exception of work, each can only exist in combination with certain others to which it is linked. Therefore none represents a whole on its own. (In fact, there is no concept of a whole bibliographic description in FRBR. That would need to come from a different analysis.) The definitions of the entities express these dependencies.

"work: a distinct intellectual or artistic creation." [FRBR, p.17]
"expression: the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms." [FRBR p. 19]
"manifestation: the physical embodiment of an expression of a work." [FRBR p. 21]
"item: a single exemplar of a manifestation" [FRBR p. 24]

as does the description of the cardinality of the relationships:

"The relationships depicted in the diagram indicate that a work may be realized through one or more than one expression (hence the double arrow on the line that links work to expression). An expression, on the other hand, is the realization of one and only one work (hence the single arrow on the reverse direction of that line linking expression to work). An expression may be embodied in one or more than one manifestation; likewise a manifestation may embody one or more than one expression. A manifestation, in turn, may be exemplified by one or more than one item; but an item may exemplify one and only one manifestation." [FRBR pp. 13-14]

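The cardinalities in this passage can be written down as a small data model. This is a minimal sketch, assuming Python dataclasses; the class names, field names, and sample labels are mine for illustration and are not part of FRBR.

```python
from dataclasses import dataclass, field
from typing import List


# eq=False keeps identity-based comparison, which avoids recursing
# through the circular links between entities.
@dataclass(eq=False)
class Work:
    title: str
    expressions: List["Expression"] = field(default_factory=list)  # one or more


@dataclass(eq=False)
class Expression:
    label: str
    work: Work  # one and only one work
    manifestations: List["Manifestation"] = field(default_factory=list)

    def __post_init__(self):
        self.work.expressions.append(self)


@dataclass(eq=False)
class Manifestation:
    label: str
    expressions: List[Expression]  # one or more (many-to-many)
    items: List["Item"] = field(default_factory=list)

    def __post_init__(self):
        if not self.expressions:
            raise ValueError("a manifestation embodies at least one expression")
        for expression in self.expressions:
            expression.manifestations.append(self)


@dataclass(eq=False)
class Item:
    label: str
    manifestation: Manifestation  # one and only one manifestation

    def __post_init__(self):
        self.manifestation.items.append(self)


# Sample data, for illustration only.
zauberberg = Work("Der Zauberberg")
german = Expression("German original text", zauberberg)
english = Expression("English translation", zauberberg)
paperback = Manifestation("English paperback edition", [english])
copy1 = Item("copy 1", paperback)
copy2 = Item("copy 2", paperback)
```

Note that only the work can be created without reference to another entity, which matches the dependency described earlier.
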
This directionality, or fixed order, of the dependencies is the source of the image of group 1 as a hierarchy, where each entity connects to the entity "above" it. But there is more than one interpretation of these definitions. Taniguchi [taniguchi] reads the description of the entities as a "Russian doll" with each succeeding entity containing the previous ones. In the definitions of expression, manifestation, and item, above, each entity appears to encapsulate the one or ones above it in the diagram ("the physical embodiment of an expression of a work"). When diagrammed, this view would look like a set of nested containers, each succeeding entity enclosing the previous ones.

(Note that this does not exclude the one-to-many and many-to-many relationships as long as both expressions and manifestations can be part of more than one nested structure.)

The other interpretation, which is the most common interpretation of the entity-relation diagrams, is similar to a database design where each entity represents a single set of data elements that can be shared in one-to-many or many-to-many relationships. This view presents the four aspects as separate entities with strict relationships between them.
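
The two readings can be set side by side as data. This is a sketch under my own assumptions: the dictionary layout, the identifiers (`w1`, `e1`, and so on), and the sample labels are invented for illustration, and neither structure comes from the FRBR document itself.

```python
# Reading 1, the "Russian doll": each entity wraps the one before it,
# so the work description travels inside every item.
nested_item = {
    "item": "copy 1",
    "manifestation": {
        "label": "English paperback",
        "expression": {
            "label": "English translation",
            "work": {"title": "Der Zauberberg"},
        },
    },
}

# Reading 2, database style: each entity is stored once and linked by id.
works = {"w1": {"title": "Der Zauberberg"}}
expressions = {"e1": {"label": "English translation", "work": "w1"}}
manifestations = {"m1": {"label": "English paperback", "expressions": ["e1"]}}
items = {"i1": {"label": "copy 1", "manifestation": "m1"}}


def work_title_nested(item):
    """Unwrap the doll: item -> manifestation -> expression -> work."""
    return item["manifestation"]["expression"]["work"]["title"]


def work_titles_linked(item_id):
    """Follow the links: item -> manifestation -> expression(s) -> work."""
    manifestation = manifestations[items[item_id]["manifestation"]]
    return [
        works[expressions[e_id]["work"]]["title"]
        for e_id in manifestation["expressions"]
    ]
```

Both structures can answer the same question about an item, but in the first the item carries its whole context with it, while in the second the entities have no content in common and are connected only by their links.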

I perceive a contradiction between the verbal definitions in the document and the diagrams, which one presumes are intended to represent the information in the text. The decision to represent the group 1 entities as separate parts and without any overlap in data elements is a conceptual reduction of the definitions that are given early in the document, and nowhere does the document state that such a decision was made. There could be good reasons to implement the FRBR group 1 concepts in a particular technology as a simplified structure, but it is clear to me that the model in the diagrams is not as rich as the concepts in the text would allow.

Process

Some interpretations of FRBR treat the work, expression and manifestation as a process or continuum, moving from the idea in the creator’s mind, to an expression of that creation, and then to a manifestation where the expression becomes "manifest" or physical in nature.

"Content relationships can be viewed as a continuum from works/expressions/manifestations/items. Moving left to right along this continuum we start with some original work and related works and expressions and manifestations that can be considered ‘equivalent,’ that is, they share the same intellectual or artistic content as realized through the same mode of expression." [tillett p. 4]

The FRBR group 1 "continuum of entities" runs into problems when faced with the reality of publishing and packaging. While the line from work to expression to manifestation may follow some ideal logic, it may have been more functional to separate the description of the package from its intellectual contents. Instead, manifestation, as described in FRBR, is still based on the traditional catalog entry that mixes content and carrier by including creators, titles, and edition information, which better fit the definition of expression than manifestation.

Most explanations of work, expression, manifestation, item (WEMI) move from work to expression, then to manifestation, in that order, and most give only a slight nod to item. But in terms of cataloger workflow, WEM is a single unit that is encountered with the item in hand. While you may be able to store information about a work or expression separately in a database, you cannot separate the work from the expression or the manifestation in real life.

Agency

FRBR provides a static view of the bibliographic resource with little agency. The entities simply exist; they are not described as created as the result of an action. In fact, the entities seem to be actors themselves, as when the expression "realizes" the work. It would make more sense to say that the expression entity is the realization of the work, and that some sentient being acts to create the expression. Instead, in FRBR some unnamed magic occurs between the work and the expression. The same is true of the manifestation, which should be the result of some action that produces a physical manifestation of the expression of the work.

This static view is compatible with library cataloging, which is mainly interested in describing the item in hand as a single unit. The development of a model that emphasizes relationships between creative outputs begs for a more actor-centered view of the bibliographic universe. One could argue that it is precisely the intervention of specific actors that creates a differentiation between entities. The same music piece performed by different musicians, or the same musicians at a different time, must be a different expression. The "studio cut" and the "director's cut" of a film are either different expressions or different works (depending on your definition of work), based on the agent in control. Adding agent intervention to the model could be useful in developing clearer rules for the determination of separate entities during the cataloging process.

Conclusion

While FRBR groups 2 and 3 are composed of real world things (in the semantic web sense), group 1 appears to be an analysis of the current data of bibliographic records. The division of attributes into the four "boxes" of WEMI does not introduce new data elements but partitions the existing bibliographic record among the entities. The resulting group 1 picture resembles ISBD rather than AACR/MARC in that it is a static view of a bibliographic "done deal" with no indication of agents or subjects. Others have noticed that there are neither creator nor subject attributes among the listed attributes for the work -- instead, these are included in the model as relationships defined between groups 2 and 3 and group 1. This is a logical outcome of the use of the database design methodology where data is stored for subsequent use but is not part of a data creator or data user workflow.

In the past bibliographic description has been unitarian, with one record representing one, indivisible bibliographic thing. FRBR posits a quatritarian view of the same data. The difficulty, however, is that the FRBR group 1 is not like the division of an automobile into chassis, motor and wheels; instead, where one draws the line between the separate aspects of the FRBR quaternity -- or whether one prefers a unitarian, duotarian or even a quintitarian approach -- is not based on empirical data, but on one's particular point of view. That point of view is not arbitrary, but has many factors based on material type, organization type, and the needs of the users. FRBR's four-part bibliographic description is one possibility. It may represent a particular bibliographic view, but one cannot expect that it represents all bibliographic views, either in libraries or beyond them.

[FRBR] IFLA Study Group on the Functional Requirements for Bibliographic Records. (2009). Functional Requirements for Bibliographic Records. Retrieved from http://archive.ifla.org/VII/s13/frbr/frbr_2008.pdf

[taniguchi] Taniguchi, S. (2002). A conceptual model giving primacy to expression-level bibliographic entity in cataloging. Journal of Documentation, 58(4), 363-382. Retrieved from http://dx.doi.org/10.1108/00220410210431109

[tillett] Tillett, B. (2003). What is FRBR? A conceptual model for the bibliographic universe (p. 8). Washington, DC. Retrieved from http://www.loc.gov/cds/downloads/FRBR.PDF