Sunday, January 17, 2016

Sub-types in FRBR

One of the issues that plagues FRBR is the rigidity of the definitions of work, expression, and manifestation, and the "one size fits all" nature of these categories. We've seen comments (see from p. 22) from folks in the non-book community that the definitions of these entities are overly "bookish" and that some non-book materials may need a different definition of some of them. One solution to this problem would be to move from the entity-relation model, which does tend to be strict and inflexible, to an object-oriented model. In an object-oriented (OO) model one creates general types with more specific subtypes, which allows the model both to extend as needed and to accommodate specifics that apply to only some members of the overall type or class. Subtypes inherit the characteristics of the super-type, whereas there is no possibility of inheritance in the E-R model. By allowing inheritance, you avoid both redundancy in your data and the rigidity of E-R and the relational model that it supports.

This may sound radical, but the fact is that FRBR does define some subtypes. They don't appear in the three high-level diagrams, so it isn't surprising that many people aren't aware of them. They are present, however, in the attributes. Here is the list of attributes for FRBR work:
title of the work
form of work
date of the work
other distinguishing characteristic
intended termination
intended audience
context for the work
medium of performance (musical work)
numeric designation (musical work)
key (musical work)
coordinates (cartographic work)
equinox (cartographic work)
The attributes with a parenthetical qualifier are those that belong to subtypes of work. There are two: musical work, and cartographic work. I would also suggest that "intended termination" could be considered to belong to a subtype for "continuing resource", but this is subtle and possibly debatable.

Other subtypes in FRBR are:
Expression: serial, musical notation, recorded sound, cartographic object, remote sensing image, graphic or projected image
Manifestation: printed book, hand-printed book, serial, sound recording, image, microform, visual projection, electronic resource, remote access electronic resource
These are the subtypes that are present in FRBR today, but because sub-typing probably was not fully explored, there are likely to be others.

Object-oriented design was a response to the need to be able to extend a data model without breaking what is there. Adding a subtype should not interfere with the top-level type nor with other subtypes. It's a tricky act of design, but when executed well it allows you to satisfy the special needs that arise in the community while maintaining compatibility of the data.

Since we seem to respond well to pictures, let me provide this idea in pictures, keeping in mind that these are simple examples just to get the idea across.


The above picture models what is in FRBR today, although using the inheritance capability of OO rather than the E-R model where inheritance is not possible. Both musical work and cartographic work have all of the attributes of work, plus their own special attributes.
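For readers who prefer code to diagrams, the same idea can be sketched as Python classes. This is only an illustration of inheritance under my own assumptions: the attribute names are transliterated from the FRBR work attribute list above, while the class names, the string types, and the attribute_names helper are invented for the sketch and are not part of FRBR.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Work:
    """Super-type: the attributes FRBR lists for every work."""
    title: str
    form: Optional[str] = None
    date: Optional[str] = None
    other_distinguishing_characteristic: Optional[str] = None
    intended_termination: Optional[str] = None
    intended_audience: Optional[str] = None
    context: Optional[str] = None


@dataclass
class MusicalWork(Work):
    """Subtype: inherits everything from Work and adds the music-only attributes."""
    medium_of_performance: Optional[str] = None
    numeric_designation: Optional[str] = None
    key: Optional[str] = None


@dataclass
class CartographicWork(Work):
    """Subtype: inherits everything from Work and adds the map-only attributes."""
    coordinates: Optional[str] = None
    equinox: Optional[str] = None


def attribute_names(cls):
    """Hypothetical helper: list every attribute a class has, inherited ones included."""
    return [f.name for f in fields(cls)]


# A musical work carries the general attributes plus its own.
sonata = MusicalWork(title="Example sonata", key="C-sharp minor")
print(attribute_names(MusicalWork))
```

The subtype declarations never repeat the general attributes; they come along through inheritance, and the two subtypes know nothing about each other.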

If it becomes necessary to add other attributes that are specific to a single type, then another sub-type is added. This new subtype does not interfere with any code that is making use of the elements of the super-type "work". It also does not alter what the music and maps librarians must be concerned with, since they are in their own "boxes." As an example, the audio-visual community did an analysis of BIBFRAME and concluded, among other things, that the placement of duration, sound content and color content in the BIBFRAME Instance entity would not serve their needs; instead, they need those elements at the work level.*
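A minimal sketch of that kind of extension, again in Python and again under assumptions of my own: the class name MovingImageWork, the field names, and the citation function are hypothetical, and the three added attributes are the simplified version of the A-V community's request described above, not their actual proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Super-type, cut down to two attributes to keep the sketch short."""
    title: str
    date: Optional[str] = None


def citation(work: Work) -> str:
    """Existing code that relies only on super-type attributes."""
    return f"{work.title} ({work.date})" if work.date else work.title


@dataclass
class MusicalWork(Work):
    """An existing subtype; untouched by the addition below."""
    key: Optional[str] = None


# Added later: a new subtype carrying the work-level A-V attributes.
# Nothing above this line has to change.
@dataclass
class MovingImageWork(Work):
    duration_minutes: Optional[int] = None
    sound_content: Optional[str] = None
    color_content: Optional[str] = None


film = MovingImageWork(title="Example film", date="1950",
                       duration_minutes=90, color_content="black and white")
print(citation(film))  # the old function accepts the new subtype as-is
```

The point of the sketch is what does not happen: citation is not edited, and MusicalWork does not acquire a duration.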

This just shows work, and I don't know how/if it could or should be applied to the entire WEMI thread. It's possible that an analysis of this nature would lead to a different view of the bibliographic entities. However, using types and sub-types, or classes and sub-classes (which would be the common solution in RDF), would be far superior to the E-R model of FRBR. If you've read my writings on FRBR you may know that I consider FRBR to be locked into an out-of-date technology, one that was already on the wane by 1990. Object-oriented modeling, which long ago replaced E-R modeling, is now being eclipsed by RDF, but there would be no harm in making the step to OO, at least in our thinking, so that we can break out of what I think is a model so rigid that it is doomed to fail.


*This is an over-simplification of what the A-V community suggested, modified for my purposes here. However, what they do suggest would be served by a more flexible inheritance model than the model currently used in BIBFRAME.

5 comments:

Karen Coyle said...

Comments came in on Twitter (well, short comments), so I'd like to go back to them here where there is more room.

Eric Hellman called out the use of "eclipsed" where I say that OO is being eclipsed by RDF. I admit that's an exaggeration and possibly a misnomer, but I'm having difficulty finding another word. OO programming is alive and well, but in the library world as we search for a new data model, everyone has turned to linked data and RDF. I think there are good reasons for this, since in our case making connections is key to helping users. What I do find interesting is that I haven't found any proposals for an OO bibliographic data model, not even as examples in data modeling books (where bibliographic data is sometimes used as an example). (FRBRoo is another discussion, and I need to spend more time with it before commenting.)

Lukas Koster pointed to a site describing ERD and ERM (entity-relation modeling) that includes subtypes and inheritance, indicating that one can do that within the E-R realm. Thinking about this I now feel strongly that we should not go with a tweak to E-R, for this reason: the thinking about FRBR is both so ingrained and so unclear that it would be better to break that bond by moving into a very different model. In the same way, folks steeped in XML who approach RDF as RDFXML tend to miss some of the key differences between RDF and XML. Learning a new model is like learning a new language -- it expands more than your vocabulary, it gives you some different patterns to think with.

Here's a question, though, for everyone: Library data has been treated as static -- something you could print on a card and put in a file for dozens or more years. If you were to create an OO model for bibliographic data, what methods would your objects have? Yes, I know, we'd have to model the objects and methods at the same time, but I'm having trouble thinking of methods. If we have an object like work, other than being created, updated, and deleted, what can it do? I'm drawing a blank.

Eric said...

For RDF's original application domain, "web resource description", if there's anything eclipsing anything else, I'd say Schema.org is eclipsing RDF Schema. Both Schema.org and RDFS feel like OO, with inheritance and subtypes. But "eclipsing" is probably the wrong word here too, as Schema.org can be expressed as RDFS. But I think that the RDFS project of building OOish typing using the RDF machinery is not progressing. (I say "OOish" because I'm agnostic about flavors of type/class systems; though I'm happy with the way Python does it.)

This bolsters the central point of your post, that bibliographic models need viable typing and subclassing to be generally useful.

To answer your question about methods, we're not modeling data so much as we're modeling and representing real-world things. In software running Unglue.it, the Work object has 54 methods. Most of these are really properties - for example a creator_list which is a string generated from attached creator objects. But most of the methods are application specific - they apply to an Unglue.it Work rather than a Universal Work. But these objects live on top of an entity-relational model. Over a few years, we've seen our model replace entities such as ISBN or author with typed objects such as Identifier or Creator as we adapt to the complexities of the real world.
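To make the "methods that are really properties" idea concrete, here is a rough Python sketch. It is not the Unglue.it code: only the name creator_list comes from the description above, and the Creator class, its role field, and the comma-joined output format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Creator:
    """A typed object standing in for what used to be a bare 'author' entity."""
    name: str
    role: str = "author"


@dataclass
class Work:
    title: str
    creators: List[Creator] = field(default_factory=list)

    @property
    def creator_list(self) -> str:
        """A derived value: a display string built from the attached creators."""
        return ", ".join(c.name for c in self.creators)


opera = Work("Don Giovanni", [
    Creator("Wolfgang Amadeus Mozart", role="composer"),
    Creator("Lorenzo Da Ponte", role="librettist"),
])
print(opera.creator_list)
```

Nothing is stored under creator_list; it is computed each time from the related objects, which is why it behaves more like a property than an action.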

Dorothea said...

Regarding possible class/instance methods: It seems to me at least some of the Work/Expression dichotomy weirdness could be set aside with an event-based model of how things happen.

So for example, Tirso de Molina creates the Work (in this case, a play, which is probably a Work subclass) El burlador de Sevilla, which gets translated umpteen bazillion times -- each of those an instance method, work.translate() -- adapted into any number of other Work subclasses e.g. Mozart/DaPonte's opera Don Giovanni via a different method, and so on and so forth.

The downside here (well, one of them) is that multiple methods might have to be run at once -- Don Giovanni is BOTH a translation (Spanish --> Italian) and an adaptation (drama --> opera) of El burlador. That might not be so bad, really, if we envision methods as adding a bit of relationship metadata plus (sub)class-specific metadata to the Work object.
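One way this could look, as a loose Python sketch: every name here (derive, translate, adapt, the relationships list and its labels) is invented for illustration, and subclass-specific metadata is left out to keep it short. The derive method is my guess at how "multiple methods at once" might be handled, by recording both relationships against the same source Work.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Work:
    title: str
    language: str
    form: str
    # (relationship label, title of the source work) pairs added by event methods
    relationships: List[Tuple[str, str]] = field(default_factory=list)

    def derive(self, title: str, language: Optional[str] = None,
               form: Optional[str] = None) -> "Work":
        """Create a new Work from this one, recording each event that applies."""
        recorded = []
        if language is not None and language != self.language:
            recorded.append(("translation of", self.title))
        if form is not None and form != self.form:
            recorded.append(("adaptation of", self.title))
        return Work(title, language or self.language, form or self.form, recorded)

    def translate(self, language: str, title: Optional[str] = None) -> "Work":
        """Event: same form, new language."""
        return self.derive(title or self.title, language=language)

    def adapt(self, form: str, title: Optional[str] = None) -> "Work":
        """Event: same language, new form."""
        return self.derive(title or self.title, form=form)


burlador = Work("El burlador de Sevilla", "Spanish", "play")
english = burlador.translate("English")
don_giovanni = burlador.derive("Don Giovanni", language="Italian", form="opera")
print(don_giovanni.relationships)
```

In this sketch there is no Expression object at all; a translation is simply another Work that remembers the event that produced it.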

Another downside is that the classes and subclasses would proliferate into a Big Ball of Mud awfully quickly. My comparative-literature soul delights, but my (minimal) cataloger soul shudders in dismay.

On the whole, though, getting rid of Expressions altogether and thinking of some Works as other Works plus event methods makes me a lot happier than the current W/E situation.

Lukas Koster said...

A couple of remarks on ER vs. OO and RDF. The site that I mentioned is the Structured Analysis Wiki (http://www.yourdon.com/strucanalysis/wiki/index.php/Chapter_12#Subtype.2FSupertype_Indicators) by Ed Yourdon, one of the pioneers of structured analysis in system design (https://en.wikipedia.org/wiki/Edward_Yourdon). The subtype/supertype/inheritance features of ERM are not just “a tweak to E-R” as you say, but extensions to the original ER method developed by Peter Chen, contributed as early as 1981 by M. Flavin (see for instance http://www.eiminstitute.org/library/eimi-archives/volume-1-issue-12-february-2008-edition/origins-of-data-modeling-the-forgotten-story).

The main (relevant here!) difference in my view between ERM and OO is that ERM only describes real world data, whereas OO describes real world data and procedures acting on the real world things (methods/operations), which RDF does not do either. (By the way, I have never understood why it is called Object Oriented modeling, because the method describes Classes (=Entities), not Objects, which are instances of Classes).

Also, I don’t agree that ERM should be regarded as a technology; it is a method that is technology agnostic (all depending on the definitions used of course). A conceptual ER model, describing a real world domain (from any subjective perspective, so errors implied) can be implemented in any number of storage formats and structures. ERM is not as rigid as you say, and I do not think it is doomed to fail as such.

So, FRBR could have been described in a much better way using the full ERM potential than it was. Having said all this, of course I agree that starting from scratch with for instance an open RDF model is to be preferred.

Karen Coyle said...

Lukas, I agree that using the full ERM could result in a much better data model. In fact, just about any modern modeling language would be better than what we've got now. But shouldn't we find it a bit disconcerting that FRBR contains sub-types of some entities, and yet the developers of FRBR seem unaware of that? It's the contradictions like this that drive me nuts.

I didn't want this to be a discussion of ERvsOO, and that it became so is undoubtedly my fault. I was using OO as an example, but any other modeling language would be fine, and I obviously didn't make that clear. The main point is that there are so many contradictions within FRBR that we really must see it as a flawed model, something that needs to be fixed before we try to do anything with it. Yet, for much of the library world it is being taken as gospel - and not just gospel, but the model for our future data standards.

What also bugs me, as I said in my SWIB talk, is this confusion of tech modeling and conceptual modeling. I'm looking at the latest document from the IFLA FRBR group and again they make the leap from a very high level model of bibliographic concepts directly to technology (linked data), and toss in tech terms that don't belong in a conceptual model ("domains and ranges"). This model, which, as I mention above, I'm being told is a conceptual model, will be presented in a session called "Modeling Bibliographic Information for a Web of Data: Challenges and Achievement - Cataloguing". So it is being presented as a model for linked data? Floor wax? Dessert topping?

And although I don't want this to be a debate of ERvsOO, I will say that one of the things that is bothering me about the library view of data is how static it is. Inert. I would like us to think about ways that we could make our data more dynamic. Some of that will be done with relationships, because every new item that we add to a library catalog should create new relationships. But I also think that we need to add the dimensions of time and space, and to make it possible for users to navigate those axes. OO encourages thinking about actions, and there is a complete lack of actions in FRBR and in the cataloging rules. Maybe being forced to think of what the data will do would be useful. I'm willing to try that and any other ideas that will move us forward.