Wednesday, November 07, 2007

Hierarchy v. Relationships

The use of hierarchy as an organizing principle keeps coming up. I think we are attracted to hierarchy because of its neatness, even though in fact the real world is organized more like fuzzy sets. Fuzzy sets are hard to comprehend, nearly impossible to draw, and can't be slotted neatly into an application.

When people talk about FRBR, they are often focussed on the Group 1 entities, and those are seen as hierarchical. They tend to be shown as:
- Work
-- Expression
--- Manifestation
---- Item

as if we'll fit all of our intellectual works into such a neat hierarchy. T'ain't so. Of all of the relationships that are talked about in FRBR (I almost said "expressed" but that term has now been given a new meaning in this discussion) I think these are the least interesting. And they become even less interesting when we move beyond the traditional inventory control function of the library catalog and begin to see ourselves as navigating in a knowledge universe. But first let me tackle the Group 1 entities.

There are complaints (or remarks, depending on the context) that we don't have an agreed-on definition of Work, and that the division between Work and Expression is unclear. They are unclear because in real life there isn't a neat hierarchy that just needs to be modeled. What is a Work is entirely contextual -- when I'm looking for an article, the article is a Work. When I'm subscribing to a journal, the journal is a work. When I'm on iTunes a song is a work, when I'm in the music store the album is a work. A Work is the content I am seeking at that time. In the imaginary universe where I get to create my bibliographic system, a Work will be defined as: anything you wish to talk about, point to, address. So a book-length text is a work, an article in a journal is a work, a journal is a work, a book chapter is a work -- all at the same time and in the same system. For one person the book Wizard of Oz and the movie Wizard of Oz will be a single Work. To a film buff, the director's cut of Blade Runner and the original release are distinct works. To be a Work, it just has to be definable and have a way to name it; that is, it has to have an identifier. But anything can be a Work. As a matter of fact, I probably won't use the term Work at all in my universe.

As for Expressions, there will be very obvious Expressions of Works, and there will be fuzzier Expressions. There will be Expressions that express more than one Work. Expression is a relationship, not a subset. If you don't have to organize your bibliographic universe in a hierarchical way, then the need to slot each Expression under a Work goes away, although the relationship can remain.

I'm less sure about Manifestation and Item, even though these are the most concrete of the Group 1 entities. Are they a legitimate focus of a Knowledge Management system, or are they about managing physical objects? When I think about some of the uses of bibliographic data, for instance as citations in a text publication, Manifestation seems to be mainly about locating -- so if I've quoted a passage from a book, I need to cite the manifestation and the page because that's the only way that someone else can find that exact quote. When I include a URL in a document that links to a particular digital manifestation, I am giving the user a direct link to the location. Manifestations and Items will be of interest in some instances, say to rare book collectors, but I'm not at all sure that those instances justify the emphasis they have been given. And if the purpose is primarily inventory control, then I think those relationships will be managed to the extent that they matter to the library. For example, a public library may not terribly care which manifestation of the book Moby Dick is on its shelves, although its inventory system will need to know the barcode, and its acquisitions system will need to store how much the library paid for it and the provider.

The truly interesting relationships in FRBR are those between and among these entities, and those are ones that I have not seen explored. These are the relationships between things: thing1 is a translation of thing2; thing3 is an abridgment of thing4; thing5 extends thing6 in this certain way; thing7 cites thing1; thing8 continues thing3. This is where we get real value, where we provide various interesting paths through which seekers can navigate. This is what we don't provide explicitly in our catalogs today, although a human user may be able to intuit some of these relationships among the works we present.
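As a rough sketch of what recording such relationships explicitly might look like, here is a minimal Python example. The thing1 ... thing8 identifiers come from the examples above; the `Relationship` tuple, the relationship names, and the `related()` helper are hypothetical, invented purely for illustration, and are not part of FRBR or of any existing catalog system.

```python
from collections import namedtuple

# Hypothetical structure: a typed, directed relationship between two
# identified things. Nothing here is hierarchical.
Relationship = namedtuple("Relationship", ["subject", "kind", "target"])

# The example relationships from the paragraph above.
RELATIONSHIPS = [
    Relationship("thing1", "translation_of", "thing2"),
    Relationship("thing3", "abridgment_of", "thing4"),
    Relationship("thing5", "extends", "thing6"),
    Relationship("thing7", "cites", "thing1"),
    Relationship("thing8", "continues", "thing3"),
]


def related(thing, relationships=RELATIONSHIPS):
    """Return (kind, other_thing, direction) for every relationship touching `thing`."""
    paths = []
    for rel in relationships:
        if rel.subject == thing:
            paths.append((rel.kind, rel.target, "outgoing"))
        elif rel.target == thing:
            paths.append((rel.kind, rel.subject, "incoming"))
    return paths


if __name__ == "__main__":
    # List the paths a seeker could follow from thing1.
    for kind, other, direction in related("thing1"):
        print(direction, kind, other)
```

Starting from thing1, a seeker would be offered two paths: out to the thing it translates, and back to the thing that cites it. Neither path depends on slotting thing1 under a Work or an Expression first.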

We have so narrowly defined bibliographic control in libraries that it doesn't really include the relationships between intellectual products, except to the degree that we might make a note that one thing is a translation of another thing. But we see those relationships as "extra" or "secondary," and yet they are the very essence of knowledge creation. It astonishes me that we have focused so completely on the physical items that we have essentially missed what would make our catalogs intelligent.

12 comments:

Alain Pierrot said...

This is where a wider adoption of semantic web techniques could really open new perspectives.

Ontology description tools such as Protege are good at expressing relationships between "things" in a non-hierarchical structure.

Some ideas about implementing relationships between works, expressions, manifestations and "subjects" could be derived from, for instance, the RDF Schema and Protege ontology representations of the Subject Reference System.

Karen Coyle said...

Alain, thank you for the links. I will spend some time with Protege and see what it can do. It looks great at a first glance.

I agree that we should at least explore the semantic web possibilities. The FRBR ER model is a step toward that, and there are RDF and OO models that have tried to define FRBR more rigorously, but I don't think anyone has put those into practice.

I think one of the problems, though, is that we need to find a way to translate the language of libraries into the language of the semantic web, and vice versa. I'm afraid that if you say "ontologies" to most librarians it doesn't have meaning for them, just as "authority control" means nothing in the semantic web world. In some cases, we are doing similar things but calling them something different. What we've got here is a classic failure to communicate.

Alain Pierrot said...

The mere term "ontology" is unfortunate: if it is taken in its traditional meaning in the mouth of computer scientists, it only shows a poor philosophical insight...

From my experience, using a tool such as Protege (without mentioning "ontologies" when it can be avoided) helps a lot to show how "things" can be described in different ways, and enter into multiple relationships which themselves can be described and used in different ways to retrieve or identify.

I used it once to describe an organisation, and link functional descriptions such as "head of department name" with the names of the actual persons, the places where people were located, tasks or projects in which they were involved (permanently or not) etc.

The "ontology" was not completed, but (I hope...) the exercise of modelling part of the organisation with Protege helped raise awareness that all views of, and paths to, "things" were relevant and had their utility.

Alain Pierrot said...

Jean-Marc Destabeaux (thanks to him) points me to a more relevant ontology: CIDOC/CRM, whose scope is defined as «all information required for the scientific documentation of cultural heritage collections».

Patrick Le Boeuf, from the French national library (BnF), is managing the project to harmonize FRBR and CIDOC/CRM.

Karen Coyle said...

Yes, it was the CIDOC folks who did FRBRoo, as I recall. I see CIDOC as an obvious place where we would want to share some data structures (eg creators), but not all. CIDOC is focused on describing *things* where libraries focus more on describing intellectual output, not the physical manifestation. But wherever there are data elements or relationships in common, we should be trying to harmonize those.

I'm not sure I'm in agreement where CIDOC extends the FRBR work with complex and serial works. I haven't decided if those are entities in themselves, or if they are the result of relationships. Basically, if you don't need to have a hierarchy that has "work" at the top (with only one definition), then you probably don't need to define complex and serial works.

I need to create some examples for myself to make my own ideas more concrete. I need to test them out with diagrams to see if they work.

Con said...

Hi Karen,

I very much agree with your point about other relationship types.

You may be interested to have a look at the New Zealand Electronic Text Centre, which is a digital library website of TEI-encoded texts, with a Topic Map as our metadata system. We don't have a full FRBR implementation, but only something built on the CIDOC CRM. We have an entity that corresponds more or less to a FRBR Work, and which may be related to an entity that corresponds to a Manifestation (which is what we have encoded as full text). We don't have either "Expressions" or "Items".

For example, we have a record which represents the Work "The Treaty of Waitangi" (which is an important constitutional document in NZ). This page links to 4 distinct encodings of Manifestations of this work.

But more interestingly we also model other texts which "mention" the Treaty of Waitangi (the work). We also have "aboutness" relations between records. For instance, we have a work (a poem) called "He Waiata o Hemi" (A Song of James). This work has a manifestation, which you can read, and is also the subject of another work (a critical work), and which, in fact, quotes the poem in full.

These kinds of relationships are fascinating and serendipitous, but there's a lot of authority work required in encoding them.

Samuel said...

Some literary concepts, e.g. 'intertextuality', are too fuzzy to be constrained by a set of predefined links and relationships, as Literature (language+expression) is itself non-referential...

'Manifestations' can also be strongly attached to oral culture and folklore (in the ways people recall and retell a history, with many variations in place, narrator, language), and many symbolic, non-textual artifacts may now be stored, disseminated or linked under many 'technical possibilities', but the consistency issues so valued in computing or OPACs seem far from 'end user needs' in general... at least in the first moments...

Some people may see cataloguing from its 'limiting' side, but human institutions in general offer some limits so as to provide sense and security for human activities. Are cataloguing & classifying 'human institutions', or at least closely attached to them?

Prof. Birger Hjorland pays attention to the place of a 'bibliographic paradigm' among our conceptions of library work:

http://informationr.net/ir/12-4/colis/colis06.html

Is 'bibliographic work' a necessary but still 'irreplaceable' concept for what we do?
Perhaps this is one of the crossroads for considering the AACR2 vs. RDA debate at this historical/institutional moment.

Best Wishes!

Stephen said...

The problem with FRBR is not just whether something is or is not a work; it's also whether two things are the same work or different works. Having decided that X is a work, one must also decide whether X is THIS work, or some other. In other words, the practical application of FRBR's hierarchy is all about relationships.

While the real world is full of soft boundaries, the logical world that we construct with cataloging expects clearly defined entities. The marking of distinctions is less about mimesis and more about building an efficient tool for navigating and finding. One advantage of hierarchy is that it lets us treat things as both the same (on one level) and different (on another) without being logically inconsistent.

I'd rather see the FRBR hierarchy turned on its head. Start with the item as the primary object of description, and treat all the other levels as possible categories of relationship. Recognize that some items are not manifestations of anything; some manifestations are not expressions of anything; and some expressions are not expressions of works. The use of the term "work" makes this reading a hard sell, since most creators claim "works" are what they make. But the FRBR hierarchy would be more useful as primarily a framework for defining relationships among objects than as an analysis of the ontological levels latent in each object.

Jonathan said...

Karen,

While I agree with many of the ideas in your post (particularly the importance of relationships in navigating between resources), the idea that we can do away with hierarchies entirely betrays a lack of information systems background. Without a hierarchical system of data storage, information is duplicated (i.e. stored in denormalized form), which not only taxes system resources but more importantly becomes a big headache to keep synchronized across all its locations. In addition, you see an explosion in the number of relationships that must be recorded in the system. If Work A is a sequel to Work B then that relationship is specified once at the Work level, and all Expressions/Manifestations/Items of that work automatically "inherit" it. In a non-hierarchical system that relationship must be specified for every resource that has to do with A, and so instead of 1 relationship in your system you are dealing with |A|*|B|, with all the attendant resource stress and synchronization problems.
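A small sketch of that count in Python, to make the arithmetic concrete. The resource lists and the two helper functions are made up for illustration (they are assumptions, not any real system's data model); the only thing taken from the comment above is the comparison of 1 relationship against |A|*|B|.

```python
from itertools import product

# Hypothetical resources (Expressions, Manifestations, Items) attached to each Work.
resources_of_a = ["A-expr1", "A-manif1", "A-item1"]
resources_of_b = ["B-expr1", "B-manif1", "B-item1", "B-item2"]


def work_level_links():
    """Hierarchical case: one link between the Works; their resources inherit it."""
    return [("Work A", "sequel_to", "Work B")]


def resource_level_links(a_resources, b_resources):
    """Flat case: the same link repeated for every pair of resources."""
    return [(a, "sequel_to", b) for a, b in product(a_resources, b_resources)]


if __name__ == "__main__":
    print(len(work_level_links()))
    print(len(resource_level_links(resources_of_a, resources_of_b)))
```

With 3 resources under A and 4 under B, the flat case stores 3 * 4 = 12 links where the hierarchical case stores 1.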

Karen Coyle said...

Jonathan, I'm not sure how you got from my post to db design (in which I have about 25 years of experience, btw). How things will be stored at a db level is entirely another matter. My post is about the semantic level. However, since you bring up db design, I have to say that in my experience text data doesn't lend itself greatly to the efficiencies of traditional relational or hierarchical design, and in systems I have worked on very little use is made of the dbms functions that are so commonly used in, eg, business applications.

I am assuming (and I think I am correct) that as we move more into xml and oo-based data stores, that the underlying dbms's will optimize for storage and efficient retrieval -- and few of us will have to be concerned about that machine level. In the work I'm currently doing with a SOLR-based system, there is little concern about redundancy in the data. Of course these are days in which systems programmers can say things like "we'll just keep the entire index in memory."

I also am not suggesting that we do away with hierarchy entirely. Hierarchy, as a way of organizing segments of knowledge, is not only extremely useful, I think it is as close to being a natural human function as you can get. I took a psych class from Eleanor Rosch at Berkeley, mentioned in Weinberger's book for her work on human thought and hierarchy. If I had continued with my PhD, I wanted her on my thesis committee. (I didn't continue, obviously.) My opinion on hierarchy (which is rather Rosch-ean) is that it is a primitive way of organizing knowledge. Relationships are richer albeit more ambiguous, and allow for more sophisticated knowledge to develop.

Jonathan said...

Karen,

Thank you for taking the time to reply to my comment. And I apologize if I in any way came off as antagonistic. Again, I agree with almost all your larger points, particularly the importance of relationships at the intellectual content level (along with defining a standard controlled vocabulary for them). And indeed the similarity of our respective examples in discussing FRBR is uncanny.

In your original post you raised many valid issues that are deficiencies not so much because FRBR is overly hierarchical as because it is insufficiently hierarchical: namely, it does not adequately discuss aggregation or containment at each entity level. Examples of containment at the manifestation level are somewhat uninteresting. But consider a monograph (Work) whose 5th chapter is excerpted in a serial. The excerpt is probably the same Work as the monograph, or at worst a sub-Work. But what about an essay which is expanded into a full-length book? Conceptually it could be argued that the book is a sub-work as well, even though it is technically a "superset" of the essay in terms of pure textual content. This is just one example confirming your point (and actually acknowledged in the FRBR final report) that the boundaries between entities are somewhat culturally contingent.

Still, despite their inability to perfectly model such cases, hierarchies are still indispensable, and not only because of the limitations of current database systems. When we talk about relationships between resources we are ultimately talking about graph theory, and if resources (i.e. graph nodes) can have almost any sort of relationship (graph edge) to any other node, then you will have combinatorial explosion, and computer scientists have long known that these sorts of models quickly become intractable to ANY conceivable computer once they exceed trivial sizes. Or, to give a less theoretical example, a user looking for a resource is really only interested in the relationships for a particular level at any one time. When I am looking for content I only care about Work-level relationships (sequel, rejoinder, parody). When I find that content then I care about the Expression-level relationships (contains, abridges, translates). A hierarchical system thus helps protect the user (in addition to the system) from information overload.

Kevin said...

Karen wrote, "What is a Work is entirely contextual -- when I'm looking for an article, the article is a Work. When I'm subscribing to a journal, the journal is a work. When I'm on iTunes a song is a work, when I'm in the music store the album is a work. A Work is the content I am seeking at that time."

I agree that the Work is contextual, but it's important to keep in mind that the user may be after not a particular Work but a particular Expression, Manifestation, or even Item. In broad terms, the more specialized a user's information need is, the more likely he or she will be looking for a Group 1 Entity that's lower in the hierarchy (closer to Item). These other Group 1 Entities are likewise contextual: for example, an Item may be a particular copy of all of the volumes of the Encyclopédie or a particular papyrus fragment.