Friday, August 12, 2011

Models of bibliographic data

There are two main models of bibliographic data that most of us are familiar with today. One is ISBD, which models bibliographic description. ISBD is a flat list of data areas:
  1. Title and statement of responsibility
  2. Edition
  3. Material type
  4. Publication, distribution, etc.
  5. Physical description
  6. Series
  7. Notes
  8. Identifier
The MARC21 record implements ISBD description in part, because AACR2, on which it is based, is compatible with ISBD. AACR2 also includes additional data such as headings (also known as "access points"). While I haven't seen a diagrammatic visualization of MARC, I believe it would be flat, much like ISBD.

The other primary model is FRBR. There aren't yet many examples of FRBR-based data, although there are partial examples such as the Work views in WorldCat and the Work and Personal author views in Open Library. The most fully FRBR-ized data appears to be in the VTLS Virtual database and their RDA sandbox, but I admit I haven't spent much time looking at this as it is a "pay fer" offering.

The FRBR model isn't flat, but can be drawn as three groups of inter-related entities. The actual FRBR diagrams are too complex to fit in this blog post, but here's a simplified one that I have used in slide sets:

There is a certain amount of movement in FRBR compared to the flat models of ISBD and MARC. In particular, FRBR offers the possibility of creating paths through data by following the relationships of a single entity through the descriptions of different resources. It also allows something like a Person entity to be treated as a resource on its own and therefore to be the focus of attention for some data view.
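
To make the idea of "paths through data" concrete, here is a minimal sketch, assuming a much-reduced version of the FRBR entities: it keeps only Person, Work, and Manifestation and skips Expression and Item entirely. The class names, the `editions_by` helper, and the sample data are all hypothetical illustrations, not part of FRBR itself or of any library system.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Person:
    """A person treated as a resource in its own right."""
    name: str
    works: list[Work] = field(default_factory=list)


@dataclass(eq=False)
class Work:
    """A work, linked back to its creator on construction."""
    title: str
    creator: Person
    manifestations: list[Manifestation] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.creator.works.append(self)


@dataclass(eq=False)
class Manifestation:
    """One published embodiment of a work (placeholder fields only)."""
    work: Work
    publisher: str
    year: int

    def __post_init__(self) -> None:
        self.work.manifestations.append(self)


def editions_by(person: Person) -> list[tuple[str, str, int]]:
    """Follow Person -> Work -> Manifestation and list what is found."""
    return [
        (work.title, m.publisher, m.year)
        for work in person.works
        for m in work.manifestations
    ]


# Made-up sample data, for illustration only.
author = Person("Example Author")
work_a = Work("Work A", author)
work_b = Work("Work B", author)
Manifestation(work_a, "Publisher X", 1990)
Manifestation(work_a, "Publisher Y", 2005)
Manifestation(work_b, "Publisher X", 1998)
```

Starting from the `author` object, `editions_by(author)` walks the relationships and gathers editions that a flat record structure would keep in three unrelated records. The same objects could just as easily be traversed starting from a Work or a publisher, which is the "movement" that the flat models lack.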

The British Library recently announced free and open versions of their British National Bibliography, with records available in a linked data format. Their analysis of the BL data, done in collaboration with Talis, a UK library systems company that is very active in the linked data space, resulted in a data model (PDF) that is unlike any we have seen before. What I give below isn't readable in its details, but I wanted to highlight the key sections or groupings that are revealed in the analysis.


There are a number of interesting aspects to this. To begin with, just by virtue of the diagramming of entities (which each get represented by an oval) you can see how much of the record is represented by named and identified entities rather than plain text. The plain text fields are on the bottom right of the diagram in the lavender boxes. Presented this way, they seem to have less importance than they do in traditional views. In sheer diagram real estate, subjects come out as the largest group, and authors appear to be more substantial than they seem in MARC models where they are reduced to short strings.

I also find it very interesting that publication is represented as an event. This makes sense to me. In FRBR, publication isn't an action but a static description of when and where and who, and the various publications are treated as separate events unrelated to a history of how the Work resurfaces over time for new generations. I like the view that a work comes to us through a series of events, not separate and unrelated manifestations.
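
As a rough illustration of the difference, here is a minimal sketch, assuming that each publication is recorded as an event with a who, where, and when attached to a work. The `PublicationEvent` class, its field names, and the sample data are hypothetical and are not taken from the BL model's actual vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationEvent:
    """One publication of a work: who published it, where, and when."""
    work_title: str
    publisher: str
    place: str
    year: int


def publication_history(events, work_title):
    """Return the events for one work in chronological order."""
    matching = (e for e in events if e.work_title == work_title)
    return sorted(matching, key=lambda e: e.year)


# Made-up sample data, deliberately out of order.
events = [
    PublicationEvent("Work A", "Publisher Y", "London", 2010),
    PublicationEvent("Work A", "Publisher X", "New York", 1950),
    PublicationEvent("Work B", "Publisher X", "New York", 1960),
    PublicationEvent("Work A", "Publisher Z", "Toronto", 1975),
]

history = publication_history(events, "Work A")
```

Because the events share a time dimension, ordering them gives a history of how the work has resurfaced, rather than a set of separate descriptions that happen to carry dates.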

I would like to suggest that we explore a variety of models for our data. I don't think we have to adopt one single model, but we should design our data such that it can be used in different views depending on the service being provided. I also think that we should explore these models before we put all of our eggs in the FRBR basket. We might learn something vital that should be taken into consideration for our future bibliographic data.

3 comments:

  1. Tim Hodson's blog post, British Library Data Model: Overview, on his work with the BL team, provides an insight into the thinking behind the development of the model. He promises to turn this post into a short series to "tease out some of the more interesting aspects and give an overview of the discussions that led to the current model" - I will hold him to that when he returns from holiday ;-)

    I wholeheartedly agree with your suggestion that we explore a variety of models for [our] data. The BL's Neil Wilson, in his presentation from Linked Data and Libraries 2011 where the BNB data release was announced (there is a recorded video stream if you are interested), made it clear that this was a contribution to the conversation around the way to describe bibliographic resources in the future.

    The benefit of working with a Linked Data model is that it is flexible and can accommodate additions and differences, thus avoiding the "one model to rule them all" approach that has never really worked well for us in the past. So, as you would obviously expect from me, I encourage that conversation to take place using Linked Data techniques as a way to accommodate the inevitable overlapping differences in approach and need.

    One final plea - remember we are modelling data to describe the things that libraries provide access to, not the records that have previously held that data. Those records are a valuable source of information, but should not constrain access to and understanding of the information people need.

    Richard Wallis

  2. Thanks, Richard. One of the primary issues I have with FRBR is that it quite deliberately constrains data to a single model. I think this will even create problems for RDA, and have already seen one place where RDA paints itself into a corner with its strict adherence to FRBR.

    My operating concept right now is "VIEWS" -- that we should be able to support a variety of views of our data, depending on what we're doing with it at a given time. I think I'll call this the "lava lamp" approach to data -- kind of like linked data but with a fluid nature. And it also sounds more fun.

  3. Thanks for this; I haven't been following the BL linked data too closely, so I would have missed this if not for your post. What an interesting looking data model! I agree that modelling publication as an event makes a lot of sense.

