Sunday, April 24, 2011

Visualizing linked data

Chris Oliver, Diane Hillmann and I will be reprising (and updating) our three-part webinar on RDA and the future of library metadata starting on May 11. As before, Chris will cover the principles behind RDA and why RDA is different from other cataloging codes; I will talk about the Semantic Web and why it is important for libraries to be part of the web of data (May 18); Diane will show how the Open Metadata Registry makes possible a Semantic Web-compatible version of RDA (May 25).
One of the questions I always get when talking about the Semantic Web is "What does it look like?" This is kind of like asking what electricity looks like: it doesn't so much look like anything, as it makes certain things possible. But I fully understand that people need to see something for this all to make sense, so when the webinar technology allows it I have started showing some web pages. When it doesn't, I send people to links they can explore on their own. Since some of you may have this same question, here are a few illustrations using two sites that can present authors in a Semantic Web form.

When you do a search for an author on the Open Library you retrieve a page for the author. This is a page for the author Barbara Cartland. The page has not been hand-coded by a human but is derived "on the fly" from the information in the Open Library database.

That same information is available in a Semantic Web format, RDF in XML. (Note: it is common to code Semantic Web data in XML, but that's not the only possible data format. Nothing inherent in the Semantic Web makes it XML-like; XML is just a convenience.) This is not intended to be human friendly -- it is code to be used by programs. You should notice that it makes use of identifiers that look like URLs:

<foaf:Person about=""></foaf:Person>
The above establishes the primary identifier for all of the information that follows in the XML.
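
As a rough sketch of what "code to be used by programs" means here, the following Python uses only the standard library to pull the subject identifier and one property out of a small RDF/XML fragment. The fragment, the example.org URI and the name value are invented for illustration and are not copied from the Open Library output; the function name `read_person` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and FOAF.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical record: the URI and the name are placeholders, not real data.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/authors/A1">
    <foaf:name>Example Author</foaf:name>
  </foaf:Person>
</rdf:RDF>"""


def read_person(xml_text):
    """Return (subject URI, name) for the first foaf:Person in the fragment."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace-uri}localname".
    person = root.find(f"{{{FOAF}}}Person")
    subject = person.get(f"{{{RDF}}}about")
    name = person.findtext(f"{{{FOAF}}}name")
    return subject, name


subject, name = read_person(SAMPLE)
print(subject)
print(name)
```

The identifier in the `about` attribute is what every other statement in the record hangs from, which is why a program reads it first.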

You will also see that, like other applications using XML, it allows you to mix data elements from different "namespaces." The Open Library RDF uses a mix of elements from Dublin Core, Friend-of-a-Friend (FOAF), the Bibliographic Ontology, and RDA Vocabularies.
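
To make the namespace mixing concrete, here is a small sketch that groups the elements of a record by the vocabulary they come from. The record is a made-up example that combines FOAF and Dublin Core terms; its URI and values are placeholders, and `elements_by_namespace` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical record mixing FOAF and Dublin Core terms; values are placeholders.
SAMPLE_MIXED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <foaf:Person rdf:about="http://example.org/authors/A1">
    <foaf:name>Example Author</foaf:name>
    <dcterms:description>A placeholder description.</dcterms:description>
  </foaf:Person>
</rdf:RDF>"""


def elements_by_namespace(xml_text):
    """Group the local names of all elements by their namespace URI."""
    groups = defaultdict(list)
    for element in ET.fromstring(xml_text).iter():
        # A qualified tag looks like "{namespace-uri}localname".
        namespace, _, local_name = element.tag[1:].partition("}")
        groups[namespace].append(local_name)
    return dict(groups)


for namespace, names in elements_by_namespace(SAMPLE_MIXED).items():
    print(namespace, names)
```

Each namespace URI identifies the vocabulary that defines the element, so a program can tell a FOAF name from a Dublin Core description without any shared schema for the record as a whole.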

Another database that provides its data in RDF is the Virtual International Authority File, VIAF. VIAF combines the name authority data from about twenty national authority files, making it possible to translate from different name display forms when exchanging data. Here is part of the VIAF display for Barbara Cartland:

You can retrieve or export the metadata for this author in various formats including MARC and RDF/XML. Once again you will see that the RDF form of the data makes use of FOAF, a standard called "Simple Knowledge Organization System" or SKOS, and also RDA vocabularies for the FRBR Group 2 entities from the Open Metadata Registry.
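
The idea of translating between name display forms can be sketched with a handful of triples. This is one possible modeling under stated assumptions, not a description of how VIAF actually structures its data: each authority file contributes a concept with a preferred label, and the concepts are tied together by pointing at the same person. The `ex:` identifiers, the name forms and the `translate_name` function are all invented for illustration; the SKOS and FOAF property names are real terms from those vocabularies.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
FOAF = "http://xmlns.com/foaf/0.1/"

PREF_LABEL = SKOS + "prefLabel"
IN_SCHEME = SKOS + "inScheme"
FOCUS = FOAF + "focus"

# Hypothetical data: identifiers and name forms are placeholders.
TRIPLES = [
    ("ex:conceptA", PREF_LABEL, "Author, Example, 1900-1980"),
    ("ex:conceptA", IN_SCHEME, "ex:fileA"),
    ("ex:conceptA", FOCUS, "ex:person1"),
    ("ex:conceptB", PREF_LABEL, "Example Author"),
    ("ex:conceptB", IN_SCHEME, "ex:fileB"),
    ("ex:conceptB", FOCUS, "ex:person1"),
]


def subjects(triples, predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    """All objects for the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def translate_name(triples, name, target_scheme):
    """Return the preferred forms in target_scheme for whoever `name` labels."""
    results = set()
    for concept in subjects(triples, PREF_LABEL, name):
        for person in objects(triples, concept, FOCUS):
            # Every concept about the same person is a candidate translation.
            for other in subjects(triples, FOCUS, person):
                if target_scheme in objects(triples, other, IN_SCHEME):
                    results |= objects(triples, other, PREF_LABEL)
    return results


print(translate_name(TRIPLES, "Example Author", "ex:fileA"))
```

The translation works only because both concepts share an identifier for the person; the name strings themselves never need to match.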

You can look at more examples on my links page, but I hope that this takes some of the mystery out of Semantic Web data, or at least turns the mystery into a known puzzle rather than an unknown one.

Monday, April 18, 2011

FRBR as cake

I keep trying to explain what bothers me about FRBR, and in particular about WEMI. I've recently thought about it with this image of a cake. I know this is a flawed analogy, but it works for me on some level. It goes like this:

When you make a cake, you have a number of ingredients:

When you mix them together to make a cake you don't get this:

You get this:

My point here, in case it isn't clear, is that the purpose of creating a bibliographic description using a number of different entities is to... well, to create a bibliographic description; something that as a whole has meaning. You can create it from individual "ingredients," like information about a Work and an Expression, but those do not need to remain separate entities in your final product; instead, that information can become part of your whole.

I know that people like the idea of a distributed bibliographic description with a single Work entity that links to many Expressions that then link to many Manifestations, etc., and that could be the underlying structure of one's data store. But just because there are Work entities (eggs) doesn't mean that our metadata keeps the Work entity "intact." In fact, our systems may use only a portion of the Work entity, and may use bits of it at different times in different contexts.

Leaving poorly-drawn analogies aside, creating our data as sets (or "graphs") of triples should give us maximum flexibility. One thing this means is that even a partial description is valid. Thus a full library catalog record and an abbreviated citation are both valid representations of a resource. They should connect to the larger linked data information space through any of the statements they contain, regardless of the structure of their graphs. And it is my guess that many bibliographic descriptions will be simple graphs with a single RDF subject (that means a single bibliographic resource). The highly structured bibliographic universe of FRBR will be a minority case, and the FRBR entities, like our eggs and sugar and flour, will be useful ingredients that disappear into actual creations.
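
A minimal sketch of that flexibility, using plain Python sets of (subject, predicate, object) tuples in place of a real RDF library. All of the example.org URIs and values are invented placeholders, and the helper names are hypothetical; the property names are real Dublin Core and FOAF terms. A real RDF store distinguishes URIs from literal values by type, which this sketch approximates by checking for an "http://" prefix.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

BOOK = "http://example.org/book/1"        # hypothetical identifiers
AUTHOR = "http://example.org/authors/A1"

# A fuller catalog-style description.
full_record = {
    (BOOK, DCTERMS + "title", "An Example Title"),
    (BOOK, DCTERMS + "creator", AUTHOR),
    (BOOK, DCTERMS + "publisher", "Example Press"),
    (BOOK, DCTERMS + "issued", "1950"),
}

# An abbreviated citation: fewer statements, still a valid graph.
citation = {
    (BOOK, DCTERMS + "title", "An Example Title"),
    (BOOK, DCTERMS + "creator", AUTHOR),
}

# A separate graph about the author.
author_graph = {
    (AUTHOR, FOAF + "name", "Example Author"),
}


def graph_subjects(graph):
    """The distinct subjects described by a graph."""
    return {s for s, _, _ in graph}


def link_points(a, b):
    """URIs that appear in both graphs, as subject or object."""
    def uris(graph):
        terms = {s for s, _, _ in graph} | {o for _, _, o in graph}
        # Simplification: treat anything starting with http:// as a URI.
        return {t for t in terms if t.startswith("http://")}
    return uris(a) & uris(b)


print(graph_subjects(citation))             # a single RDF subject
print(citation <= full_record)              # the citation is a subset
print(link_points(citation, author_graph))  # connected via the creator URI
```

The citation carries only two statements, yet it joins the author graph through its creator statement just as the full record would, and merging any of these graphs is nothing more than a set union.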