Sunday, April 24, 2011

Visualizing linked data

Chris Oliver, Diane Hillmann and I will be reprising (and updating) our three-part webinar on RDA and the future of library metadata starting on May 11. As before, Chris will cover the principles behind RDA and why RDA is different from other cataloging codes; I will talk about the Semantic Web and why it is important for libraries to be part of the web of data (May 18); Diane will show how the Open Metadata Registry makes possible a Semantic Web-compatible version of RDA (May 25).
One of the questions I always get when talking about the Semantic Web is "What does it look like?" This is kind of like asking what electricity looks like: it doesn't so much look like anything, as it makes certain things possible. But I fully understand that people need to see something for this all to make sense, so when the webinar technology allows it I have started showing some web pages. When it doesn't, I send people to links they can explore on their own. Since some of you may have this same question, here are a few illustrations using two sites that can present authors in a Semantic Web form.

When you do a search for an author on the Open Library you retrieve a page for the author. This is a page for the author Barbara Cartland. The page has not been hand-coded by a human but is derived "on the fly" from the information in the Open Library database.

That same information is available in a Semantic Web format, RDF in XML. (Note: it is common to code Semantic Web data in XML, but that's not the only possible data format. There is nothing inherent in the Semantic Web that would make it XML-like; XML is just a convenience.) This is not intended to be human friendly -- it is code to be used by programs. You should notice that it makes use of identifiers that look like URLs:

<foaf:Person rdf:about="http://openlibrary.org/authors/OL22022A"></foaf:Person>
The above establishes the primary identifier for all of the information that follows in the XML.

You will also see that, like other applications using XML, it allows you to mix data elements from different "namespaces." The Open Library RDF uses a mix of elements from Dublin Core, Friend-of-a-Friend (FOAF), the Bibliographic Ontology, and RDA Vocabularies.
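To make the identifier and the mixing of namespaces a bit more concrete, here is a minimal Python sketch using only the standard library. The RDF/XML fragment in it is hand-written for illustration and is not the actual Open Library output: only the author URI and name come from the page above, while the choice of `foaf:name` and `dcterms:description` and the description text are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, FOAF and Dublin Core terms.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Hand-written illustrative record (an assumption, not real Open Library data).
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="{FOAF_NS}" xmlns:dcterms="{DCTERMS_NS}">
  <foaf:Person rdf:about="http://openlibrary.org/authors/OL22022A">
    <foaf:name>Barbara Cartland</foaf:name>
    <dcterms:description>Illustrative description text</dcterms:description>
  </foaf:Person>
</rdf:RDF>"""


def statements(rdf_xml):
    """Return (subject, namespace, property, value) tuples for each described resource."""
    root = ET.fromstring(rdf_xml)
    result = []
    for resource in root:
        # rdf:about carries the identifier that every nested statement is about.
        subject = resource.get(f"{{{RDF_NS}}}about")
        for prop in resource:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, _, local = prop.tag[1:].partition("}")
            result.append((subject, namespace, local, prop.text))
    return result


for statement in statements(SAMPLE):
    print(statement)
```

Each printed tuple pairs the same author identifier with a property drawn from a different vocabulary, which is all that "mixing namespaces" amounts to in practice.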

Another database that provides its data in RDF is the Virtual International Authority File, VIAF. VIAF combines the name authority data from about twenty national authority files, making it possible to translate from different name display forms when exchanging data. Here is part of the VIAF display for Barbara Cartland:


You can retrieve or export the metadata for this author in various formats including MARC and RDF/XML. Once again you will see that the RDF form of the data makes use of FOAF, a standard called "Simple Knowledge Organization System" or SKOS, and also RDA vocabularies for the FRBR Group 2 entities from the Open Metadata Registry.
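As a rough idea of how a program might use SKOS labels to translate between name display forms, here is a small Python sketch. Everything in the sample record is an assumption made for illustration: the `example.org` URI is a placeholder rather than a real VIAF identifier, the two name forms are invented examples, and the actual VIAF RDF may be structured differently.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical authority record; placeholder URI and labels, not real VIAF data.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:skos="{SKOS_NS}">
  <skos:Concept rdf:about="http://example.org/authority/cartland">
    <skos:prefLabel>Cartland, Barbara</skos:prefLabel>
    <skos:altLabel>Barbara Cartland</skos:altLabel>
  </skos:Concept>
</rdf:RDF>"""


def label_index(rdf_xml):
    """Map every preferred or alternate label to the identifier it names."""
    index = {}
    for concept in ET.fromstring(rdf_xml).iter(f"{{{SKOS_NS}}}Concept"):
        uri = concept.get(f"{{{RDF_NS}}}about")
        for tag in ("prefLabel", "altLabel"):
            for label in concept.findall(f"{{{SKOS_NS}}}{tag}"):
                index[label.text] = uri
    return index


index = label_index(SAMPLE)
# Two different display forms resolve to the same identifier.
print(index["Barbara Cartland"] == index["Cartland, Barbara"])
```

The point of the sketch is only that the identifier, not the spelling of the name, is what ties the variant forms together.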

You can look at more examples on my links page, but I hope that this takes some of the mystery out of Semantic Web data, or at least makes the mystery a known rather than unknown puzzler.

5 comments:

Tormento Malsano said...

Useful links. Thank you.
A little comment about the example. Open Library doesn't answer the question "How many works did Cartland write?". She wrote a lot, but surely she didn't write 2456.
Perhaps the problem with WEMI is that it can't be accomplished without human work.

Karen Coyle said...

TM - Yes, you are right, the "how many works" question is a hard one, mainly because the metadata has too many variants for you to do that accurately. Open Library does let users (humans) manually bring together author instances (Barbara Cartland's name was originally written in at least a dozen different ways, which you can see in the variant names on her page), and to bring together instances of the same book. LibraryThing also implements WEMI with human judgment and input.

Even with human work, WEMI is not a single concept. For example, within different library specialties there are different interpretations of the definition of Work, notably with music and film cataloging. I've heard many people state that they think that a translation should be a new Work, not an Expression of a Work, but in the FRBR documentation translations are Expressions. If you look at FaBiO, the ontology created for citations, you see an interpretation of WEMI that differs significantly from that of IFLA's FRBR and RDA.

How this will all play out when we mix and match our data on the Web is going to be interesting, but for sure we cannot count on everyone making the same decisions about WEMI.

deek said...

In looking at the VIAF data I don't see a way to "export" the MARC data to pull it into my automation system. I can view it, but then I have to manipulate it to get it into a raw MARC format.

Is there a Z39.50 component to it?

Sam Davis said...

Here is a wonderful resource on authors:
http://worldcat.org/identities/

Search Barbara Cartland, select the first item and wow - publication timeline, most widely-held items by and about her, subject tag cloud, and more.

Karen Coyle said...

Sam,

Yes, I LOVE WC Identities! I often use it to show the real wealth of data that we have in library catalogs if we could only get at it.

I don't know if it has APIs -- I would love to see it get more use, but at the moment it seems to stand alone.