Sunday, October 19, 2014 - where it works

In the many talks about, it seems that one topic that isn't covered, or isn't covered sufficiently, is "where do you do it?" That is, where does it fit into your data flow? I'm going to give a simple, typical example. Your actual situation may vary, but I think this will help you figure out your own case.

The typical situation is that you have a database with your data. Searches go against that database, the results are extracted, a program formats these results into a web page, and the page is sent to the screen. Let's say that your database has data about authors, titles and dates. These are stored in your database in a way that you know which is which. A search is done, and let's say that the results of the search are:
author:  Williams, R
title: History of the industrial sewing machine
date: 1996
This is where you are in your data flow:

The next thing that happens (and remember, I'm speaking very generally) is that the results then are fed into a program that formats them into HTML, probably within a template that has all your headers, footers, sidebars and branding and sends the data to the browser. The flow now looks like

Let's say that you will display this as a citation, that looks like:
Williams, R. History of the industrial sewing machine. 1996.
Without any fancy formatting, the HTML for this is:
<p>Williams, R. History of the industrial sewing machine. 1996.</p>
Now we can see the problem that is designed to fix. You started with an author, a title and date, but what you are showing to the world is a string of characters are that undifferentiated. You have lost all the information about what these represent. To a machine, this is just another of many bazillions of paragraphs on the web. Even if you format your data like this:
<p>Author: Williams, R.</p>
<p>Title:  Williams, R. History of the industrial sewing machine</p>
<p>Date: 1996</p>
What a machine sees is:
<p>blah: blah</p>
<p>blah: blah</p>
<p>blah: blah</p>  
What we want is for the program that is is formatting the HTML to also include some metadata from that retains the meaning of the data you are putting on the screen. So rather than just putting HTML formatting, it will add formatting from has metadata elements for many different types of data. Using our example, let's say that this is a book, and here's how you could mark that up in
<div vocab="">
<div   typeof="Book">
    <span property="author">Williams, R.</span> <span property="name">History of the industrial sewing machine</span>. <span property="datePublished">1996</span>.
Again, this is a very simple example, but when we test this code in the Google Rich Snippet tool, we can see that even this very simple example has added rich information that a search engine can make use of:
To see a more complex example, this is what Dan Scott and I have done to enrich the files of the Bryn Mawr Classical Reviews.

The review as seen in a browser (includes markup)

The review as seen by a tool that reads the structured data.

From these you can see a couple of things. The first is that the markup does not change how your pages look to a user viewing your data in a browser. The second is that hidden behind that simple page is a wealth of rich information that was not visible before.

Now you are probably wondering: well, what's that going to do for me? Who will use it? At the moment, the users of this data are the search engines, and they use the data to display all of that additional information that you see under a link:

In this snippet, the information about stars, ratings, type of film and audience comes from schema. org mark-up on the page.

Because the data is there, many of us think that other users and uses will evolve. The reverse of that is that, of course, if the information isn't there then those as yet undeveloped possibilities cannot happen.

1 comment:

Anonymous said...

Thanks, Karen, this is so helpful,

a non-techie, long time cataloger who has been following you and trying to get a visual on how in the world linked data actually works