Friday, October 30, 2015

Libraries, Books, and Elitism

"So is the library, storehouse and lender of books, as anachronistic as the record store, the telephone booth, and the Playboy centerfold? Perversely, the most popular service at some libraries has become free Internet access. People wait in line for terminals that will let them play solitaire and Minecraft, and librarians provide coffee. Other patrons stay in their cars outside just to use the Wi-Fi. No one can be happy with a situation that reduces the library to a Starbucks wannabe."
James Gleick, "What Libraries (Still) Can Do," NYR Daily, October 26, 2015

This is one of the worst examples of snobbery and elitism in relation to libraries that I have seen in a long time. It is also horribly uninformed. Let me disassemble this a bit.

First, the idea of the library as a place to gather is not new. Libraries in ancient Greece were designed as large open spaces with cubbies for scrolls around the inside wall; very little of the space was taken up by that era's version of the book. They served both as storehouses for the written word and as places where scholars came together to discuss ideas. Today, when students are asked what they want from their library, one of the highest-ranked services is study space. There is nothing wrong with studying in a library. In fact, as anyone with a home office knows, having a physical space dedicated to studying and thinking helps focus the mind and makes one more productive.

Next, the dismissive and erroneous statement that people use "terminals" (when did you last hear computers called that?) to play solitaire and Minecraft completely ignores the fact that many of our information sources today are available only online, including sources that most users can reach only through the library. If you want to look up journal articles, you need the library's online access. In addition, many social services are available online. The US government and most state governments no longer provide libraries with hard copies of documents, but instead make them available online. From IRS tax preparation help to information about state law and city zoning ordinances, you absolutely must have Internet access. Internet access is no longer optional for civic life. I can't imagine that anyone is waiting in line at a library for a one-hour slot to build their Minecraft world, but if they are, I'm fine with that. It's no less "library-like" than using the library to read People magazine or check out a romance novel. (Gleick is probably against those, too.)

Gleick doesn't seem to know (nor, perhaps, does Palfrey, whose book he is reviewing) that libraries already operate under limits on e-book lending. He writes:
"And a library that could lend any e-book, without restriction, en masse, would be the perfect fatal competitor to bookstores and authors hoping to sell e-books for money. Something will have to give. Palfrey suggests that Congress could create “a compulsory license system to cover digital lending through libraries,” allowing for payment of fair royalties to the authors. Many countries, including most of Europe, have public lending right programs for this purpose."
This completely misses the point. Libraries already lend e-books, with restrictions, and they pay for them the same way they pay for paper books: by paying for each copy that they lend. A compulsory license is not a solution, and the public lending right that is common in Europe covers hard-copy books as well as e-books. The difference is that in those countries the payment for lending does not come out of library budgets; it is often paid out of a central fund supporting the arts. Given that the US has a very low level of government funding for the arts, and that libraries are not funded through a single government mechanism, a public lending payment would be extremely difficult to develop in this country. There is a very real risk that it would take money out of already stretched library budgets and would further disadvantage the library systems that are struggling most to overcome poor local funding.

I don't at all mind folks having an opinion about libraries, about what they like and what they want. But I would hope that a researcher like Gleick would do at least as much research about libraries as he does about the other subjects he expounds on. They (we) deserve the same attention to truth.

Tuesday, October 13, 2015

SHACL - Shapes Constraint Language

If you've delved into RDF or other Semantic Web technologies, you may have found yourself baffled at times by their tendency to produce data that is open to interpretation. This is, of course, a feature, not a bug. RDF has as the basis of its design something called the "Open World Assumption" (OWA). The OWA acts more like real life than controlled data stores do, because it allows the answers to many questions to be neither true nor false, but "we may not have all of the information." This makes it very hard to do the kind of data control and validity checking that is the norm in databases and in data exchange.
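To make the difference concrete, here is a minimal plain-Python sketch (not real RDF tooling) that treats a graph as a set of triples and asks the same question under each assumption. The identifiers such as ex:book1 are hypothetical, and returning None for "unknown" is only an illustrative convention.

```python
# A toy RDF graph: a set of (subject, predicate, object) triples.
# All identifiers here are hypothetical examples.
graph = {
    ("ex:book1", "dc:title", "Moby Dick"),
    ("ex:book1", "dc:creator", "Herman Melville"),
}

def ask_closed_world(s, p, o):
    """Closed world (typical database): whatever is not stated is false."""
    return (s, p, o) in graph

def ask_open_world(s, p, o):
    """Open world (RDF): a stated triple is true; a missing one is unknown.

    None stands for "we may not have all of the information".
    """
    return True if (s, p, o) in graph else None

print(ask_closed_world("ex:book1", "dc:subject", "Whaling"))  # False
print(ask_open_world("ex:book1", "dc:subject", "Whaling"))    # None
print(ask_open_world("ex:book1", "dc:title", "Moby Dick"))    # True
```

Under the closed-world reading, a missing subject can be reported as an error; under the open-world reading there is nothing to report, which is exactly the kind of check that a constraint language has to make possible.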

There is an obvious need in some situations to place constraints on the data that one manages in RDF. This is particularly true within local systems where data is created and updated, and when exchanging data with known partners. To fill this gap, the Semantic Web branch of the World Wide Web Consortium (W3C) has been working on a new standard, called the SHApes Constraint Language (SHACL), that will do for RDF what XML Schema does for XML: it will allow software developers to define validity rules for a particular set of RDF data.

SHACL has been in development for nearly a year, and is just now available as a First Public Working Draft (FPWD). An FPWD is by no means a finished product, but it is far enough along to give readers an idea of the direction the standard is taking. It is published because comments from the wider community are extremely important. The introduction to the draft tells you where to send your comments. (Note: I serve on the working group representing the Dublin Core community, so I will do my best to make sure your comments get full consideration.)

Like many standards, SHACL is not easy to understand. However, I think it is important for members of the library and other cultural heritage communities to make the effort to weigh in on this standard. Support for SHACL is strong in the "enterprise" sector, among people who primarily work on highly controlled, closed systems such as banks and other information-intensive businesses. How well SHACL serves those whose data is designed for the open web may depend on us.

SHACL Basics

The key to understanding SHACL is that it is based in large part on SPARQL, which already has formally defined mechanisms that operate on RDF graphs. There will be little if any SHACL functionality that could not be done with SPARQL. However, SPARQL queries that perform some of these functions are devilishly difficult to write, so SHACL should provide a cleaner, more constraint-oriented language.

SHACL consists of a core of constraints that belong to the SHACL language and have SHACL-defined properties. These should be sufficient for most validation needs. SHACL also has a template mechanism that makes it possible for anyone to create a templated constraint to meet additional needs.
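The split between a built-in core and user-defined templates can be pictured with a small plain-Python sketch. This is an analogy, not SHACL's actual template mechanism (which in the draft builds on SPARQL): the registry, the name ex:DigitCountConstraint, and the "10 or 13 digits" identifier rule are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# A constraint takes the list of values one node has for one property
# and returns True if those values conform.
Check = Callable[[List[str]], bool]

# "Core" constraints: built into this toy language.
def min_count(n: int) -> Check:
    return lambda values: len(values) >= n

def max_count(n: int) -> Check:
    return lambda values: len(values) <= n

# Hypothetical registry of user-defined, parameterized constraint templates.
TEMPLATES: Dict[str, Callable[..., Check]] = {}

def digit_count(*allowed: int) -> Check:
    """Every value must contain one of the allowed numbers of digits."""
    return lambda values: all(len(re.sub(r"\D", "", v)) in allowed for v in values)

TEMPLATES["ex:DigitCountConstraint"] = digit_count

def conforms(values: List[str], checks: List[Check]) -> bool:
    """A property conforms if every attached constraint is satisfied."""
    return all(check(values) for check in checks)

# Core constraints plus a template instantiated with its own parameters:
# exactly one identifier, with 10 or 13 digits (a simplified, hypothetical rule).
identifier_checks = [min_count(1), max_count(1),
                     TEMPLATES["ex:DigitCountConstraint"](10, 13)]

print(conforms(["978-0-14-243724-7"], identifier_checks))  # True
print(conforms(["12345"], identifier_checks))              # False
print(conforms([], identifier_checks))                     # False
```

The point of the design is that the template is defined once and then reused with different parameters, alongside the core constraints, without changing the language itself.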

What does SHACL look like? It's RDF, so it looks like RDF. Here's a SHACL statement that covers the case "either one foaf:name OR (one foaf:forename AND one foaf:lastname)":

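@prefix ex: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:myPersonShape
    a sh:Shape ;
    sh:scopeClass foaf:Person ;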
    sh:constraint [
        a sh:OrConstraint ;
        sh:shapes(
            [
                sh:property [
                    sh:predicate foaf:name ;
                    sh:minCount 1 ;
                    sh:maxCount 1 ;
                ]
            ]
            [
                sh:property [
                    sh:predicate foaf:forename  ;
                    sh:minCount 1 ;
                    sh:maxCount 1 ;
                ] ;
                sh:property [
                    sh:predicate foaf:lastname  ;
                    sh:minCount 1 ;
                    sh:maxCount 1 ;
                ]
            ]
        )
    ] .
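For readers who find the nested brackets hard to follow, the logic of ex:myPersonShape can be restated as a short plain-Python sketch. It mirrors the constraint's meaning only; it is not how a SHACL engine is implemented, and representing a node as a dictionary of predicate to values is an assumption made for illustration.

```python
from typing import Dict, List

# One node's properties: predicate -> list of values (an illustrative model).
Node = Dict[str, List[str]]

def exactly_one(node: Node, predicate: str) -> bool:
    """Equivalent of sh:minCount 1 plus sh:maxCount 1 on one predicate."""
    return len(node.get(predicate, [])) == 1

def conforms_to_person_shape(node: Node) -> bool:
    # Either exactly one foaf:name ...
    name_branch = exactly_one(node, "foaf:name")
    # ... OR exactly one foaf:forename AND exactly one foaf:lastname.
    split_branch = (exactly_one(node, "foaf:forename")
                    and exactly_one(node, "foaf:lastname"))
    return name_branch or split_branch

print(conforms_to_person_shape({"foaf:name": ["Ada Lovelace"]}))        # True
print(conforms_to_person_shape({"foaf:forename": ["Ada"],
                                "foaf:lastname": ["Lovelace"]}))         # True
print(conforms_to_person_shape({"foaf:forename": ["Ada"]}))             # False
print(conforms_to_person_shape({"foaf:name": ["Ada", "A. Lovelace"]}))  # False
```

Note that this is an inclusive OR: a record with one foaf:name and also a forename/lastname pair passes as well.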

SHACL constraints can be either open or closed. An open constraint, the default, checks the named properties but ignores any other properties in the same RDF graph. A closed constraint essentially means "these properties and only these properties; everything else is a violation."
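The difference between open and closed can be sketched in a few lines of plain Python. This models only the openness rule: the named properties are assumed to be checked elsewhere, and the property ex:shoeSize is a hypothetical stray property.

```python
from typing import Dict, List, Set

# One node's properties: predicate -> list of values (an illustrative model).
Node = Dict[str, List[str]]

def openness_violations(node: Node, allowed: Set[str], closed: bool = False) -> List[str]:
    """Return the predicates that break the open/closed rule.

    Open (the default): properties not named in the shape are ignored.
    Closed: every property not named in the shape is a violation.
    """
    if not closed:
        return []
    return sorted(p for p in node if p not in allowed)

record = {
    "foaf:name": ["Ada Lovelace"],
    "ex:shoeSize": ["38"],  # hypothetical property the shape does not mention
}
shape_properties = {"foaf:name"}

print(openness_violations(record, shape_properties))               # []
print(openness_violations(record, shape_properties, closed=True))  # ['ex:shoeSize']
```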

There are comparisons, such as "equal/not equal," that act on pairs of properties. There are also constraints on values, such as defined value types (IRI, datatype), lists of valid values, and pattern matching.
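Each of these kinds of constraint boils down to a simple test, as this plain-Python sketch suggests. The function names and record values are hypothetical, "not equal" is read here as "the two properties share no value" (the draft's exact semantics may differ), and the IRI test is deliberately crude.

```python
import re
from typing import List

def values_equal(a: List[str], b: List[str]) -> bool:
    """Pair comparison, equal: both properties have the same set of values."""
    return set(a) == set(b)

def values_disjoint(a: List[str], b: List[str]) -> bool:
    """Pair comparison, not equal: no value is shared by the two properties."""
    return not (set(a) & set(b))

def is_iri(value: str) -> bool:
    """Crude value-type check: an absolute IRI starts with a scheme such as 'http:'."""
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9+.-]*:\S+", value) is not None

def in_value_list(value: str, allowed: List[str]) -> bool:
    """List of valid values."""
    return value in allowed

def matches_pattern(value: str, pattern: str) -> bool:
    """Regular-expression search; use ^ and $ to require a whole-value match."""
    return re.search(pattern, value) is not None

# Hypothetical values from a bibliographic record.
creators = ["ex:melville"]
contributors = ["ex:editor1"]

print(values_disjoint(creators, contributors))      # True
print(values_equal(creators, ["ex:melville"]))      # True
print(is_iri("http://example.org/book1"))           # True
print(is_iri("Moby Dick"))                          # False
print(in_value_list("eng", ["eng", "fre", "ger"]))  # True
print(matches_pattern("1851", r"^\d{4}$"))          # True
print(matches_pattern("c. 1851", r"^\d{4}$"))       # False
```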

The question that needs to be answered about this draft is whether SHACL, as currently defined, meets our needs, or at least most of them. One way to address this would be to gather some typical and some atypical validation tests that are needed for library and archive data, and try to express them in SHACL. I have a few examples (mainly from Europeana data), but definitely need more. You can add them to the comments here, send them to me (or send a link to documentation that outlines your data rules), or post them directly to the working group list if you have specific questions.

Thanks in advance.