Tuesday, November 03, 2015

The Standards Committee Hell

I haven't been on a lot of standards committees, but each one has defined a major era in my life. I have spent countless hours on them, because a standards committee requires hundreds of hours of reading emails and discussing minutiae (sometimes the meaning of "*", other times the placement of commas). The one universal in standards creation is that nearly everyone comes to the work with a preconceived idea of what the outcome should be, long before hearing (though not listening to) the brilliant and necessary ideas of their fellow committee members. Most of these standards-progressing people are so sure that their sky is the truest blue that they hardly recognize the need to give passing attention to what others have to say.

In one committee I was on, the alpha geek appeared the first day with a 30-page document in hand, put it on the table, and said: "There. It's done. We can all go home now." He was smiling, but it wasn't a "ha ha" smile, it was a "gotcha" smile. That committee lasted over two years, two long, painful years in which we never quite climbed out of the chasm that we were thrown into on that first day. Over that two-year period we chipped away at the original document, transformed a few of its more arcane paragraphs into something almost readable, and eventually presented the world with a 100-page document that was even worse than what we had started with. Thus is the way of standards.
"...it is so perfect in fact that the underlying model can be applied to any - absolutely any - technology in the universe."
A particular downfall of standards committees is what I will call "the perfect model." I can only describe it with an analogy. Let's say that you are designing a car (by committee, of course), and one member of the group is an engineer with a particular passion for motors. In fact, he (yes, so far I've only run into "he's" of this nature) has this dream of the perfect internal combustion engine. Existing engines have made too many compromises -- for efficiency and economy and whatever other corners manufacturers have desired to cut. But now there is the opportunity to create the standard, the standard that everyone will follow and that will make every internal combustion engine the perfect, beautiful engine. The person (let's call him PersonB, reserving PersonA for oneself, or perhaps the chair of the committee, or, depending on the standards body, for the founder of the standards body and inspiration for all things technological) has developed a new four-stroke engine, which he modestly names with an acronym that includes his name. We'll call this the FE (famous engineer) 4x2 engine. The theory of the FE4x2 is as finely honed as the tolerances between the pistons and their housing; it is so perfect in fact that the underlying model can be applied to any - absolutely any - technology in the universe. Because of the near-divine nature of this model, common terminology cannot describe its powers. Perhaps it would be preferable not to name the model and its features at all, leaving it, like Yahweh, to be alluded to but never spoken. However, standards bodies must describe their standards in documents, and even sell these to potential creators of the standard product, so names for the model and its components must be chosen. To inspire in all the importance of the model, terms are chosen to be as devoid of meaning as possible, yet so complex that they produce awe in the reader. Note that confusion is often mistaken for awe by the uneducated.

Our committee has now described the perfect engine using the universal model, but the standards organization survives on hawking specifications to enterprising souls who will actually create and attempt to sell products that can be certified by the August Authoritative Standards Organization. This means that the thing the standard describes has to be packaged for use. Because the model is perfect, the package surely cannot be mundane. You don't put this engine in something resembling a Sears and Roebuck toaster oven. No, the package must have class, style, and a certain difficulty of use that makes the owner of the final product really think hard about what each knob is for. In fact, it would be ideal if every user would need to attend a series of seminars on the workings of this Perfect Thing. There's a good market for consultants to run these seminars, especially for those members of the community who haven't got the skill to actually manufacture the product themselves. Those who can't do, as the saying goes, teach.

The final package also needs to justify the price that will be charged by purveyors of this product. It needs to be complex but classy. It has to waft on the wind of the past while promising an unspecified but surely improved future. The car committee needs to design a chassis that is worthy of the Perfect Engine. Committee members would love for it to be designed around a yet-to-be-developed material, one that just screams Tomorrow! Again, though, there is that need to sell the idea to actual manufacturers, so the committee adds to the standard a chassis made of tried-and-true materials that must be tortured into a shape that could be, but probably will not be, what the not-yet-real future technology allows.
"But what about the children?"
Whatever you do, do not be the person on the committee who asks: But what about the driver? How comfortable will it be? Will it be safe? Can children ride in it? (Answer: no, anyone who cares about the Perfect Engine will obviously have the sense to eschew children, who will only distract the adult's attention from the admiration of the Perfect Engine.) And never, ever point out that the design does not include doors for entering the vehicle. It's perfect, okay, just leave it at that. This is how we get a standard, and the industry around a standard; an industry that exists because the standard is so deeply just and true and right that no one can figure out how to use it, yet, because it is a standard from the August Authoritative Standards Organization, the rightity and trueness of the standard simply cannot be questioned. Because it is, after all, a standard, and standards exist to be obeyed.
"I've got mine!"
Another downfall of a standards committee is when the committee has one or more members of the "I've got mine" type. These are folks who already have a product of the genre the standard is meant to address, and their participation in the committee is to assure that their product's design becomes the standard. There are lots of variations on this situation. A committee with only one "I've got mine" becomes a simple test of wills between the haves and the have-nots. A committee with more than one "I've got mine" becomes a battleground. The have-nots on this committee might as well just go home because their views of what is needed are so irrelevant to the process that they can have the same effect on the outcome of the standards work by not being there. Who wins the battle depends on many things, of course, but I'd usually advise that you bet on the largest, richest "I've got mine." It is especially helpful if the "I've got mine" holds patents in the area and can therefore declare (true story) "If you create it, we'll destroy you with patent claims."

Like the engineer of the perfect model, the "I've got mine" has an idée fixe. In this case, though, the idée may not be perfect or complete or even usable. But it exists, and "I've got mine" does not want to change. Therefore every idea that is not already in the product of "I've got mine" meets with great resistance. At various points in the discussion, "I've got mine" threatens to take his ball and go home. For reasons that have never been clear to me, the committee takes this threat seriously and caves in to "I've got mine" even though most members of the committee actually understand that the committee would be more successful without this person.
"...even though they repeat often the mantra "We can always blow it up and start over" they never, never start over."
This then takes me to downfall number 3: once standards committees dig themselves into a hole, once they have started down a path that is quite clearly not going to result in success, and even though they repeat often the mantra "We can always blow it up and start over," they never, never start over. The standard that comes out always looks like the non-standard that went in on day one, regardless of how dysfunctional and mistaken that is. This is one of the reasons why there are standards on the books that were developed through great effort, representing hundreds of thousands or even millions of dollars' worth of person hours, and yet have never been adopted. Common sense allows people outside of the bubble of the standards committee process to admit that the thing just isn't going to work. No way. That's the best possible outcome; the worst possible outcome is that, through an excess of obedience in a community with a hive mind, the standard is adopted and therefore screws everything up for that community for decades, until a new standards committee is launched.
"...we can have a new standard, but nothing can really change."
If you think that a committee will solve the problem, then I suggest you go back to the top of this essay and begin reading all over again. Because by now you should be anticipating downfall number 4: we can have a new standard, but nothing can really change. The end result of applying the new standard has to be exactly the same as the result obtained from the old standard. The committee can therefore declare a great success, and everyone can give a sigh of relief that they can go on doing everything the same way they ever did, perhaps with slightly different terminology and a bunch of new acronyms.

Now off I go to read some more emails, asking myself: "Is this the time to ask: what about the children?"

Friday, October 30, 2015

Libraries, Books, and Elitism

"So is the library, storehouse and lender of books, as anachronistic as the record store, the telephone booth, and the Playboy centerfold? Perversely, the most popular service at some libraries has become free Internet access. People wait in line for terminals that will let them play solitaire and Minecraft, and librarians provide coffee. Other patrons stay in their cars outside just to use the Wi-Fi. No one can be happy with a situation that reduces the library to a Starbucks wannabe."
James Gleick, "What Libraries (Still) Can Do" NYRDaily October 26, 2015

This is one of the worst examples of snobbery and elitism in relation to libraries that I have seen in a long time. It is also horribly uninformed. Let me disassemble this a bit.

First, libraries as places to gather are nothing new. Libraries in ancient Greece were designed as large open spaces with cubbies for scrolls around the inside wall. Very little of the space was taken up with that era's version of the book. They existed both as storehouses for the written word and as places where scholars would come together to discuss ideas. Today, when students are asked what they want from their library, one of the highest-ranked services is study space. There is nothing wrong with studying in a library; in fact, as anyone with a home office knows, having a physical space where you do your studying and thinking helps one focus the mind and be productive.

Next, the dismissive and erroneous statement that people use "terminals" (when did you last hear computers called that?) to play solitaire and Minecraft completely ignores the fact that many of our information sources today are available only through online access, including information sources available to most users only through the library. If you want to look up journal articles, you need the library's online access. In addition, many social services are available online. The US government and most state governments no longer provide libraries with hard copies of documents, but make them available online. From IRS tax preparation help to information about state law and city zoning ordinances, you absolutely must have Internet access. Internet access is no longer optional for civic life. I can't imagine that anyone is waiting in line at a library for a one-hour slot to build their Minecraft world, but if they are, then I'm fine with that. It's no less "library-like" than using the library to read People magazine or check out a romance novel. (Gleick is probably against those, too.)

Gleick doesn't seem to know (and perhaps neither does Palfrey, whose book he is reviewing) that libraries already have limits on e-book lending. He writes:
And a library that could lend any e-book, without restriction, en masse, would be the perfect fatal competitor to bookstores and authors hoping to sell e-books for money. Something will have to give. Palfrey suggests that Congress could create “a compulsory license system to cover digital lending through libraries,” allowing for payment of fair royalties to the authors. Many countries, including most of Europe, have public lending right programs for this purpose.
This completely misses the point. Libraries already lend e-books, with restriction, and they pay for them in the same way that they pay for paper books -- by paying for each copy that they lend. Suggesting a compulsory license is not a solution, and the public lending right that is common in Europe covers hard-copy books as well as e-books. The difference is that the payment for lending in those countries does not come out of library budgets but is often paid out of a central fund supporting the arts. Given that the US has a very low level of government funding for the arts, and that libraries are not funded through a single government mechanism, a public lending payment would be extremely difficult to develop in this country. There is the very real risk that it would take money out of already stretched library budgets and would further disadvantage those library systems that are struggling the most to overcome poor local funding.

I don't at all mind folks having an opinion about libraries, about what they like and what they want. But I would hope that a researcher like Gleick would do at least as much research about libraries as he does about other subjects he expounds on. They - we - deserve the same attention to truth.

Tuesday, October 13, 2015

SHACL - Shapes Constraint Language

If you've delved into RDF or other technologies of the Semantic Web, you may have found yourself baffled at times by its tendency to produce data that is open to interpretation. This is, of course, a feature, not a bug. RDF has as the basis of its design something called the "Open World Assumption." The OWA acts more like real life than a controlled data store because it allows the answer to many questions to be neither true nor false, but "we may not have all of the information." This makes it very hard to do the kind of data control and validity checking that is the norm in databases and in data exchange.
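To make the OWA concrete, here is a tiny (hypothetical) bit of RDF in Turtle:

    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix ex:  <http://example.org/> .

    ex:book1 dct:title "Moby Dick" .

Asked "does ex:book1 have a creator?", a traditional database would answer "no." Under the OWA the answer is "unknown": the absence of a dct:creator triple may simply mean the information hasn't reached us yet, and nothing in RDF itself lets you declare that a description is complete.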

There is an obvious need in some situations to exercise constraints on the data that one manages in RDF. This is particularly true within local systems where data is created and updated, and when exchanging data with known partners. To fill this gap, the semantic web branch of the World Wide Web Consortium has been working on a new standard, called the SHApes Constraint Language (SHACL), that will perform for RDF the function that XML Schema performs for XML: it will allow software developers to define validity rules for a particular set of RDF data.

SHACL has been in development for nearly a year, and is just now available in a First Public Working Draft. A FPWD is not by any means a finished product, but is far enough along to give readers an idea of the direction that the standard is taking. It is made available because comment from a larger community is extremely important. The introduction to the draft tells you where to send your comments. (Note: I serve on the working group representing the Dublin Core community, so I will do my best to make sure your comments get full consideration.)

Like many standards, SHACL is not easy to understand. However, I think it will be important for members of the library and other cultural heritage communities to make an effort to weigh in on this standard. Support for SHACL is strong from the "enterprise" sector, people who primarily work on highly controlled closed systems like banks and other information-intensive businesses. How SHACL benefits those whose data is designed for the open web may depend on us.

SHACL Basics

The key to understanding SHACL is that it is based in large part on SPARQL, because SPARQL already has formally defined mechanisms that operate on RDF graphs. There will be little if any SHACL functionality that could not be done with SPARQL, but the SPARQL queries that perform some of these functions are devilishly difficult to write, so SHACL should provide a cleaner, more constraint-oriented language.
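To get a feel for why, here is a sketch (mine, not the draft's) of the SPARQL needed just to find foaf:Person resources that violate an "at least one foaf:name" rule:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?person WHERE {
      ?person a foaf:Person .
      FILTER NOT EXISTS { ?person foaf:name ?name }
    }

And that is only the minCount half of the constraint; enforcing a maxCount as well would require a second query with GROUP BY and HAVING. SHACL aims to let you state both in a few declarative lines.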

SHACL consists of a core of constraints that belong to the SHACL language and have SHACL-defined properties. These should be sufficient for most validation needs. SHACL also has a template mechanism that makes it possible for anyone to create a templated constraint to meet additional needs.

What does SHACL look like? It's RDF, so it looks like RDF. Here's a SHACL statement that covers the case "either one foaf:name OR (one foaf:forename AND one foaf:lastname)":

    [] a sh:Shape ;
        sh:scopeClass foaf:Person ;
        sh:constraint [
            a sh:OrConstraint ;
            sh:property [
                sh:predicate foaf:name ;
                sh:minCount 1 ;
                sh:maxCount 1 ;
            ] ;
            sh:property [
                sh:predicate foaf:forename ;
                sh:minCount 1 ;
                sh:maxCount 1 ;
            ] ;
            sh:property [
                sh:predicate foaf:lastname ;
                sh:minCount 1 ;
                sh:maxCount 1 ;
            ]
        ] .

SHACL constraints can either be open or closed. Open, the default, constrains the named properties but ignores other properties in the same RDF graph. Closed, it essentially means "these properties and only these properties; everything else is a violation."

There are comparisons, such as "equal/not equal" that act on pairs of properties. There are also constraints on values such as defined value types (IRI, data type), lists of valid values, and pattern matching.
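For example, a shape requiring an ISBN-13-style value might look something like the following. (This is my sketch, using hypothetical ex: properties; the exact names of the pattern and datatype facets may well change between the draft and the final standard.)

    ex:BookShape
        a sh:Shape ;
        sh:scopeClass ex:Book ;
        sh:property [
            sh:predicate ex:isbn ;
            sh:datatype xsd:string ;
            sh:pattern "^97[89][0-9]{10}$" ;
        ] .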

The question that needs to be answered around this draft is whether SHACL, as currently defined, meets our needs -- or at least, most of them. One way to address this would be to gather some typical and some atypical validation tests that are needed for library and archive data, and try to express those in SHACL. I have a few examples (mainly from Europeana data), but definitely need more. You can add them to the comments here, send them to me (or send a link to documentation that outlines your data rules), or post them directly to the working group list if you have specific questions.
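As a starter, one of the simplest library rules -- "every bibliographic record must have at least one title" -- comes out quite naturally (again a sketch, with a hypothetical ex: class name):

    ex:BibRecordShape
        a sh:Shape ;
        sh:scopeClass ex:BibRecord ;
        sh:property [
            sh:predicate dct:title ;
            sh:minCount 1 ;
        ] .

The harder cases, such as conditional rules ("if the resource is a serial, it must have an ISSN"), are exactly the ones worth collecting and trying to express against the draft.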

Thanks in advance.

Tuesday, September 22, 2015

FRBR Before and After - Afterword

Below is a preview of the Afterword of my book, FRBR, Before and After. I had typed the title of the section as "Afterward" (caught by the copy editor, of course), and yet as I think about it, that wasn't really an inappropriate misspelling, because what really matters now is what comes after - after we think hard about what our goals are and how we could achieve them. In any case, here's a preview of that "afterward" from the book.


There is no question that FRBR represents a great leap forward in the theory of bibliographic description. It addresses the “work question” that so troubled some of the great minds of library cataloging in the twentieth century. It provides a view of the “bibliographic family” through its recognition of the importance of the relationships that exist between created cultural objects. It has already resulted in vocabularies that make it possible to discuss the complex nature of the resources that libraries and archives gather and manage.

As a conceptual model, FRBR has informed a new era of library cataloging rules. It has been integrated into the cataloging workflow to a certain extent. FRBR has also inspired some non-library efforts, and those have given us interesting insight into the potential of the conceptual model to support a variety of different needs.

The FRBR model, with its emphasis on bibliographic relationships, has the potential to restore context that was once managed through alphabetical collocation to the catalog. In fact, the use of a Semantic Web technology with a model of entities and relations could be a substantial improvement in this area, because the context that brings bibliographic units together can be made explicit: “translation of,” “film adaptation of,” “commentary on.” This, of course, could be achieved with or without FRBR, but because the conceptual model articulates the relationships, and the relationships are included in the recent cataloging rules, it makes sense to begin with FRBR and evolve from there.

However, the gap between the goals developed at the Stockholm meeting in 1991 and the result of the FRBR Study Group’s analysis is striking. FRBR defined only a small set of functional requirements, at a very broad level: find, identify, select, and obtain. The study would have been more convincing as a functional analysis if those four tasks had been further analyzed and had been the focus of the primary content of the study report. Instead, from my reading of the FRBR Final Report, it appears that the entity-relation analysis of bibliographic data took precedence over user tasks in the work of the FRBR Study Group.

The report’s emphasis on the entity-relation model, and the inclusion of three simple diagrams in the report, is most likely the reason for the widespread belief that the FRBR Final Report defines a technology standard for bibliographic data. Although technology solutions can and have been developed around the FRBR conceptual model, no technology solution is presented in the FRBR Final Report. Even more importantly, there is nothing in the FRBR Final Report to suggest that there is one, and only one, technology possible based on the FRBR concepts. This is borne out by the examples we have of FRBR-based data models, each of which interprets the FRBR concepts to serve their particular set of needs. The strength of FRBR as a conceptual model is that it can support a variety of interpretations. FRBR can be a useful model for future developments, but it is a starting point, not a finalized product.

There is, of course, a need for technology standards that can be used to convey information about bibliographic resources. I say “standards” in the plural, because it is undeniable that the characteristics of libraries and their users have such a wide range of functions and needs that no one solution could possibly serve all. Well-designed standards create a minimum level of compliance that allows interoperability while permitting necessary variation to take place. A good example of this is the light bulb: with a defined standard base for the light bulb we have been able to move from incandescent to fluorescent and now to LED bulbs, all the time keeping our same lighting fixtures. 
We must do the same for bibliographic data so that we can address the need for variation in the different approaches between books and non-books, and between the requirements of the library catalog versus the use of bibliographic data in a commercial model or in a publication workflow.

Standardization on a single over-arching bibliographic model is not a reasonable solution. Instead, we should ask: what are the minimum necessary points of compliance that will make interoperability possible between these various uses and users? Interoperability needs to take place around the information and meaning carried in the bibliographic description, not in the structure that carries the data. What must be allowed to vary in our case is the technology that carries that message, because it is the rapid rate of technology change that we must be able to adjust to in the least disruptive way possible. The value of a strong conceptual model is that it is not dependent on any single technology.

It is now nearly twenty years since the Final Report of the FRBR Study Group was published. The FRBR concept has been expanded to include related standards for subjects and for persons, corporate bodies, and families. There is an ongoing Working Group for Functional Requirements for Bibliographic Records that is part of the Cataloguing Section of the International Federation of Library Associations. It is taken for granted by many that future library systems will carry data organized around the FRBR groups of entities. I hope that the analysis that I have provided here encourages critical thinking about some of our assumptions, and fosters the kind of dialog that is needed for us to move fruitfully from broad concepts to an integrative approach for bibliographic data.

From FRBR, Before and After, by Karen Coyle. Published by ALA Editions, 2015

©Karen Coyle, 2015 Creative Commons License
FRBR, Before and After by Karen Coyle is licensed under a Creative Commons Attribution 4.0 International License.

Sunday, September 13, 2015

Models of our World

This is to announce the publication of my book, FRBR, Before and After, by ALA Editions, available in November, 2015. As is often the case, the title doesn't tell the story, so I want to give a bit of an introduction before everyone goes: "Oh, another book on FRBR, yeeech." To be honest, the book does have quite a bit about FRBR, but it's also a think piece about bibliographic models, and a book entitled "Bibliographic Models" would look even more boring than one called "FRBR, Before and After."

The before part is a look at the evolution of the concept of Work, and, yes, Panizzi and Cutter are included, as are Lubetzky, Wilson, and others. Then I look at modeling and how goals and models are connected, and the effect that technology has (and has not) had on library data. The second part of the book focuses on the change that FRBR has wrought both in our thinking and in how we model the bibliographic description. I'll post more about that in the near future, but let me just say that you might be surprised at what you read there.

The text will also be available as open access in early 2016. This is thanks to the generosity of ALA Editions, who agreed to this model. I do hope that enough libraries and individuals decide to purchase the hard copy that ALA Publishing puts out so that this model of print plus OA is economically viable. I can attest to the fact that the editorial work and application of design to the book has produced a final version that I could not have even approximated on my own.

Monday, August 10, 2015

Google becomes Alphabet

I thought it was a joke, especially when the article said that they have two investment companies, Ventures and Capital. But it's all true, so I have this to say:

G is for Google, H is for cHutzpah. In addition to our investment companies Ventures and Capital, we are instituting a think tank, Brain, and a company focused on carbon-based life forms, Body. Servicing these will be three key enterprises: Food, Water, and Air. Support will be provided by Planet, a subsidiary of Universe. Of course, we'll also need to provide Light. Let there be. Singularity. G is for God.

Friday, July 17, 2015

Flexibility in bibliographic models

A motley crew of folks had a chat via Google Hangout earlier this week to talk about FRBR and Fedora. I know exactly squat about Fedora, but I've just spent 18 months studying FRBR and other bibliographic models, so I joined the discussion. We came to a kind of nodding agreement, that I will try to express here, but one that requires us to do some hard work if we are to make it something we can work with.

The primary conclusion was that the models of FRBR and BIBFRAME, with their separation of bibliographic information into distinct entities, are too inflexible for general use. There are simply too many situations in which either the nature of the materials or the available metadata simply does not fit into the entity boundaries defined in those models. This is not news -- since the publication of FRBR in 1998 there have been numerous articles pointing out the need for modifications of FRBR for different materials (music, archival materials, serials, and others). The report of the audio-visual community to BIBFRAME said the same. Similar criticisms have been aimed at recent generations of cataloging rules, whose goal is to provide uniformity in bibliographic description across all media types. The differences in treatment that are needed by the various communities are not mutually compatible, which means that a single model is not going to work over the vast landscape that is "cultural heritage materials."

At the same time, folks in this week's informal discussion were able to readily cite use cases in which they would want to identify a group of metadata statements that would define a particular aspect of the data, such as a work or an item. The trick, therefore, is to find a sweet spot between the need for useful semantics and the need for flexibility within the heterogeneous cultural heritage collections that could benefit from sharing and linking their data amongst them.

One immediate thought is: let's define a core! (OK, it's been done, but maybe that's a different core.) The problem with this idea is that there are NO descriptive elements that will be useful for all materials. Title? (seems obvious) -- but there are many materials in museums and archives that have no title, from untitled art works, to museum pieces ("Greek vase"), to materials in archives ("Letter from John to Mary"). Although these are often given names of a sort, none have titles that function to identify them in any meaningful way. Creators? From anonymous writings to those Greek vases, not to mention the dinosaur bones and geodes in a science museum, many things don't have identifiable creators. Subjects? Well, if you mean this to be "topic" then again, not everything has a topic; think "abstract art" and again those geodes. Most things have a genre or a type, but standardizing on those alone would hardly reap great benefits in data sharing.

The upshot, at least the conclusion that I reach, is that there are no universals. At best there is some overlap between (A & B) and then between (B & C), etc. What the informal group that met this week concluded is that there is some value in standardizing among like data types, simply to make the job of developers easier. The main requirement overall, though, is to have a standard way to share one's metadata choices, not unlike an XML schema, but for the RDF world. Something that others can refer to or, even better, use directly in processing data you provide.

Note that none of the above means throwing over FRBR, BIBFRAME, or RDA entirely. Each has defined some data elements that will be useful, and it is always better to re-use than to re-invent. But attempts to use these vocabularies to fix a single view of bibliographic data are simply not going to work in a world as varied as the one we live in. We limit ourselves greatly if we reject data that does not conform to a single definition rather than making use of connections between close but not identical data communities.

There's no solution being offered at this time, but identifying the target is a good first step.