Coyle's InFormation: 2015

Tuesday, November 03, 2015

The Standards Committee Hell

I haven't been on a lot of standards committees, but each one has defined a major era in my life. I have spent countless hours in standards committees. That's because a standards committee requires hundreds of hours of reading emails, discussing minutiae (sometimes the meaning of "*", other times the placement of commas). The one universal in standards creation is that nearly everyone comes to the work with a preconceived idea of what the outcome should be, long before hearing (but not listening to) the brilliant and necessary ideas of fellow members of the committee. Most of these standards-progressing people are so sure that their sky is the truest blue that they hardly recognize the need to give passing attention to what others have to say.

In one committee I was on, the alpha geek appeared the first day with a 30-page document in hand, put it on the table, and said: "There. It's done. We can all go home now." He was smiling, but it wasn't a "ha ha" smile; it was a "gotcha" smile. That committee lasted over two years, two long, painful years in which we never quite climbed out of the chasm that we were thrown into on that first day. Over that two-year period we chipped away at the original document, transformed a few of its more arcane paragraphs into something almost readable, and eventually presented the world with a one-hundred-page document that was even worse than what we had started with. Such is the way of standards.

"...it is so perfect in fact that the underlying model can be applied to any - absolutely any - technology in the universe."

A particular downfall of standards committees is what I will call "the perfect model." I can only describe it with an analogy. Let's say that you are designing a car (by committee, of course), and one member of the group is an engineer with a particular passion for motors. In fact, he (yes, so far I've only run into "he's" of this nature) has this dream of the perfect internal combustion engine. Existing engines have made too many compromises -- for efficiency and economy and whatever other corners manufacturers have desired to cut. But now there is the opportunity to create the standard, the standard that everyone will follow and that will make every internal combustion engine the perfect, beautiful engine. The person (let's call him PersonB, reserving PersonA for oneself, or perhaps the chair of the committee, or, depending on the standards body, for the founder of the standards body and inspiration for all things technological) has developed a new four-stroke engine, which he modestly names with an acronym that includes his name. We'll call this the FE (famous engineer) 4x2 engine. The theory of the FE4x2 is as finely honed as the tolerances between the pistons and their housing; it is so perfect, in fact, that the underlying model can be applied to any - absolutely any - technology in the universe. Because of the near-divine nature of this model, common terminology cannot describe its powers. Perhaps it would be preferable not to name the model and its features at all, leaving it, like Yahweh, to be alluded to but never spoken. However, standards bodies must describe their standards in documents, and even sell these to potential creators of the standard product, so names for the model and its components must be chosen. To impress upon everyone the importance of the model, terms are chosen to be as devoid of meaning as possible, yet so complex that they produce awe in the reader. Note that confusion is often mistaken for awe by the uneducated.

Our committee now has described the perfect engine using the universal model, but the standards organization survives on hawking specifications to enterprising souls who will actually create and attempt to sell products that can be certified by the August Authoritative Standards Organization. This means that the thing the standard describes has to be packaged for use. Because the model is perfect, the package surely cannot be mundane. You don't put this engine in something resembling a Sears and Roebuck toaster oven. No, the package must have class, style, and a certain difficulty of use that makes the owner of the final product really think hard about what each knob is for. In fact, it would be ideal if every user would need to attend a series of seminars on the workings of this Perfect Thing. There's a good market for consultants to run these seminars, especially those members of the community who haven't got the skill to actually manufacture the product themselves. Those who can't do, as the saying goes, teach.

The final package also needs to justify the price that will be charged by purveyors of this product. It needs to be complex but classy. It has to waft on the wind of the past while promising an unspecified but surely improved future. The car committee needs to design a chassis that is worthy of the Perfect Engine. Committee members would love for it to be designed around a yet-to-be-developed material, one that just screams Tomorrow! Again, though, there is that need to sell the idea to actual manufacturers, so the committee adds to the standard a chassis made of tried-and-true materials that must be tortured into a shape that could be, but probably will not be, what the not-yet-real future technology allows.

"But what about the children?"

Whatever you do, do not be the person on the committee who asks: But what about the driver? How comfortable will it be? Will it be safe? Can children ride in it? (Answer: no, anyone who cares about the Perfect Engine will obviously have the sense to eschew children, who will only distract the adult's attention from the admiration of the Perfect Engine.) And never, ever point out that the design does not include doors for entering the vehicle. It's perfect, okay, just leave it at that. This is how we get a standard, and the industry around a standard: an industry that exists because the standard is so deeply just and true and right that no one can figure out how to use it, yet, because it is a standard from the August Authoritative Standards Organization, the rightity and trueness of the standard simply cannot be questioned. Because it is, after all, a standard, and standards exist to be obeyed.

"I've got mine!"

Another downfall of a standards committee is when the committee has one or more members of the "I've got mine" type. These are folks who already have a product of the genre the standard is meant to address, and they participate in the committee to ensure that their product's design becomes the standard. There are lots of variations on this situation. A committee with only one "I've got mine" becomes a simple test of wills between the haves and the have-nots. A committee with more than one "I've got mine" becomes a battleground. The have-nots on this committee might as well just go home, because their views of what is needed are so irrelevant to the process that they would have the same effect on the outcome of the standards work by not being there. Who wins the battle depends on many things, of course, but I'd usually advise that you bet on the largest, richest "I've got mine." It is especially helpful if the "I've got mine" holds patents in the area and can therefore declare (true story) "If you create it, we'll destroy you with patent claims."

Like the engineer of the perfect model, the "I've got mine" has an idée fixe. In this case, though, the idée may not be perfect or complete or even usable. But it exists, and "I've got mine" does not want to change. Therefore every idea that is not already in the product of "I've got mine" meets with great resistance. At various points in the discussion, "I've got mine" threatens to take his ball and go home. For reasons that have never been clear to me, the committee takes this threat seriously and caves in to "I've got mine" even though most members of the committee actually understand that the committee would be more successful without this person.

"...even though they repeat often the mantra "We can always blow it up and start over" they never, never start over."

This then takes me to downfall number 3: once standards committees dig themselves into a hole, once they have started down a path that is quite clearly not going to result in success, and even though they repeat often the mantra "We can always blow it up and start over," they never, never start over. The standard that comes out always looks like the non-standard that went in on day one, regardless of how dysfunctional and mistaken that is. This is one of the reasons why there are standards on the books that were developed through great effort, whose person-hours would add up to hundreds of thousands or even millions of dollars, and yet have never been adopted. Common sense allows people outside of the bubble of the standards committee process to admit that the thing just isn't going to work. No way. That's the best possible outcome; the worst possible outcome is that, through an excess of obedience in a community with a hive mind, the standard is adopted and therefore screws everything up for that community for decades, until a new standards committee is launched.

"...we can have a new standard, but nothing can really change."

If you think that the new committee will solve the problem, then I suggest you go back to the top of this essay and begin reading all over again. Because by now you should be anticipating downfall number 4: we can have a new standard, but nothing can really change. The end result of applying the new standard has to be exactly the same as the result obtained from the old standard. The committee can therefore declare a great success, and everyone can give a sigh of relief that they can go on doing everything the same way they always did, perhaps with slightly different terminology and a bunch of new acronyms.

Now off I go to read some more emails, asking myself: "Is this the time to ask: what about the children?"

Friday, October 30, 2015

Libraries, Books, and Elitism

"So is the library, storehouse and lender of books, as anachronistic as the record store, the telephone booth, and the Playboy centerfold? Perversely, the most popular service at some libraries has become free Internet access. People wait in line for terminals that will let them play solitaire and Minecraft, and librarians provide coffee. Other patrons stay in their cars outside just to use the Wi-Fi. No one can be happy with a situation that reduces the library to a Starbucks wannabe."

James Gleick, "What Libraries (Still) Can Do" NYRDaily October 26, 2015

This is one of the worst examples of snobbery and elitism in relation to libraries that I have seen in a long time. It is also horribly uninformed. Let me disassemble this a bit.

First, the idea of libraries as places to gather is not new. Libraries in ancient Greece were designed as large open spaces with cubbies for scrolls around the inside wall. Very little of the space was taken up with that era's version of the book. They served both as storehouses for the written word and as places where scholars would come together to discuss ideas. Today, when students are asked what they want from their library, one of the highest-ranked services is study space. There is nothing wrong with studying in a library; in fact, as anyone with a home office knows, having a physical space where you do your studying and thinking helps you focus your mind and be productive.

Next, the dismissive and erroneous statement that people use "terminals" (when did you last hear computers called that?) to play solitaire and Minecraft completely ignores the fact that many of our information sources today are available only through online access, including information sources available to most users only through the library. If you want to look up journal articles, you need the library's online access. In addition, many social services are available online. The US government and most state governments no longer provide libraries with hard copies of documents, but make them available online. From IRS tax preparation help to information about state law and city zoning ordinances, you absolutely must have Internet access. Internet access is no longer optional for civic life. I can't imagine that anyone is waiting in line at a library for a one-hour slot to build their Minecraft world, but if they are, then I'm fine with that. It's no less "library-like" than using the library to read People magazine or check out a romance novel. (Gleick is probably against those, too.)

Gleick doesn't seem to know (nor, perhaps, does Palfrey, whose book he is reviewing) that libraries have limits on e-book lending.

And a library that could lend any e-book, without restriction, en masse, would be the perfect fatal competitor to bookstores and authors hoping to sell e-books for money. Something will have to give. Palfrey suggests that Congress could create “a compulsory license system to cover digital lending through libraries,” allowing for payment of fair royalties to the authors. Many countries, including most of Europe, have public lending right programs for this purpose.

This completely misses the point. Libraries already lend e-books, with restriction, and they pay for them in the same way that they pay for paper books -- by paying for each copy that they lend. Suggesting a compulsory license is not a solution, and the public lending right that is common in Europe is for hard-copy books as well as e-books. The difference is that the payment for lending in those countries does not come out of library budgets but is often paid out of a central fund supporting the arts. Given that the US has a very low level of government funding for the arts, and that libraries are not funded through a single government mechanism, a public lending payment would be extremely difficult to develop in this country. There is the very real risk that it would take money out of already stretched library budgets and would further disadvantage those library systems that are struggling the most to overcome poor local funding.

I don't at all mind folks having an opinion about libraries, about what they like and what they want. But I would hope that a researcher like Gleick would do at least as much research about libraries as he does about other subjects he expounds on. They - we - deserve the same attention to truth.

Tuesday, October 13, 2015

SHACL - Shapes Constraint Language

If you've delved into RDF or other technologies of the Semantic Web, you may have found yourself baffled at times by its tendency to produce data that is open to interpretation. This is, of course, a feature, not a bug. RDF has as the basis of its design something called the "Open World Assumption". The OWA acts more like real life than controlled data stores do, because it allows the answers to many questions to be neither true nor false, but "we may not have all of the information." This makes it very hard to do the kind of data control and validity checking that is the norm in databases and in data exchange.

There is an obvious need in some situations to exercise constraints on the data that one manages in RDF. This is particularly true within local systems where data is created and updated, and when exchanging data with known partners. To fill this gap, the semantic web branch of the World Wide Web Consortium has been working on a new standard, called the SHApes Constraint Language (SHACL), that will perform for RDF the function that XML schema performs for XML: it will allow software developers to define validity rules for a particular set of RDF data.

SHACL has been in development for nearly a year, and is just now available in a First Public Working Draft. An FPWD is not by any means a finished product, but is far enough along to give readers an idea of the direction that the standard is taking. It is made available because comment from a larger community is extremely important. The introduction to the draft tells you where to send your comments. (Note: I serve on the working group representing the Dublin Core community, so I will do my best to make sure your comments get full consideration.)

Like many standards, SHACL is not easy to understand. However, I think it will be important for members of the library and other cultural heritage communities to make an effort to weigh in on this standard. Support for SHACL is strong from the "enterprise" sector, people who primarily work on highly controlled closed systems like banks and other information-intensive businesses. How SHACL benefits those whose data is designed for the open web may depend on us.

SHACL Basics

The key to understanding SHACL is that it is based in large part on SPARQL, because SPARQL already has formally defined mechanisms that operate on RDF graphs. There will be little if any SHACL functionality that could not be done with SPARQL. However, SPARQL queries that perform some of these functions are devilishly difficult to write, so SHACL should provide a cleaner, more constraint-based language.

SHACL consists of a core of constraints that belong to the SHACL language and have SHACL-defined properties. These should be sufficient for most validation needs. SHACL also has a template mechanism that makes it possible for anyone to create a templated constraint to meet additional needs.

What does SHACL look like? It's RDF, so it looks like RDF. Here's a SHACL statement that covers the case "either one foaf:name OR (one foaf:forename AND one foaf:lastname)":

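```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# ex: is an illustrative placeholder namespace
@prefix ex:   <http://example.org/> .

ex:myPersonShape
    a sh:Shape ;
    sh:scopeClass foaf:Person ;
    sh:constraint [
        a sh:OrConstraint ;
        sh:shapes (
            [
                sh:property [
                    sh:predicate foaf:name ;
                    sh:minCount 1 ;
                    sh:maxCount 1 ;
                ]
            ]
            [
                sh:property [
                    sh:predicate foaf:forename ;
                    sh:minCount 1 ;
                    sh:maxCount 1 ;
                ] ;
                sh:property [
                    sh:predicate foaf:lastname ;
                    sh:minCount 1 ;
                    sh:maxCount 1 ;
                ]
            ]
        )
    ] .
```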
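For readers who find the Turtle hard to parse, here is a minimal Python sketch of the rule that shape expresses. It is only an illustration: it does not use SHACL or any RDF library, and it assumes (hypothetically) that a person's description is a dictionary mapping property names to lists of values.

```python
# Plain-Python sketch of the rule expressed by ex:myPersonShape (not SHACL).
# Assumption: a description is a dict mapping property names such as
# "foaf:name" to lists of values.

def has_exactly_one(description: dict, prop: str) -> bool:
    """True if the property occurs once (minCount 1 and maxCount 1)."""
    return len(description.get(prop, [])) == 1


def conforms_to_person_shape(description: dict) -> bool:
    """Either one foaf:name OR (one foaf:forename AND one foaf:lastname)."""
    name_branch = has_exactly_one(description, "foaf:name")
    split_branch = (has_exactly_one(description, "foaf:forename")
                    and has_exactly_one(description, "foaf:lastname"))
    # The OR is satisfied when at least one of the alternatives is satisfied.
    return name_branch or split_branch


if __name__ == "__main__":
    examples = {
        "name only": {"foaf:name": ["Jane Doe"]},
        "forename and lastname": {"foaf:forename": ["Jane"],
                                  "foaf:lastname": ["Doe"]},
        "forename only": {"foaf:forename": ["Jane"]},
        "two names": {"foaf:name": ["J. Doe", "Jane Doe"]},
    }
    for label, desc in examples.items():
        print(f"{label}: {conforms_to_person_shape(desc)}")
```

As sketched, the OR is inclusive: a description with one name and also one forename plus one lastname still conforms.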

SHACL constraints can be either open or closed. An open constraint, the default, constrains the named properties but ignores other properties in the same RDF graph. A closed constraint essentially means "these properties and only these properties; everything else is a violation."
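The difference between open and closed checking can be sketched in the same hypothetical dictionary form. This illustrates the idea only, not the draft's syntax: the shape is reduced to a set of allowed property names, and only the handling of extra properties is shown. Checks on the named properties themselves, such as counts, would apply in both modes.

```python
# Sketch of open vs. closed handling of extra properties (plain Python, not SHACL).
# Assumption: a description is a dict of property name -> list of values, and
# a shape is reduced to the set of property names it names.

def extra_property_violations(description: dict, allowed: set,
                              closed: bool = False) -> list:
    """Report properties the shape does not name; open checking ignores them."""
    if not closed:
        return []
    extras = sorted(set(description) - allowed)
    return [f"unexpected property: {prop}" for prop in extras]


if __name__ == "__main__":
    shape_properties = {"foaf:name"}
    person = {"foaf:name": ["Jane Doe"],
              "foaf:mbox": ["mailto:jane@example.org"]}
    print(extra_property_violations(person, shape_properties))               # []
    print(extra_property_violations(person, shape_properties, closed=True))  # ['unexpected property: foaf:mbox']
```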

There are comparisons, such as "equal/not equal," that act on pairs of properties. There are also constraints on values, such as defined value types (IRI, data type), lists of valid values, and pattern matching.
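The value checks listed above can likewise be sketched as small Python predicates. These are assumptions for illustration, not SHACL's definitions: the IRI test is a crude string-prefix check (RDF tools know whether a node is an IRI or a literal), Python types stand in for XML Schema datatypes, and the property ex:issued is hypothetical.

```python
import re

# Plain-Python stand-ins for the kinds of value constraints described above.
# Assumption: descriptions are dicts of property name -> list of values.

def values_equal(description: dict, prop_a: str, prop_b: str) -> bool:
    """Comparison on a pair of properties: do they hold the same set of values?"""
    return set(description.get(prop_a, [])) == set(description.get(prop_b, []))


def is_iri(value: object) -> bool:
    """Crude value-type check: treat http(s) strings as IRIs (an assumption)."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def has_datatype(value: object, python_type: type) -> bool:
    """Datatype check, with a Python type standing in for an XML Schema datatype."""
    return isinstance(value, python_type)


def in_allowed_list(value: object, allowed: list) -> bool:
    """Check against a list of valid values."""
    return value in allowed


def matches_pattern(value: str, pattern: str) -> bool:
    """Pattern check: re.search succeeds if the pattern matches anywhere."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    record = {"dc:date": ["2015"], "ex:issued": ["2015"], "dc:language": ["en"]}
    print(values_equal(record, "dc:date", "ex:issued"))  # True
    print(is_iri("http://xmlns.com/foaf/0.1/Person"))    # True
    print(has_datatype(2015, int))                       # True
    print(in_allowed_list("en", ["en", "fr", "de"]))     # True
    print(matches_pattern("2015", r"^\d{4}$"))           # True
```

"Not equal" is then simply the negation of values_equal.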

The question that needs to be answered around this draft is whether SHACL, as currently defined, meets our needs -- or at least, most of them. One way to address this would be to gather some typical and some atypical validation tests that are needed for library and archive data, and try to express those in SHACL. I have a few examples (mainly from Europeana data), but definitely need more. You can add them to the comments here, send them to me (or send a link to documentation that outlines your data rules), or post them directly to the working group list if you have specific questions.

Thanks in advance.

Tuesday, September 22, 2015

FRBR Before and After - Afterword

Below is a preview of the Afterword of my book, FRBR, Before and After. I had typed the title of the section as "Afterward" (caught by the copy editor, of course), and yet as I think about it, that wasn't really an inappropriate misspelling, because what really matters now is what comes after - after we think hard about what our goals are and how we could achieve them. In any case, here's a preview of that "afterward" from the book.

Afterword

There is no question that FRBR represents a great leap forward in the theory of bibliographic description. It addresses the “work question” that so troubled some of the great minds of library cataloging in the twentieth century. It provides a view of the “bibliographic family” through its recognition of the importance of the relationships that exist between created cultural objects. It has already resulted in vocabularies that make it possible to discuss the complex nature of the resources that libraries and archives gather and manage.

As a conceptual model, FRBR has informed a new era of library cataloging rules. It has been integrated into the cataloging workflow to a certain extent. FRBR has also inspired some non-library efforts, and those have given us interesting insight into the potential of the conceptual model to support a variety of different needs.

The FRBR model, with its emphasis on bibliographic relationships, has the potential to restore to the catalog the context that was once managed through alphabetical collocation. In fact, the use of a Semantic Web technology with a model of entities and relations could be a substantial improvement in this area, because the context that brings bibliographic units together can be made explicit: “translation of,” “film adaptation of,” “commentary on.” This, of course, could be achieved with or without FRBR, but because the conceptual model articulates the relationships, and the relationships are included in the recent cataloging rules, it makes sense to begin with FRBR and evolve from there.

However, the gap between the goals developed at the Stockholm meeting in 1991 and the result of the FRBR Study Group’s analysis is striking. FRBR defined only a small set of functional requirements, at a very broad level: find, identify, select, and obtain. The study would have been more convincing as a functional analysis if those four tasks had been further analyzed and had been the focus of the primary content of the study report. Instead, from my reading of the FRBR Final Report, it appears that the entity-relation analysis of bibliographic data took precedence over user tasks in the work of the FRBR Study Group.

The report’s emphasis on the entity-relation model, and the inclusion of three simple diagrams in the report, is most likely the reason for the widespread belief that the FRBR Final Report defines a technology standard for bibliographic data. Although technology solutions can and have been developed around the FRBR conceptual model, no technology solution is presented in the FRBR Final Report. Even more importantly, there is nothing in the FRBR Final Report to suggest that there is one, and only one, technology possible based on the FRBR concepts. This is borne out by the examples we have of FRBR-based data models, each of which interprets the FRBR concepts to serve its particular set of needs. The strength of FRBR as a conceptual model is that it can support a variety of interpretations. FRBR can be a useful model for future developments, but it is a starting point, not a finalized product.

There is, of course, a need for technology standards that can be used to convey information about bibliographic resources. I say “standards” in the plural, because it is undeniable that libraries and their users have such a wide range of functions and needs that no one solution could possibly serve all. Well-designed standards create a minimum level of compliance that allows interoperability while permitting necessary variation to take place. A good example of this is the light bulb: with a defined standard base for the light bulb we have been able to move from incandescent to fluorescent and now to LED bulbs, all the while keeping our same lighting fixtures.

We must do the same for bibliographic data so that we can address the need for variation in the different approaches between books and non-books, and between the requirements of the library catalog versus the use of bibliographic data in a commercial model or in a publication workflow.

Standardization on a single over-arching bibliographic model is not a reasonable solution. Instead, we should ask: what are the minimum necessary points of compliance that will make interoperability possible between these various uses and users? Interoperability needs to take place around the information and meaning carried in the bibliographic description, not in the structure that carries the data. What must be allowed to vary in our case is the technology that carries that message, because it is the rapid rate of technology change that we must be able to adjust to in the least disruptive way possible. The value of a strong conceptual model is that it is not dependent on any single technology.

It is now nearly twenty years since the Final Report of the FRBR Study Group was published. The FRBR concept has been expanded to include related standards for subjects and for persons, corporate bodies, and families. There is an ongoing Working Group for Functional Requirements for Bibliographic Records that is part of the Cataloguing Section of the International Federation of Library Associations. It is taken for granted by many that future library systems will carry data organized around the FRBR groups of entities. I hope that the analysis that I have provided here encourages critical thinking about some of our assumptions, and fosters the kind of dialog that is needed for us to move fruitfully from broad concepts to an integrative approach for bibliographic data.

From FRBR, Before and After, by Karen Coyle. Published by ALA Editions, 2015

©Karen Coyle, 2015

FRBR, Before and After by Karen Coyle is licensed under a Creative Commons Attribution 4.0 International License.

Sunday, September 13, 2015

Models of our World

This is to announce the publication of my book, FRBR, Before and After, by ALA Editions, available in November, 2015. As is often the case, the title doesn't tell the story, so I want to give a bit of an introduction before everyone goes: "Oh, another book on FRBR, yeeech." To be honest, the book does have quite a bit about FRBR, but it's also a think piece about bibliographic models, and a book entitled "Bibliographic Models" would look even more boring than one called "FRBR, Before and After."

The before part is a look at the evolution of the concept of Work, and, yes, Panizzi and Cutter are included, as are Lubetzky, Wilson, and others. Then I look at modeling and how goals and models are connected, and the effect that technology has (and has not) had on library data. The second part of the book focuses on the change that FRBR has wrought both in our thinking and in how we model the bibliographic description. I'll post more about that in the near future, but let me just say that you might be surprised at what you read there.

The text will also be available as open access in early 2016. This is thanks to the generosity of ALA Editions, who agreed to this model. I do hope that enough libraries and individuals decide to purchase the hard copy that ALA Publishing puts out so that this model of print plus OA is economically viable. I can attest to the fact that the editorial work and application of design to the book has produced a final version that I could not have even approximated on my own.

Monday, August 10, 2015

Google becomes Alphabet

I thought it was a joke, especially when the article said that they have two investment companies, Ventures and Capital. But it's all true, so I have this to say:

G is for Google, H is for cHutzpah. In addition to our investment companies Ventures and Capital, we are instituting a think tank, Brain, and a company focused on carbon-based life forms, Body. Servicing these will be three key enterprises: Food, Water, and Air. Support will be provided by Planet, a subsidiary of Universe. Of course, we'll also need to provide Light. Let there be. Singularity. G is for God.

Friday, July 17, 2015

Flexibility in bibliographic models

A motley crew of folks had a chat via Google Hangout earlier this week to talk about FRBR and Fedora. I know exactly squat about Fedora, but I've just spent 18 months studying FRBR and other bibliographic models, so I joined the discussion. We came to a kind of nodding agreement, which I will try to express here, but one that requires us to do some hard work if we are to make it something we can work with.

The primary conclusion was that the models of FRBR and BIBFRAME, with their separation of bibliographic information into distinct entities, are too inflexible for general use. There are simply too many situations in which either the nature of the materials or the available metadata does not fit into the entity boundaries defined in those models. This is not news -- since the publication of FRBR in 1998 there have been numerous articles pointing out the need for modifications of FRBR for different materials (music, archival materials, serials, and others). The report of the audio-visual community to BIBFRAME said the same. Similar criticisms have been aimed at recent generations of cataloging rules, whose goal is to provide uniformity in bibliographic description across all media types. The differences in treatment that are needed by the various communities are not mutually compatible, which means that a single model is not going to work over the vast landscape that is "cultural heritage materials."

At the same time, folks in this week's informal discussion were able to readily cite use cases in which they would want to identify a group of metadata statements that would define a particular aspect of the data, such as a work or an item. The trick, therefore, is to find a sweet spot between the need for useful semantics and the need for flexibility within the heterogeneous cultural heritage collections that could benefit from sharing and linking their data with one another.

One immediate thought is: let's define a core! (OK, it's been done, but maybe that's a different core.) The problem with this idea is that there are NO descriptive elements that will be useful for all materials. Title? (seems obvious) -- but there are many materials in museums and archives that have no title, from untitled art works, to museum pieces ("Greek vase"), to materials in archives ("Letter from John to Mary"). Although these are often given names of a sort, none have titles that function to identify them in any meaningful way. Creators? From anonymous writings to those Greek vases, not to mention the dinosaur bones and geodes in a science museum, many things don't have identifiable creators. Subjects? Well, if you mean this to be "topic" then again, not everything has a topic; think "abstract art" and again those geodes. Most things have a genre or a type, but standardizing on those alone would hardly reap great benefits in data sharing.
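The search for a universal core can be framed as a set intersection. The element sets below are hypothetical simplifications built from the examples above (an untitled painting, a Greek vase, a geode); they are not drawn from any real metadata profile.

```python
from itertools import combinations

# Hypothetical, simplified element sets based on the examples in the text.
profiles = {
    "printed book": {"title", "creator", "subject", "type"},
    "untitled abstract painting": {"creator", "type"},
    "Greek vase": {"type"},
    "geode": {"type"},
}


def common_core(named_sets: dict) -> set:
    """Elements present in every description (requires at least one set)."""
    return set.intersection(*named_sets.values())


def pairwise_overlaps(named_sets: dict) -> dict:
    """Elements shared by each pair of descriptions."""
    return {(a, b): named_sets[a] & named_sets[b]
            for a, b in combinations(named_sets, 2)}


if __name__ == "__main__":
    print(sorted(common_core(profiles)))  # ['type']
    overlaps = pairwise_overlaps(profiles)
    print(sorted(overlaps[("printed book", "untitled abstract painting")]))  # ['creator', 'type']
```

With these assumed sets, only "type" is common to all four descriptions, while a pair such as the book and the painting shares more.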

The upshot, at least the conclusion that I reach, is that there are no universals. At best there is some overlap between (A & B) and then between (B & C), etc. What the informal group that met this week concluded is that there is some value in standardizing among like data types, simply to make the job of developers easier. The main requirement overall, though, is to have a standard way to share one's metadata choices, not unlike an XML schema, but for the RDF world. Something that others can refer to or, even better, use directly in processing data you provide.

Note that none of the above means throwing over FRBR, BIBFRAME, or RDA entirely. Each has defined some data elements that will be useful, and it is always better to re-use than to re-invent. But attempts to use these vocabularies to fix a single view of bibliographic data are simply not going to work in a world as varied as the one we live in. We limit ourselves greatly if we reject data that does not conform to a single definition rather than making use of connections between close but not identical data communities.

There's no solution being offered at this time, but identifying the target is a good first step.

Thursday, May 28, 2015

International Cataloguing Principles, 2015

IFLA is revising the International Cataloguing Principles and asked for input. Although I doubt that it will have an effect, I did write up my comments and send them in. Here's my view of the principles, including their history.

The original ICP dates from 1961 and reads like a very condensed set of cataloging rules. [Note: As T Berger points out, this document was entitled "Paris Principles", not ICP.] It was limited to choice and form of entries (personal and corporate authors, titles). It also stated clearly that it applied to alphabetically sequenced catalogs:

The principles here stated apply only to the choice and form of headings and entry words -- i.e. to the principal elements determining the order of entries -- in catalogues of printed books in which entries under authors' names and, where these are inappropriate or insufficient, under the titles of works are combined in one alphabetical sequence.

The basic statement of principles was not particularly different from those stated by Charles Ammi Cutter in 1875.

[Images: Cutter (1875); ICP 1961]

Note that the ICP does not include subject access, which was included in Cutter's objectives for the catalog. Somewhere between 1875 and 1961, cataloging became descriptive cataloging only. Cutter's rules did include a fair amount of detail about subject cataloging (in 13 pages, as compared to 23 pages on authors).

The next version of the principles was issued in 2009. This version is intended to be "applicable to online catalogs and beyond." This is a post-FRBR set of principles, and the objectives of the catalog are given in points with headings find, identify, select, obtain and navigate. Of course, the first four are the FRBR user tasks. The fifth one, navigate, was, as I recall, suggested by Elaine Svenonius, and it was obviously looked on favorably even though it hasn't been added to the FRBR document, as far as I know.

The statement of functions of the catalog in this 2009 draft is rather long, but the "find" function gives an idea of how the goals of the catalog have changed:

[Image: ICP 2009, the "find" function]

It's worth pointing out a few key changes. The first is the statement "as the result of a search..." The 1961 principles were designed for an alphabetically arranged catalog; this set of principles recognizes that there are searches and search results in online catalogs, and it never mentions alphabetical arrangement. The second is that there is specific reference to relationships, and that these are expected to be searchable along with attributes of the resource. The third is that there is something called "secondary limiting of a search result." This last appears to reflect the use of facets in search interfaces.

The differences between the 2015 draft of the ICP and this 2009 version are relatively minor. The big jump in thinking takes place between the 1961 version and the 2009 version. My comments (pdf) to the committee are as much about the 2009 version as the 2015 one. I make three points:

1. The catalog is a technology, and cataloging is therefore in a close relation to that technology
Although the ICP talks about "find," etc., it doesn't relate those actions to the form of the "authorized access points." There is no recognition that searching today is primarily on keyword, not on left-anchored strings.

2. Some catalog functions are provided by the catalog but not by cataloging
The 2015 ICP includes among its principles that of accessibility of the catalog for all users. Accessibility, however, is primarily a function of the catalog technology, not the content of the catalog data. It also recommends (to my great pleasure) that the catalog data be made available for open access. This is another principle that is not content-based. Equally important is the idea, which is expressed in the 2015 principles under "navigate" as: "... beyond the catalogue, to other catalogues and in non-library contexts." This is clearly a function of the catalog, with the support of the catalog data, but what data serves this function is not mentioned.

3. Authority control must be extended to all elements that have recognized value for retrieval
This mainly refers to the inclusion of the elements that serve as limiting facets on retrieved sets. None of the elements listed here are included in the ICP's instructions on "authorized access points," yet these are, indeed, access points. Uncontrolled forms of dates, places, content, carrier, etc., are simply not usable as limits. Yet nowhere in the document is the form of these access points addressed.
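A small sketch of why uncontrolled values fail as limits, using a handful of made-up records whose dates were transcribed in different forms. Grouping on the raw strings produces one facet entry per form, so no single limit gathers all the 1950 items. The `controlled_year` function, which simply pulls out the first four-digit number, is a hypothetical stand-in for a real authority-controlled date form, not a proposal for how such control should be done.

```python
import re
from collections import Counter

# Illustrative records only: the same year recorded four different ways.
records = [
    {"title": "A", "date": "1950"},
    {"title": "B", "date": "c1950"},
    {"title": "C", "date": "[1950?]"},
    {"title": "D", "date": "1950."},
    {"title": "E", "date": "1962"},
]

YEAR = re.compile(r"\d{4}")


def raw_facet(recs, field):
    """Facet counts on the uncontrolled strings as transcribed."""
    return Counter(r[field] for r in recs)


def raw_limit(recs, value):
    """Secondary limit that matches the raw string exactly."""
    return [r["title"] for r in recs if r["date"] == value]


def controlled_year(value):
    """Hypothetical normalization: the first four-digit number, or None."""
    m = YEAR.search(value)
    return int(m.group()) if m else None


def controlled_facet(recs):
    """Facet counts on the normalized (controlled) year."""
    return Counter(controlled_year(r["date"]) for r in recs)


def limit_by_year(recs, year):
    """Secondary limit on the controlled year."""
    return [r["title"] for r in recs if controlled_year(r["date"]) == year]


print(raw_facet(records, "date"))       # five entries, one per transcribed form
print(raw_limit(records, "1950"))       # ['A'] : misses B, C and D
print(controlled_facet(records))        # Counter({1950: 4, 1962: 1})
print(limit_by_year(records, 1950))     # ['A', 'B', 'C', 'D']
```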

There is undoubtedly much more that could be said about the principles, but this is what seemed to me to be appropriate to the request for comment on this draft.

Monday, May 11, 2015

Catalogers and Coders

Mandy Brown has a blog post highlighting The Real World of Technology by Ursula Franklin. As Brown states it, Franklin describes

holistic technologies and prescriptive technologies. In the former, a practitioner has control over an entire process, and frequently employs several skills along the way...By contrast, a prescriptive technology breaks a process down into steps, each of which can be undertaken by a different person, often with different expertise.

It's the artisan vs. Henry Ford's dis-empowered worker. As we know, there has been some recognition, especially in the Japanese factory models, that dis-empowered workers produce poorer quality goods with less efficiency. Brown has a certain riff on this, but what came immediately to my mind was the library catalog.

The library catalog is not a classic case of the assembly line, but it has the element of different workers being tasked with different aspects of an outcome, with no one responsible for the whole. We have (illogically, I say) separated the creation of the catalog data from the creation of the catalog.

In the era of card catalogs (and the book catalogs that preceded them), catalogers created the catalog. What they produced was what people used, directly. Catalogers decided the headings that would be the entry points to the catalog, and thus determined how access would take place. Catalogers wrote the actual display that the catalog user would see. Whether or not people would find things in the catalog was directly in the hands of the catalogers, and they could decide what would bring related entries within card-flipping distance of each other, and whether cross-references were needed.

The technology of the card catalog was the card. The technologist was the cataloger.

This is no longer the case. The technology of the catalog is now a selection of computer systems. Not only are catalogers not designing these systems, in most cases no one in libraries is doing so. This has created a strange and uncomfortable situation in the library profession. Cataloging is still based on rules created by a small number of professional bodies, mostly IFLA and some national libraries. IFLA is asking for comments on its latest edition of the International Cataloging Principles, but those principles are not directly related to catalog technology. Some Western libraries are making use of or moving toward the rules created by the Joint Steering Committee for Resource Description and Access (RDA), which boasts of being "technology neutral." These two new-ish standards have nothing to say about the catalog itself, as if cataloging existed in some technological limbo.

Meanwhile, work goes on in the bibliographic data arena with the development of the BIBFRAMEs, variations on a new data carrier for cataloging data. This latter work has nothing to say about how resources should be cataloged, and also has nothing to say about what services catalogs should perform, nor how they should make the data useful. Its philosophy is "whatever in, whatever out."

Meanwhile #2, library vendors create the systems that will use the machine-readable data that is created following cataloging rules that very carefully avoid any mention of functionality or technology. Are catalog users to be expected to perform left-anchored searches on headings? Keyword searches on whole records? Will the system provide facets that can be secondary limits on search results? What will be displayed to the user? What navigation will be possible? Who decides?

The code4lib community talks about getting catalogers and coders together, and wonders if catalogers should be taught to code. The problem, however, is not between coders and catalogers but is systemic in our profession. We have kept cataloging and computer technology separate, as if they aren't both absolutely necessary. One is the chassis, the other the engine, but nothing useful can come off the assembly line unless both are present in the design and the creation of the product.

It seems silly to have to say this, but you simply cannot design data and the systems that will use the data each in their own separate silo. This situation is so patently absurd that I am embarrassed to have to bring it up. Getting catalogers and coders together is not going to make a difference as long as they are trying to fit one group's round peg into the other's square hole. (Don't think about that metaphor too much.) We have to have a unified design, that's all there is to it.

What are the odds? *sigh*

Wednesday, April 29, 2015

The 50's were a long decade

Born in 1949, I grew up in the 50's. Those were the days of Gracie Allen ("Say goodnight, Gracie." "Goodnight, Gracie."), Lucille Ball, and Alice of the Honeymooners, for whom "To the moon, Alice!" did not mean that she could ever be an astronaut. These were the models for the 1950's woman.

I was always bright and precocious. Before starting kindergarten I taught myself to read the Dick and Jane books that were being read to me. My parents didn't believe that I could read, so they bought a book I had never seen and I read it to them. From then on my mother's mantra was, "Karen, no one is ever going to love you if you don't play dumb." Marilyn Monroe in Some Like it Hot. Not Myrna Loy in The Thin Man.

I wore glasses (from the age of 5) in a time when the chant "Men never make passes at girls who wear glasses" was often heard.

Being smart and being female is still difficult in our culture. Esther Dyson, who has long been one of the cultivators of deep thinking around technology, was introduced as "the most powerful woman in American business", to which she replied that she considered herself at least one of the most powerful people in her field. She's right. But being saddled with the "woman" category means that she can be considered apart, not a threat to the status of any men who might otherwise be lessened by her success in "their" world.

I was fortunate to have a few high school teachers who appreciated intellect in a girl. (I was unfortunate to also have the local high school lech who paid girls A's to sit in the front of the class in mini-skirts but without panties.) It wasn't really until I hit college that the discrimination against smart women became intense. I can only imagine that it is because college professors see themselves as grooming the next generation of college professors, while high school teachers instead had the task of helping us learn what they had to teach and then sending us on our way. In my first semester at college I had one of those introductory courses that was held in an auditorium -- probably a history class of some sort. After class one day I walked with the professor toward his office and chatted with him about some idea that had come to me from his lecture. He was friendly and encouraging. At the next class meeting he began by saying: "After class last time, a young man presented me with a very interesting idea." He had not mistaken me at the time for a boy. This was a small private college and there was a dress code. I had long, flowing hair, wore makeup, and was wearing a dress. Instead, his memory turned me into a boy because it would have been impossible for him to have received a new and interesting idea from a girl. You can imagine how likely it would have been for him to become the mentor to a bright woman looking to pursue an academic career.

You may also be able to imagine how this statement made me disappear, not only in his eyes and the eyes of my classmates, who would never know that I was that "young man," but also in my own eyes. Psychologists call it "loss of significance" -- that your very being is denied; you are erased, post facto. I don't wonder that so many women suffer depression, because there is nothing more disorienting or more discouraging than having your own experience denied, pulled out from under you, and to be made invisible.

The stories, of course, abound; I couldn't begin to tell them all. This one, though, must be told: There was the boss who had hired me as the only woman holding a management position in the organization. He chuckled in surprise and disdain when I asked to be included in the meetings that he held with the otherwise-all-male management staff, which he had not thought to invite me to. He was even more surprised when I spoke up at the meetings. One day he called me into his office and praised me by saying "We're lucky to have found you. If you were a man we'd have to be paying you twice as much."

With great pain I realize that I experienced all of this from a position of great privilege, as a white, middle-class, educated American. I cannot imagine the prejudice of race or caste that others must live with, nor how that affects their sense of themselves as whole human beings.

I'm glad I went into librarianship, with all of its warts. I have spent my career surrounded by smart women. I got to create technology with women. I hope to do more of that. My main message here today, though, is this: if you can help a young woman understand her own worth, to appreciate her abilities, and to see being smart as a positive, please do, in whatever way you can. Whether it's encouragement, scholarships, or raising a daughter who never hears "no one will ever love you if you don't play dumb." Let's make sure that the fifties are behind us.

Tuesday, April 21, 2015

Come in, no questions asked

by Eusebia Parrotto, Trento Public Library*

He is of an indeterminate age, somewhere between 40 and 55. He's wearing two heavy coats, one over the other, and a large backpack, even though it's 75 degrees out today (shirt-sleeve weather). He's been a regular in the library for a couple of months, from first thing in the morning until closing in the evening. He moves from the periodicals area along the hall to the garden on fair weather days. Sundays, when the library is closed, he is not far away, in the nearby park or on the pedestrian street just outside.

I run into him at the coffee vending machine. He asks me, somewhat hesitantly, if I have any change. I can see that he's missing most of his front teeth. I've got a euro in my hand, and I offer it to him. He takes it slowly, looks at it carefully, and is transformed. His face lights up with a huge smile, and like an excited child, but with a mere whisper of a voice, he says: "Wow!! A euro! Thanks!" I smile back at him, and I can see that he's trying to say something else but he can't, it tires him. I can smell the alcohol on his breath and I assume that's the reason for his lapse. He motions to me to wait while he tries to bring forth the sounds, the words. I do wait, watching. He lifts a hand to the center of his neck as if to push out the words, and he says, with great effort and slowly: "I don't speak well, I had an operation. Look." There is a long scar on his throat that goes from one ear to the other. I recognize what it is. He says again, "Wait, look" and pulls up his left sleeve to show me another scar along the inside of his forearm that splits in two just before his wrist. "I know what that is," I say.

Cancer of the throat. An incision is made from under the chin to arrive at the diseased tissue. They then reconstruct the excised portion using healthy tissue taken from the arm. That way the damaged area will recover, to the extent it can, its original functions.

With great effort and determination he tells me, giving me the signal to wait when he has to pause, that he was operated on nearly a year ago, after three years in which he thought he had a stubborn toothache. When he couldn't take it any more he was taken to the emergency room and was admitted to hospital immediately. I tell him that he's speaking very clearly, and that he has to exercise his speech often to improve his ability to articulate words; it's a question of muscle tone and practice. I ask him if he is able to eat. I know that for many months, even years, after the operation you can only get down liquids and liquified foods. He replies "soups, mainly!" It will get better, I tell him.

His eyes shine with a bright light, he smiles at me, signals to me to wait. Swallows. Concentrates and continues his story, about a woman doctor friend, who he only discovered was a doctor after he got sick. He tells me some details about the operation; the radiation therapy. This is the second time that he has cheated death, he says. The first was when he fell and hit his head and was in a coma for fifteen days. "So now this, and it's the second time that I have been brought back from the brink." He says this with a smile, even a bit cocky, with punch. And then tears come to his eyes. He continues to smile, impishly, toothlessly. "I'm going to make it, you'll see. Right now I'm putting together the forms to get on disability, maybe that will help." "Let's hope it works out," I say as we part. And he replies: "No, not hope. You've got to believe."

The derelicts of the library. A few months back it was in all of the local papers. One student wrote a letter to the newspaper complaining that the presence of the homeless and the vagabonds profaned the grand temple of culture that is the library. Suddenly everyone had something to say on the matter; even those who had never even been to the library were upset about the derelicts there. They said it made them feel unsafe. Others told how it made them feel uncomfortable to come into the library and see them occupying the chairs all day long. Even when half of the chairs were free they were taking up the places of those who needed to study. Because, obviously, you can't mix with them.

I don't know how often the person I chatted with today had the occasion to speak to others about his illness. It's a terrible disease, painful, and it leaves one mutilated for life. Recovery from the operation is slow, over months, years. It's an affliction that leaves you with a deep fear even when you think you are cured. That man had such a desire to tell the story of his victory over the disease, his desire to live, his faith that never left him even in the darkest moments. I know this from the great light that radiated from his visage, and from his confident smile.

I don't know of any other place but standing at the vending machine of a library where such an encounter is possible between two worlds, two such distant worlds. I don't know where else there can be a simple conversation between two persons who, by rule or by necessity, occupy these social extremities; between one who lives on the margins of society and another who lives the good life; who enjoys the comforts of a home, a job, clean clothes and access to medical care. Not in other public places, which are open only to a defined segment of the population: consumers, clients, visitors to public offices. These are places where you are defined momentarily based on your social activities. Not in the street, or in the square, because there are the streets and squares that are frequented by them, and the others, well-maintained, that are for us. And if one of them ventures into our space he has surely not come to tell us his story, nor are we there to listen to it.

He is called a derelict, but this to me is the beauty of the public library. It is a living, breathing, cultural space that is at its best when it gathers in all of those beings who are kept outside the walls of civil society, in spite of the complexity and contradictions that implies.

The library is a place with stories; there are the stories running through the thousands of books in the library as well as the stories of the people who visit it. In the same way that we approach a new text with openness and trust, we can also be open and trusting as listeners. Doing so, we'll learn that the stories of others are not so different from our own; that the things that we care about in our lives, the important things, are the same for everyone. That they are us, perhaps a bit more free, a bit more suffering, with clothes somewhat older than our own.

Then I read this. It tells the story of the owner of a fast food restaurant who noticed that after closing someone was digging through the trash cans looking for something to eat. So she put a sign on the store window, inviting the person to stop in one day and have a fresh meal, for free. The sign ends with: "No questions asked."

So this is what I want written on the front door of all libraries: "Come in, whoever you are. No questions asked."

*Translated and posted with permission. Original.

[Note: David Lankes tweeted (or re-tweeted, I don't remember) a link to Eusebia's blog, and I was immediately taken by it. She writes beautifully of the emotion of the public library. I will translate other posts as I can. And I would be happy to learn of other writers of this genre that we can encourage and publicize. - kc]

Friday, January 16, 2015

Real World Objects

I was asked a question about the meaning and import of the RDF concept of "Real World Object" (RWO) and didn't give a very good answer off the cuff. I'll try to make up for that here.

The concept of RWO comes out of the artificial intelligence (AI) community. Imagine that you are developing robots and other machines that must operate within the same world that you and I occupy. You have to find a way to "explain," in a machine-operational way, everything in our world: stairs and ramps, chairs and tables, the effect of gravity on a cup when you miss placing it on the table, the stars, love and loyalty (concepts are also objects in this view). The AI folks have actually set a goal to create such descriptions, which they call ontologies, for everything in the world; for every RWO.

You might consider this a conceit, or a folly, but that's the task they have set for themselves.

The original Scientific American article that described the semantic web used as its example intelligent 'bots that would manage your daily calendar and make appointments for you. This was far short of the AI "ontology of everything" but the result that matters to us now is that there have been AI principles baked into the development of RDF, including the concept of the RWO.

RWO isn't as mysterious as it may seem, and I can provide a simple example from our world. The MARC record for a book has the book as its RWO, and most of its data elements "speak" about the book. At the same time, we can say things about the MARC record, such as who originally created it, and who edited it last, and when. The book and the record are different things, different RWOs in an RDF view. That's not controversial, I would assume.

Our difficulties arise because in the past we didn't have a machine-actionable way to distinguish between those two "things": the book and the record. Each MARC record got an identifier, which identified the record. We've never had identifiers for the thing the record describes (although the ISBN sometimes works this way). It has always been safe to assume that the record was about the book, and what identified the book was the information in the record. So we obviously have a real world object, but we didn't give it its own identifier - because humans could read the text of the record and understand what it meant (most of the time or some of the time).

I'm not fully convinced that everything can be reduced to RWO/not-RWO, and so I'm not buying that this is the only way to talk about our world and our data. It should be relatively easy, though, without getting into grand philosophical debates, to determine the difference between our metadata and the thing it describes. That "thing it describes" can be fuzzy in terms of the real world, such as when the spirit of Edgar Cayce speaks through a medium and writes a book. I don't want to have to discuss whether the spirit of Edgar Cayce is real or not. We can just say that "whoever authors the book is as real as it gets." So if we forget RWO in the RDF sense and just look sensibly at our data, I'm sure we can come to a practical agreement that allows both the metadata and the real world object to exist.

That doesn't resolve the problem of identifiers, however, and for machine-processing purposes we do need separate identifiers for our descriptions and what we are describing.* That's the problem we need to solve, and while we may go back and forth a bit on the best solution, the problem is tractable without resorting to philosophical confabulations.

* I think that the multi-level bibliographic descriptions like FRBR and BIBFRAME make this a bit more complex, but I haven't finished thinking about that, so will return if I have a clearer idea.
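To make the identifier point concrete, here is a minimal Python sketch that stores statements as plain (subject, predicate, object) tuples rather than using an RDF library. The `example.org` URIs, the sample values (including "Cataloging agency X"), and the choice of `foaf:primaryTopic` to link the record to the book are assumptions for illustration; the Dublin Core and FOAF property URIs themselves are real. Once the record and the book each have their own identifier, "who created the record?" and "who created the book?" become separate, unambiguous questions.

```python
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical identifiers: one for the description, one for the thing described.
RECORD = "http://example.org/record/12345"
BOOK = "http://example.org/book/12345"

# Illustrative statements only.
triples = {
    (RECORD, FOAF + "primaryTopic", BOOK),             # the record is about the book
    (RECORD, DCT + "creator", "Cataloging agency X"),  # who made the record
    (RECORD, DCT + "modified", "2014-11-02"),          # when the record was last edited
    (BOOK, DCT + "title", "Moby Dick"),
    (BOOK, DCT + "creator", "Melville, Herman"),        # who made the book
}


def objects(graph, subject, predicate):
    """All objects of statements with the given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(triples, BOOK, DCT + "creator"))         # ['Melville, Herman']
print(objects(triples, RECORD, DCT + "creator"))       # ['Cataloging agency X']
print(objects(triples, RECORD, FOAF + "primaryTopic")) # ['http://example.org/book/12345']
```

With a single identifier for both, the two "creator" statements would collide on one subject and a machine could not tell which one applies to the book.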

Saturday, January 10, 2015

This is what sexism looks like #2

Libraries, it seems, are in crisis, and many people are searching for answers. Someone I know posted a blog post pointing to community systems like Stack Overflow and Reddit as examples of how libraries could create "community." He especially pointed out the value of "gamification" - the ranking of responses by the community - as something libraries should consider. His approach was that it is "human nature" to want to gain points. "We are made this way: give us a contest and we all want to win." (The rest of the post and the comments went beyond this to the questions of what libraries should be today, etc.)

There were many (about 4 dozen, almost all men) comments on his blog (which I am not linking to, because I don't want this to be a "call out"). He emailed me asking for my opinion.

I responded only to his point about gamification, which was all I had time for, saying that in that area his post ignored an important gender issue. The competitive aspect was part of what makes those sites unfriendly to women.

I told him that there have been many studies of how children play, and they reveal some distinct differences between genders. Boys begin play by determining a set of rules that they will follow, and during play they may stop to enforce or discuss the rules. Girls begin to play with an unstructured understanding of the game, and, if problems arise during play, they work on a consensus. Boys' games usually have points and winners. Girls' games are often without winners and are "taking turns" games. Turning libraries into a "winning" game could result in something like Reddit, where few women go, or if they do they are reluctant to participate.

And I said: "As a woman, I avoid the win/lose situations because, based on general social status (and definitely in the online society) I am already designated a loser. My position is pre-determined by my sex, so the game is not appealing."

I didn't post this to the site, just emailed it to the owner. It's good that I did not. The response from the blog owner was:

This is very interesting. But I need to see some proof.

Some proof. This is truly amazing. Search on Google Scholar for "games children gender differences" and you are overwhelmed with studies.

But it's even more amazing because none of the men who posted their ideas to the site were asked for proof. Their ideas are taken at face value. Of course, they didn't bring up issues of gender, class, or race in their responses, as if these are outside of the discussion of what libraries should be. And to bring them up is an "inconvenience" in the conversation, because the others do not want to hear it.

He also pointed me to a site that is "friendly to women." To that I replied that women decide what is "friendly to women."

I was invited to comment on the blog post, but it is now clear that my comments will not be welcome. In fact, I'd probably only get inundated with posts like "prove it." This does seem to be the response whenever a woman or minority points out an inconvenient truth.

Welcome to my world.