Monday, August 10, 2015

Google becomes Alphabet

I thought it was a joke, especially when the article said that they have two investment companies, Ventures and Capital. But it's all true, so I have this to say:

G is for Google, H is for cHutzpah. In addition to our investment companies Ventures and Capital, we are instituting a think tank, Brain, and a company focused on carbon-based life-based forms, Body. Servicing these will be three key enterprises: Food, Water, and Air. Support will be provided by Planet, a subsidiary of Universe. Of course, we'll also need to provide Light. Let there be. Singularity. G is for God. 

Friday, July 17, 2015

Flexibility in bibliographic models

A motley crew of folks had a chat via Google Hangout earlier this week to talk about FRBR and Fedora. I know exactly squat about Fedora, but I've just spent 18 months studying FRBR and other bibliographic models, so I joined the discussion. We came to a kind of nodding agreement, that I will try to express here, but one that requires us to do some hard work if we are to make it something we can work with.

The primary conclusion was that the models of FRBR and BIBFRAME, with their separation of bibliographic information into distinct entities, are too inflexible for general use. There are too many situations in which either the nature of the materials or the available metadata simply does not fit into the entity boundaries defined in those models. This is not news -- since the publication of FRBR in 1998 there have been numerous articles pointing out the need for modifications of FRBR for different materials (music, archival materials, serials, and others). The report of the audio-visual community to BIBFRAME said the same. Similar criticisms have been aimed at recent generations of cataloging rules, whose goal is to provide uniformity in bibliographic description across all media types. The differences in treatment that are needed by the various communities are not mutually compatible, which means that a single model is not going to work over the vast landscape that is "cultural heritage materials."

At the same time, folks in this week's informal discussion were able to readily cite use cases in which they would want to identify a group of metadata statements that would define a particular aspect of the data, such as a work or an item. The trick, therefore, is to find a sweet spot between the need for useful semantics and the need for flexibility within the heterogeneous cultural heritage collections that could benefit from sharing and linking their data amongst them.

One immediate thought is: let's define a core! (OK, it's been done, but maybe that's a different core.) The problem with this idea is that there are NO descriptive elements that will be useful for all materials. Title? (seems obvious) -- but there are many materials in museums and archives that have no title, from untitled art works, to museum pieces ("Greek vase"), to materials in archives ("Letter from John to Mary"). Although these are often given names of a sort, none have titles that function to identify them in any meaningful way. Creators? From anonymous writings to those Greek vases, not to mention the dinosaur bones and geodes in a science museum, many things don't have identifiable creators. Subjects? Well, if you mean this to be "topic" then again, not everything has a topic; think "abstract art" and again those geodes. Most things have a genre or a type but standardizing on those alone would hardly reap great benefits in data sharing.

The upshot, at least the conclusion that I reach, is that there are no universals. At best there is some overlap between (A & B) and then between (B & C), etc. What the informal group that met this week concluded is that there is some value in standardizing among like data types, simply to make the job of developers easier. The main requirement overall, though, is to have a standard way to share one's metadata choices, not unlike an XML schema, but for the RDF world. Something that others can refer to or, even better, use directly in processing data you provide.
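
To make the idea of sharing metadata choices a bit more concrete, here is a minimal sketch in Python. It is not a real standard or anything the group agreed on: the profile layout, the property names (type, label, creator) and the check function are all assumptions made up for illustration. The point is only that a community's choices can be written down as data that someone else's program can read and apply.

```python
# Hypothetical sketch: one community's metadata choices written down as data.
# The profile layout and property names are illustrative assumptions,
# not part of FRBR, BIBFRAME, RDA, or any RDF standard.

MUSEUM_PROFILE = {
    "name": "museum-objects",
    "properties": {
        "type": {"required": True, "repeatable": False},
        "label": {"required": False, "repeatable": False},
        "creator": {"required": False, "repeatable": True},
    },
}


def check(description, profile):
    """Return a list of problems found when a description is checked against a profile."""
    problems = []
    rules = profile["properties"]
    # First pass: every rule in the profile is tested against the description.
    for prop, rule in rules.items():
        values = description.get(prop, [])
        if rule["required"] and not values:
            problems.append(f"missing required property: {prop}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"property not repeatable: {prop}")
    # Second pass: report properties the profile does not know about.
    for prop in description:
        if prop not in rules:
            problems.append(f"property not in profile: {prop}")
    return problems


# Two made-up descriptions: one that fits the profile, one that does not.
vase = {"type": ["vase"], "label": ["Greek vase"]}
letter = {"label": ["Letter from John to Mary"], "title": ["?"]}

print(check(vase, MUSEUM_PROFILE))
print(check(letter, MUSEUM_PROFILE))
```

Note that the profile does not require a title or a creator, which is the kind of choice a museum community might make and an archive or library might not. A receiver with a different profile can still compare the two and see where they overlap.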

Note that none of the above means throwing over FRBR, BIBFRAME, or RDA entirely. Each has defined some data elements that will be useful, and it is always better to re-use than to re-invent. But the attempt to use these vocabularies to fix a single view of bibliographic data is simply not going to work in a world as varied as the one we live in. We limit ourselves greatly if we reject data that does not conform to a single definition rather than making use of connections between close but not identical data communities.

There's no solution being offered at this time, but identifying the target is a good first step.

Thursday, May 28, 2015

International Cataloguing Principles, 2015

IFLA is revising the International Cataloguing Principles and asked for input. Although I doubt that it will have an effect, I did write up my comments and send them in. Here's my view of the principles, including their history.

The original ICP dates from 1961 and read like a very condensed set of cataloging rules. [Note: As T Berger points out, this document was entitled "Paris Principles", not ICP.] It was limited to choice and form of entries (personal and corporate authors, titles). It also stated clearly that it applied to alphabetically sequenced catalogs:
The principles here stated apply only to the choice and form of headings and entry words -- i.e. to the principal elements determining the order of entries -- in catalogues of printed books in which entries under authors' names and, where these are inappropriate or insufficient, under the titles of works are combined in one alphabetical sequence.
The basic statement of principles was not particularly different from those stated by Charles Ammi Cutter in 1875.

[Comparison of objectives: Cutter (1875) and ICP 1961]

Note that the ICP does not include subject access, which was included in Cutter's objectives for the catalog. Somewhere between 1875 and 1961, cataloging became descriptive cataloging only. Cutter's rules did include a fair amount of detail about subject cataloging (in 13 pages, as compared to 23 pages on authors).

The next version of the principles was issued in 2009. This version is intended to be "applicable to online catalogs and beyond." This is a post-FRBR set of principles, and the objectives of the catalog are given in points with headings find, identify, select, obtain and navigate. Of course, the first four are the FRBR user tasks. The fifth one, navigate, as I recall was suggested by Elaine Svenonius and obviously was looked on favorably even though it hasn't been added to the FRBR document, as far as I know.

The statement of functions of the catalog in this 2009 draft is rather long, but the "find" function gives an idea of how the goals of the catalog have changed:

ICP 2009

It's worth pointing out a couple of key changes. The first is the statement "as the result of a search..." The 1961 principles were designed for an alphabetically arranged catalog; this set of principles recognizes that there are searches and search results in online catalogs, and it never mentions alphabetical arrangement. The second is that there is specific reference to relationships, and that these are expected to be searchable along with attributes of the resource. The third is that there is something called "secondary limiting of a search result." This latter appears to reflect the use of facets in search interfaces.

The differences between the 2015 draft of the ICP and this 2009 version are relatively minor. The big jump in thinking takes place between the 1961 version and the 2009 version. My comments (pdf) to the committee are as much about the 2009 version as the 2015 one. I make three points:
    1.  The catalog is a technology, and cataloging is therefore in a close relation to that technology
    Although the ICP talks about "find," etc., it doesn't relate those actions to the form of the "authorized access points." There is no recognition that searching today is primarily on keyword, not on left-anchored strings.

    2. Some catalog functions are provided by the catalog but not by cataloging
    The 2015 ICP includes among its principles that of accessibility of the catalog for all users. Accessibility, however, is primarily a function of the catalog technology, not the content of the catalog data. It also recommends (to my great pleasure) that the catalog data be made available for open access. This is another principle that is not content-based. Equally important is the idea, which is expressed in the 2015 principles under "navigate" as: "... beyond the catalogue, to other catalogues and in non-library contexts." This is clearly a function of the catalog, with the support of the catalog data, but what data serves this function is not mentioned.

    3. Authority control must be extended to all elements that have recognized value for retrieval
    This mainly refers to the inclusion of the elements that serve as limiting facets on retrieved sets. None of the elements listed here are included in the ICP's instructions on "authorized access points," yet these are, indeed, access points. Uncontrolled forms of dates, places, content, carrier, etc., are simply not usable as limits. Yet nowhere in the document is the form of these access points addressed.
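
A small sketch of why uncontrolled forms fail as limits, written in Python. The date strings are invented examples of how the same year might be transcribed in different records, and facet_counts is a hypothetical helper, not a function of any catalog system.

```python
from collections import Counter

# Invented sample data: four publication dates as they might be transcribed,
# and the same four dates normalized to a controlled four-digit year.
uncontrolled = ["c1998", "1998.", "[1998?]", "2009"]
controlled = ["1998", "1998", "1998", "2009"]


def facet_counts(values):
    """Count how many records fall under each distinct facet value."""
    return dict(Counter(values))


# With uncontrolled forms every variant becomes its own facet value,
# so a user limiting to 1998 cannot get all three records in one click.
print(facet_counts(uncontrolled))

# With a controlled form the three records collapse into one usable limit.
print(facet_counts(controlled))
```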

    There is undoubtedly much more that could be said about the principles, but this is what seemed to me to be appropriate to the request for comment on this draft.

      Monday, May 11, 2015

      Catalogers and Coders

      Mandy Brown has a blog post highlighting The Real World of Technology by Ursula Franklin. As Brown states it, Franklin describes
      holistic technologies and prescriptive technologies. In the former, a practitioner has control over an entire process, and frequently employs several skills along the way...By contrast, a prescriptive technology breaks a process down into steps, each of which can be undertaken by a different person, often with different expertise.
      It's the artisan vs. Henry Ford's dis-empowered worker. As we know, there has been some recognition, especially in the Japanese factory models, that dis-empowered workers produce poorer quality goods with less efficiency. Brown has a certain riff on this, but what came immediately to my mind was the library catalog.

      The library catalog is not a classic case of the assembly line, but it shares the element of different workers being tasked with different aspects of an outcome, with no one responsible for the whole. We have (illogically, I say) separated the creation of the catalog data from the creation of the catalog.

      In the era of card catalogs (and the book catalogs that preceded them), catalogers created the catalog. What they produced was what people used, directly. Catalogers decided the headings that would be the entry points to the catalog, and thus determined how access would take place. Catalogers wrote the actual display that the catalog user would see. Whether or not people would find things in the catalog was directly in the hands of the catalogers, and they could decide what would bring related entries within card-flipping distance of each other, and whether cross-references were needed.

      The technology of the card catalog was the card. The technologist was the cataloger.

      This is no longer the case. The technology of the catalog is now a selection of computer systems. Not only are catalogers not designing these systems, in most cases no one in libraries is doing so. This has created a strange and uncomfortable situation in the library profession. Cataloging is still based on rules created by a small number of professional bodies, mostly IFLA and some national libraries. IFLA is asking for comments on its latest edition of the International Cataloging Principles but those principles are not directly related to catalog technology. Some Western libraries are making use of or moving toward the rules created by the Joint Steering Committee for Resource Description and Access (RDA), which boasts of being "technology neutral." These two new-ish standards have nothing to say about the catalog itself, as if cataloging existed in some technological limbo.

      Meanwhile, work goes on in the bibliographic data arena with the development of the BIBFRAMEs, variations on a new data carrier for cataloging data. This latter work has nothing to say about how resources should be cataloged, and also has nothing to say about what services catalogs should perform, nor how they should make the data useful. Its philosophy is "whatever in, whatever out."

      Meanwhile #2, library vendors create the systems that will use the machine-readable data that is created following cataloging rules that very carefully avoid any mention of functionality or technology. Are catalog users to be expected to perform left-anchored searches on headings? Keyword searches on whole records? Will the system provide facets that can be secondary limits on search results? What will be displayed to the user? What navigation will be possible? Who decides?
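
The difference between those first two kinds of search is easy to show. This is a minimal Python sketch, not how any vendor's system works: the headings list and the two functions are assumptions for illustration, and real systems add normalization, indexing and ranking that are left out here.

```python
# Illustrative headings in inverted "Surname, Forename" form.
headings = [
    "Cutter, Charles Ammi",
    "Franklin, Ursula",
    "Svenonius, Elaine",
]


def left_anchored(headings, query):
    """Match only headings that begin with the query string, as in a browse index."""
    q = query.lower()
    return [h for h in headings if h.lower().startswith(q)]


def keyword(headings, query):
    """Match headings that contain every query word, in any order."""
    words = query.lower().split()
    matches = []
    for h in headings:
        tokens = h.lower().replace(",", " ").split()
        if all(w in tokens for w in words):
            matches.append(h)
    return matches


# A user who types the name in natural order finds nothing in a left-anchored
# search, because the heading was built for alphabetical filing.
print(left_anchored(headings, "charles cutter"))
print(left_anchored(headings, "cutter, c"))

# A keyword search finds the heading regardless of word order.
print(keyword(headings, "charles cutter"))
```

The form of the heading only pays off in the first kind of search, which is why the choice of search technique cannot sensibly be made apart from the cataloging rules.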

      The code4lib community talks about getting catalogers and coders together, and wonders if catalogers should be taught to code. The problem, however, is not between coders and catalogers but is systemic in our profession. We have kept cataloging and computer technology separate, as if they aren't both absolutely necessary. One is the chassis, the other the engine, but nothing useful can come off the assembly line unless both are present in the design and the creation of the product.

      It seems silly to have to say this, but you simply cannot design data and the systems that will use the data each in their own separate silo. This situation is so patently absurd that I am embarrassed to have to bring it up. Getting catalogers and coders together is not going to make a difference as long as they are trying to fit one group's round peg into the other's square hole. (Don't think about that metaphor too much.) We have to have a unified design, that's all there is to it.

      What are the odds? *sigh*

      Wednesday, April 29, 2015

      The 50's were a long decade

      Born in 1949, I grew up in the 50's. Those were the days of Gracie Allen ("Say goodnight, Gracie." "Goodnight, Gracie."), Lucille Ball, and Alice of the Honeymooners, for whom "To the moon, Alice!" did not mean that she could ever be an astronaut. These were the models for the 1950's woman.

      I was always bright and precocious. Before starting kindergarten I taught myself to read the Dick and Jane books that were being read to me. My parents didn't believe that I could read so they bought a book I had never seen and I read it to them. From then on my mother's mantra was, "Karen, no one is ever going to love you if you don't play dumb." Marilyn Monroe in Some Like it Hot. Not Myrna Loy in The Thin Man.

      I wore glasses (from the age of 5) in a time when the chant "Men never make passes at girls who wear glasses" was often heard.

      Being smart and being female is still difficult in our culture. Esther Dyson, who has long been one of the cultivators of deep thinking around technology, was introduced as "the most powerful woman in American business", to which she replied that she considered herself at least one of the most powerful people in her field. She's right. But being saddled with the "woman" category means that she can be considered apart, not a threat to the status of any men who might otherwise be lessened by her success in "their" world.

      I was fortunate to have a few high school teachers who appreciated intellect in a girl. (I was unfortunate to also have the local high school lech who paid girls A's to sit in the front of the class in mini-skirts but without panties.) It wasn't really until I hit college that the discrimination against smart women became intense. I can only imagine that it is because college professors see themselves as grooming the next generation of college professors, while high school teachers instead had the task of helping us learn what they had to teach, then leave. In my first semester at college I had one of those introductory courses that was held in an auditorium -- probably a history class of some sort. After class one day I walked with the professor toward his office and chatted with him about some idea that had come to me from his lecture. He was friendly and encouraging. At the next class meeting he began by saying: "After class last time, a young man presented me with a very interesting idea." He had not mistaken me at the time for a boy. This was a small private college and there was a dress code. I had long, flowing hair, wore makeup, and was wearing a dress. Instead, his memory turned me into a boy because it would have been impossible for him to have received a new and interesting idea from a girl. You can imagine how likely it would have been for him to become the mentor to a bright woman looking to pursue an academic career.

      You may also be able to imagine how this statement made me disappear, not only in his eyes and the eyes of my classmates, who would never know that I was that "young man," but also in my own eyes. Psychologists call it "loss of significance" -- that your very being is denied; you are erased, post facto. I don't wonder that so many women suffer depression, because there is nothing more disorienting or more discouraging than having your own experience denied, pulled out from under you, and to be made invisible.

      The stories, of course, abound; I couldn't begin to tell them all. This one, though, must be told: There was the boss who had hired me as the only woman holding a management position in the organization. He chuckled in surprise and disdain when I asked to be included in the meetings that he held with the otherwise-all-male management staff, which he had not thought to invite me to. He was even more surprised when I spoke up at the meetings. One day he called me into his office and praised me by saying "We're lucky to have found you. If you were a man we'd have to be paying you twice as much."

      With great pain I realize that I experienced all of this from a position of great privilege, as a white, middle-class, educated American. I cannot imagine the prejudice of race or caste that others must live with, nor how that affects their sense of themselves as whole human beings.

      I'm glad I went into librarianship, with all of its warts. I have spent my career surrounded by smart women. I got to create technology with women. I hope to do more of that. My main message here today, though, is this: if you can help a young woman understand her own worth, to appreciate her abilities, and to see being smart as a positive, please do, in whatever way you can. Whether it's encouragement, scholarships, or raising a daughter who never hears "no one will ever love you if you don't play dumb." Let's make sure that the fifties are behind us.

      Tuesday, April 21, 2015

      Come in, no questions asked

      by Eusebia Parrotto, Trento Public Library*

      He is of an indeterminate age, somewhere between 40 and 55. He's wearing two heavy coats, one over the other, even though it's 75 degrees out today (shirt-sleeve weather), and carrying a large backpack. He's been a regular in the library for a couple of months, from first thing in the morning until closing in the evening. He moves from the periodicals area along the hall to the garden on fair weather days. Sundays, when the library is closed, he is not far away, in the nearby park or on the pedestrian street just outside.

      I run into him at the coffee vending machine. He asks me, somewhat hesitantly, if I have any change. I can see that he's missing most of his front teeth. I've got a euro in my hand, and I offer it to him. He takes it slowly, looks at it carefully, and is transformed. His face lights up with a huge smile, and like an excited child, but with a mere whisper of a voice, he says: "Wow!! A euro! Thanks!" I smile back at him, and I can see that he's trying to say something else but he can't, it tires him. I can smell the alcohol on his breath and I assume that's the reason for his lapse. He motions to me to wait while he tries to bring forth the sounds, the words. I do wait, watching. He lifts a hand to the center of his neck as if to push out the words, and he says, with great effort and slowly: "I don't speak well, I had an operation. Look." There is a long scar on his throat that goes from one ear to the other. I recognize what it is. He says again, "Wait, look" and pulls up his left sleeve to show me another scar along the inside of his forearm that splits in two just before his wrist. "I know what that is," I say.

      Cancer of the throat. An incision is made from under the chin to arrive at the diseased tissue. They then reconstruct the excised portion using healthy tissue taken from the arm. That way the damaged area will recover, to the extent it can, its original functions.

      With great effort and determination he tells me, giving me the signal to wait when he has to pause, that he was operated on nearly a year ago, after three years in which he thought he had a stubborn toothache. When he couldn't take it any more he was taken to the emergency room and was admitted to hospital immediately. I tell him that he's speaking very clearly, and that he has to exercise his speech often to improve his ability to articulate words; it's a question of muscle tone and practice. I ask him if he is able to eat. I know that for many months, even years, after the operation you can only get down liquids and liquified foods. He replies "soups, mainly!" It will get better, I tell him.

      His eyes shine with a bright light, he smiles at me, signals to me to wait. Swallows. Concentrates and continues his story, about a woman doctor friend, who he only discovered was a doctor after he got sick. He tells me some details about the operation; the radiation therapy. This is the second time that he has cheated death, he says. The first was when he fell and hit his head and was in a coma for fifteen days. "So now this, and it's the second time that I have been brought back from the brink." He says this with a smile, even a bit cocky, with punch. And then tears come to his eyes. He continues to smile, impishly, toothlessly. "I'm going to make it, you'll see. Right now I'm putting together the forms to get on disability, maybe that will help." "Let's hope it works out," I say as we part. And he replies: "No, not hope. You've got to believe."

      The derelicts of the library. A few months back it was in all of the local papers. One student wrote a letter to the newspaper complaining that the presence of the homeless and the vagabonds profaned the grand temple of culture that is the library. Suddenly everyone had something to say on the matter; even those who had never even been to the library were upset about the derelicts there. They said it made them feel unsafe. Others told how it made them feel uncomfortable to come into the library and see them occupying the chairs all day long. Even when half of the chairs were free they were taking up the places of those who needed to study. Because obviously you can't mix with them.

      I don't know how often the person I chatted with today had the occasion to speak to others about his illness. It's a terrible disease, painful, and it leaves one mutilated for life. Recovery from the operation is slow, over months, years. It's an affliction that leaves you with a deep fear even when you think you are cured. That man had such a desire to tell the story of his victory over the disease, his desire to live, his faith that never left him even in the darkest moments. I know this from the great light that radiated from his visage, and from his confident smile.

      I don't know of any other place but standing at the vending machine of a library where such an encounter is possible between two worlds, two such distant worlds. I don't know where else there can be a simple conversation between two persons who, by rule or by necessity, occupy these social extremities; between one who lives on the margins of society and another who lives the good life; who enjoys the comforts of a home, a job, clean clothes and access to medical care. Not in other public places, which are open only to a defined segment of the population: consumers, clients, visitors to public offices. These are places where you are defined momentarily based on your social activities. Not in the street, or in the square, because there are the streets and squares that are frequented by them, and the others, well-maintained, that are for us. And if one of them ventures into our space he is surely not come to tell us his story, nor are we there to listen to it.

      He is called a derelict, but this to me is the beauty of the public library. It is a living, breathing, cultural space that is at its best when it gathers in all of those beings who are kept outside the walls of civil society, in spite of the complexity and contradictions that implies.

      The library is a place with stories; there are the stories running through the thousands of books in the library as well as the stories of the people who visit it. In the same way that we approach a new text with openness and trust, we can also be open and trusting as listeners. Doing so, we'll learn that the stories of others are not so different from our own; that the things that we care about in our lives, the important things, are the same for everyone. That they are us, perhaps a bit more free, a bit more suffering, with clothes somewhat older than our own.

      Then I read this. It tells the story of the owner of a fast food restaurant who noticed that after closing someone was digging through the trash cans looking for something to eat. So she put a sign on the store window, inviting the person to stop in one day and have a fresh meal, for free. The sign ends with: "No questions asked."

      So this is what I want written on the front door of all libraries: "Come in, whoever you are. No questions asked."


      *Translated and posted with permission. Original.

      [Note: David Lankes tweeted (or re-tweeted, I don't remember) a link to Eusebia's blog, and I was immediately taken by it. She writes beautifully of the emotion of the public library. I will translate other posts as I can. And I would be happy to learn of other writers of this genre that we can encourage and publicize. - kc]

      Friday, January 16, 2015

      Real World Objects

      I was asked a question about the meaning and import of the RDF concept of "Real World Object" (RWO) and didn't give a very good answer off the cuff. I'll try to make up for that here.

      The concept of RWO comes out of the artificial intelligence (AI) community. Imagine that you are developing robots and other machines that must operate within the same world that you and I occupy. You have to find a way to "explain," in a machine-operational way, everything in our world: stairs and ramps, chairs and tables, the effect of gravity on a cup when you miss placing it on the table, the stars, love and loyalty (concepts are also objects in this view). The AI folks have actually set a goal to create such descriptions, which they call ontologies, for everything in the world; for every RWO.

      You might consider this a conceit, or a folly, but that's the task they have set for themselves.

      The original Scientific American article that described the semantic web used as its example intelligent 'bots that would manage your daily calendar and make appointments for you. This was far short of the AI "ontology of everything" but the result that matters to us now is that there have been AI principles baked into the development of RDF, including the concept of the RWO.

      RWO isn't as mysterious as it may seem, and I can provide a simple example from our world. The MARC record for a book has the book as its RWO, and most of its data elements "speak" about the book. At the same time, we can say things about the MARC record, such as who originally created it, and who edited it last, and when. The book and the record are different things, different RWO's in an RDF view. That's not controversial, I would assume.

      Our difficulties arise because in the past we didn't have a machine-actionable way to distinguish between those two "things": the book and the record. Each MARC record got an identifier, which identified the record. We've never had identifiers for the thing the record describes (although the ISBN sometimes works this way). It has always been safe to assume that the record was about the book, and what identified the book was the information in the record. So we obviously have a real world object, but we didn't give it its own identifier - because humans could read the text of the record and understand what it meant (most of the time or some of the time).

      I'm not fully convinced that everything can be reduced to RWO/not-RWO, and so I'm not buying that it is the only way to talk about our world and our data. It should be relatively easy, though, without getting into grand philosophical debates, to determine the difference between our metadata and the thing it describes. That "thing it describes" can be fuzzy in terms of the real world, such as when the spirit of Edgar Cayce speaks through a medium and writes a book. I don't want to have to discuss whether the spirit of Edgar Cayce is real or not. We can just say that "whoever authors the book is as real as it gets." So if we forget RWO in the RDF sense and just look sensibly at our data, I'm sure we can come to a practical agreement that allows both the metadata and the real world object to exist.

      That doesn't resolve the problem of identifiers, however, and for machine-processing purposes we do need separate identifiers for our descriptions and what we are describing.* That's the problem we need to solve, and while we may go back and forth a bit on the best solution, the problem is tractable without resorting to philosophical confabulations.
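
Here is one way the two-identifier idea might look, as a rough Python sketch using plain tuples rather than a real RDF library. The example.org identifiers, the property names (title, creator, describes, createdBy, lastEditedBy) and the library names are all hypothetical, chosen only to show the separation; this is not a proposal for the best solution.

```python
# Hypothetical identifiers: one for the description, one for the thing described.
RECORD = "http://example.org/record/123"
BOOK = "http://example.org/book/123"

# Each statement is a (subject, property, value) triple.
# Property names are made up for this sketch, not taken from any vocabulary.
triples = [
    (BOOK, "title", "The Real World of Technology"),
    (BOOK, "creator", "Ursula Franklin"),
    (RECORD, "describes", BOOK),
    (RECORD, "createdBy", "Library A"),
    (RECORD, "lastEditedBy", "Library B"),
]


def statements_about(subject, triples):
    """Collect the property/value pairs whose subject is the given identifier."""
    return {prop: value for subj, prop, value in triples if subj == subject}


# Statements about the book say nothing about who edited the record,
# and statements about the record say nothing about who wrote the book.
print(statements_about(BOOK, triples))
print(statements_about(RECORD, triples))
```

With a single identifier, a machine could not tell whether "Library A" created the record or the book. With two, the question never arises, and no philosophy is required.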

      * I think that the multi-level bibliographic descriptions like FRBR and BIBFRAME make this a bit more complex, but I haven't finished thinking about that, so will return if I have a clearer idea.