Tuesday, August 11, 2009


I was reminded by Jenn Riley's post on FRSAD that I hadn't yet read the document. Jenn had some interesting concerns about the model, and now that I have read it, so do I.

The main thing that bothers me is that FRSAD's view of authority data appears to be that it names things, and by that I mean that it names things for the human reader. The introduction to FRSAD says:
"The purpose of authority control is to ensure consistency in representing a value -- a name of a person, a place name, or a subject term -- in the elements used as access points in information retrieval."
The example given is that of World War II, which can be called by many different names in publications, but is brought together under a single heading in LCSH.

I think that the goal of authority control is to come up with a single representation for a concept or a thing. The nature of that representation is very important, however. If you choose the preferred display form as the representation of the entity, your metadata has a fatal flaw: any change to the display form creates a different entity. A display form simply is not a viable persistent identifier. Using the display form also makes it much more difficult to share your data across languages and across contexts. "World War II" and "Seconde Guerre mondiale" are the same thing conceptually, but if only the names are used to identify the topic, those two terms are far apart. It would be simple to bring them together, however, if the topic had a true identifier, one that is independent of the preferred display form.

I am a bit perplexed that no one on the FRSAD committee was able to introduce the concept of an identifier into the project. It seems to be such an obvious answer. Each topical entity must have an identifier. That identifier remains the same regardless of decisions about display. The determination of a single display form may still be required for certain user functions, but the big plus is that you can decide to display the authorized form in English or Spanish, for adults or children, in a transliterated form or vernacular, without changing the identity of your entity.
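
To make the idea concrete, here is a minimal sketch in Python of what I mean. Nothing in it comes from FRSAD itself: the `Topic` class, the `labels` dictionary, and the identifier `topic:0001` are all made up for illustration, and a real system would surely look different. The point is only that records refer to the identifier, so the display form can change freely.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topical entity identified by an opaque ID, not by its display form."""
    identifier: str                             # hypothetical identifier scheme
    labels: dict = field(default_factory=dict)  # language code -> display form

    def display(self, language: str) -> str:
        """Return the display form for a language; raises KeyError if absent."""
        return self.labels[language]


# "topic:0001" is a made-up identifier, used here for illustration only.
ww2 = Topic("topic:0001", {"en": "World War II", "fr": "Seconde Guerre mondiale"})

# Bibliographic records point at the identifier, never at a label.
english_record = {"subject": ww2.identifier}
french_record = {"subject": ww2.identifier}

# Changing the preferred English display form does not touch the records.
ww2.labels["en"] = "Second World War"
```

Both records still carry the same subject after the English label changes, and a French catalog can show "Seconde Guerre mondiale" for the very same entity.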

Without an identifier, there is no way to represent an entity as metadata. The Work and the Thema (FRSAD's word for subject) have no existence in metadata without a machine-readable identity that allows them to have being. This is a basic rule of the Semantic Web, but it has always been a fact of metadata usage in machine-readable form. Those of us in libraries have struggled to create systems and programs that attempt to control identities with user display forms, and it is both a frustrating and flawed approach. We need to move FRSAD from:


where the display forms are flexible and aren't involved with identifying what our metadata is about. Display forms are for humans; identifiers are for machines. Identifiers are also language neutral and can facilitate sharing across languages and communities. It's really that simple.

1 comment:

Anonymous said...

I think it's helpful to work from this point you made:
"The Work and the Thema (FRSAD's word for subject) have no existence in metadata without a machine-readable identity that allows them to have being."

I believe FRSAD is a conceptual model, and you're discussing the technological "carrier" for content. In FRSAD, the Identifier would really be another Nomen. A Nomen doesn't have to be a term in some natural language. The framers define it as "any sign or sequence of signs (alphanumeric characters, symbols, sound, etc.) by which a Thema is known, referred to, or addressed as" (p. 25). So actually, the Identifier would be a Nomen. Maybe you could call it the "super-nomen" in our future electronic systems. Those systems will map super-nomens to their various equivalents in natural languages.

Ted Gemberling
PS Shameless plug: I published an article on FRSAD in the June issue of C&CQ.