Friday, March 23, 2007

There's always something new

I was looking over the impressively improved RDA Part A Chapter 3. I still have some issues (natch!), but it's clear that there has been a lot of thought about data elements and how they differ from simply strings of text.

One of the areas that needs to be thought through is how RDA will be able to change over time, something that is often called "extensibility" of a data standard. I was looking, for example, at the carrier types (3.3.0.2.2), trying to think of what future technology might not be covered. Then I ran into it at the office supply store, where I was replenishing my store of brightly colored sticky notes: you can now purchase tax software on a thumb drive.

(Note: of the carriers listed in RDA A/3 there is something called a "computer chip cartridge". This could be the term that would be used for the thumb drive, but the only examples I can find relate to Nintendo cartridges. So I'm going to pretend, for the purposes of this discussion, that thumb drives aren't covered. Even if they are, something else new will come along, and probably soon.)

RDA has a list of carriers, broken up into large categories like "audio carriers" and "computer carriers." If the item you have in hand isn't on the list, you are instructed to use "other audio carrier," "other computer carrier," etc. This means that anything really new will end up in the "other" category, which isn't terribly useful. It also means that something coded as "other" today will have to be updated when the list catches up, and in the meantime a search on "other computer carrier" will bring up a list of items that may be very different from each other. So there needs to be a way that such lists can be extended quickly, even in a provisional way, to keep up with this fast-changing world. There also needs to be a way that people in the field can find out that the list has been updated.

There are many different ways that you can develop extensibility for a set of data elements. The main thing is that you want the newly minted term to have a clear context (what list does it belong to?), and you want to be able to get people to the definition of the term when they encounter it. In this case, the context is that it is a carrier of information, and it is specifically a new kind of computer carrier. It is also extending an existing list, say, the RDA carrier list.

Let's pretend that we have a registry of terms. And let's pretend that the registry has some management mechanism, such as a small group of participants that oversees the various lists in the registry (so it's not total anarchy). Our thumb drive could be added such that:

http://authoritylists.info/RDA:carrier:computer_carrier:USB_flash_drive

returns this information in a machine-readable format:

owner: RDA
list: carrier
sublist: computer carrier
element: USB flash drive
status: provisional
date added: 2007-03-30
description: "USB flash drives are NAND-type flash memory data storage devices integrated with a USB (universal serial bus) interface." (quotes because I took that from Wikipedia, but generally the expert adding the term would write a suitable description.)
synonyms: thumb drive, jump drive, flash drive
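
To make "machine-readable" a little more concrete, here is a minimal sketch in Python of how a program might hold the record above. The class name RegistryEntry, the helper parse_term_path, and the choice of JSON as the output format are all assumptions made for illustration; nothing here is a real registry API.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class RegistryEntry:
    """One term in a registry list; fields mirror the record above."""
    owner: str
    list_name: str
    sublist: str
    element: str
    status: str
    date_added: str
    description: str
    synonyms: list = field(default_factory=list)


def parse_term_path(path: str) -> dict:
    """Split 'owner:list:sublist:element' into its four context parts.

    Assumes underscores stand in for spaces, as in the example URL.
    """
    owner, list_name, sublist, element = path.split(":")
    return {
        "owner": owner,
        "list_name": list_name,
        "sublist": sublist.replace("_", " "),
        "element": element.replace("_", " "),
    }


# The context (which list does the term belong to?) comes from the URL path.
context = parse_term_path("RDA:carrier:computer_carrier:USB_flash_drive")
entry = RegistryEntry(
    **context,
    status="provisional",
    date_added="2007-03-30",
    description="USB flash drives are NAND-type flash memory data storage "
                "devices integrated with a USB (universal serial bus) interface.",
    synonyms=["thumb drive", "jump drive", "flash drive"],
)

# JSON is just one possible machine-readable serialization.
print(json.dumps(asdict(entry), indent=2))
```

The point of the sketch is only that the context of the term travels with it: the owner, list, and sublist can be read straight from the identifier.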

The many products based on RDA would make use of the registry to support the creation of new records and the reading of existing records. At regular intervals, these systems would check that their lists are up to date (like the automatic update of virus lists in your anti-virus software). Such a system could decide that provisional entries would be flagged in some way (maybe they would show up as red on the screen). Or a system receiving a record with a previously unknown item in an authority list could quickly grab the description from the registry and use that to provide services, like definitions and synonyms, to its users.
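
A rough sketch of that behavior, again in Python, might look like the following. The REGISTRY dictionary is a stand-in for whatever the real registry would return over the network, and the "approved" status value and both function names are assumptions for the sake of the example.

```python
# Stand-in for the remote registry, keyed by term. In practice this would be
# fetched over the network; the structure and the "approved" status value
# are assumptions.
REGISTRY = {
    "computer chip cartridge": {
        "status": "approved",
        "synonyms": [],
    },
    "USB flash drive": {
        "status": "provisional",
        "synonyms": ["thumb drive", "jump drive", "flash drive"],
    },
}


def missing_terms(local_terms, registry):
    """Return registry terms the local system does not know yet, sorted."""
    return sorted(set(registry) - set(local_terms))


def display_label(term, registry):
    """Label a term for display, flagging provisional or unknown entries."""
    entry = registry.get(term)
    if entry is None:
        return f"{term} [not in registry]"
    if entry["status"] == "provisional":
        return f"{term} [provisional]"
    return term


# The periodic update step: pull in anything the local list lacks.
local_terms = {"computer chip cartridge"}
for term in missing_terms(local_terms, REGISTRY):
    local_terms.add(term)

print(display_label("USB flash drive", REGISTRY))
```

A cataloging system could run the update step on a schedule, and a system that receives a record with an unfamiliar term could call something like display_label at read time instead.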

OK, I'm sure that there are geekier folks out there who could (and hopefully will) point out what parts of this don't work, but I'm mainly interested in exploring the general concept: can we get away from text lists and create something that is dynamic, machine-actionable, and useful?

3 comments:

Diane Hillmann said...

Karen, this is exactly the right approach, and one that we, as librarians, should understand and support. I've mocked up the vocabulary in the NSDL Registry Sandbox (which not incidentally is a project of mine), and it can be viewed at http://sandbox.metadataregistry.org/vocabulary/show/id/44.html (click on 'Concepts' to see the whole list, and 'Details' once you've found the concept you're looking for).

What a solution like this does is enable definitions, relationships between terms, lead-in terms, and URIs: the whole host of things that we rely on for our "major" vocabulary efforts but need just as much for the everyday categorizations that drive our systems.

Extensibility using this strategy is quite simple. If a particular community or project needs additional or more granular terms, they can create a term list that adds or extends these terms. They can assert relationships with the original list, deprecate certain terms in favor of others for their use, etc. So long as their list is registered and available for reuse by others using a mechanism like this, it can be easily substituted in their data, and it's easily interpreted by all. Anyone can choose to use either the original list, the specialized list, or both (depending on the repeatability of the element) and so long as we use the URIs for identification, there's no ambiguity from the machine's point of view. From a human point of view, there's a readable label, and (one hopes) a definition.
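
As a minimal illustration of a community list extending an original one, consider the Python sketch below. The example.org URI, the term "encrypted USB flash drive", the "broader" key, and the function broader_chain are all hypothetical and chosen only to show the idea; they do not describe how the Sandbox itself stores relationships.

```python
BASE = "http://authoritylists.info/RDA:carrier:computer_carrier:"
LOCAL = "http://example.org/carriers/"  # hypothetical community list

# Terms from both lists live side by side, identified only by URI.
vocabulary = {
    BASE + "USB_flash_drive": {"label": "USB flash drive"},
    LOCAL + "encrypted_USB_flash_drive": {
        "label": "encrypted USB flash drive",       # hypothetical local term
        "broader": BASE + "USB_flash_drive",        # asserted relationship
    },
}


def broader_chain(uri, vocab):
    """Follow 'broader' links from a term up to the top of its list."""
    chain = []
    seen = {uri}  # guard against accidental cycles
    while True:
        parent = vocab.get(uri, {}).get("broader")
        if parent is None or parent in seen:
            return chain
        chain.append(parent)
        seen.add(parent)
        uri = parent


for parent in broader_chain(LOCAL + "encrypted_USB_flash_drive", vocabulary):
    print(vocabulary[parent]["label"])
```

A system that only knows the original list can follow the relationship back to a term it understands, while the human reader still sees the more specific label.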

I'd love it if people would send me some additional suggestions for terms, alternate labels for the ones here, and DEFINITIONS! I'd be happy to add them and we could use this prototype for discussion.

Please note that the Sandbox is open to anyone, and folks are free to try the same with other RDA vocabularies: http://sandbox.metadataregistry.org

Let's make this a movement ... ;-)

Diane Hillmann (the other troublemaker)

Karen Coyle said...

Diane's URL got cut off. I'm putting it here on two lines that you need to put together:

http://sandbox.metadataregistry.org/vocabulary/
show/id/44.html

Or you can go to http://sandbox.metadataregistry.org/ and click on vocabularies on the right hand side. It's listed under RDA.

Anonymous said...

Great post. You should forward this to the RDA list.

Jonathan Rochkind