I understand the need that standards have to be very precise in their terminology; to give terms specific meaning. There often is a conflict, however, between that desire for precision and the need to communicate well with the users of the standard. An example of this is the OpenURL* standard, which pioneered the "Context object" and its ever-obscure children like the Referent and the Referring Entity. Quick: give me a definition for Referent.... right, it's not exactly on the tip of anyone's tongue.
I'm going to say that there are two kinds of people in the world: those who think that using a standard should require many hours of study leading to a complete understanding and absorption of the concepts and terminology, so that there cannot be any possible mis-use of the standard; and those who think that a standard should be fairly understandable in a single reading, and usable shortly thereafter. Members of the former group seem to feel that the ideas in their standard are so clever, so unique, that they cannot be comprehended easily. Members of the latter group (to which I obviously belong) assume that standards recombine previous concepts into new structures, and, deep down, are generally simple ideas that one could express simply.
Jeff Young, who clearly has an element of Type 2 in him, managed to unbundle the studied obscurity of the OpenURL with the opening post to his blog, Q6. He replaced the OpenURL terms with Who, What, Where, Why, When, How. I believe that for many people, the light bulb suddenly lit up upon reading his explanation.
A similar simplification is needed for the Dublin Core Abstract Model, and I'm going to attempt that even though I think it's a dangerous thing to do. DCAM defines a set of metadata types that can help us communicate to each other about our metadata. It should simplify crosswalking of metadata sets, and make standards more understandable across communities. Unfortunately, it has not done so, at least in part, because of some rather demented semantics.
First, you need to understand that the DCAM is about metadata structure, not meaning, or at least not meaning in the human sense of the term. It describes a generalized underlying format for machine-readable metadata. In the most simple terms it provides the information that a program would need to determine what operations it can perform on the metadata that it receives. In this sense, it is a general metadata analogy to the OpenURL's Context Object: a formalized message about something.
The basis of the DCAM is key/value pairs, each of which is called a statement, which is the terminology from RDF. Any group of statements describe a single resource. A resource can be just about anything, but they are based on what your metadata ultimately describes. Examples are: a book; a person; an event. The set of key/value pairs that describes a resource is called a description. If you will describe more than one resource in your metadata record, then you will have multiple descriptions. These make up the description set, which is the sum of descriptions that you have defined for your purpose. These descriptions can be packaged into a record. It all looks something like this:
The statement level is where we get to the real meat of the DCAM, and the part that I think holds great potential. The actual DCAM diagram is very large and filled with terminology that makes it difficult to grasp the meaning of the concepts. I'm going to simplify that meaning here, with the understanding that there is more to DCAM than this simple explanation. Consider this Step 1, not the whole enchilada.
Essentially you have key/value pairs. A key/value pair can look something like:
title = Moby Dickwhere "title" is the key and "Moby Dick" is the value.
The first rule in the DCAM is that the key must be identified with a URI, a Uniform Resource Identifier. Here's a URI that you might use for this key/value pair:
There is nothing new in this; using URIs is a very common convention. It's in the definition of the value that DCAM adds something. Values can be "literals" or not. DCAM makes use of the RDF definition of a literal:
"Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals."
Literals can be plain or typed, as defined in the RDF documentation. An example of a typed literal is a date or currency. A typed literal gives you some control over the format of the string, such as "YYYYMMDD."
This is contrasted with the non-literal values. The non-literal values are not defined in the RDF documentation, except to imply that they are everything that is not a literal. The DCAM goes further and defines non-literals as being of two types: the non-literal value is either a URI that names the value or it is a value from a named, controlled vocabulary. So you can have:
which is a URI for a controlled value, in this case the DCMI type value, Event. Or you can have:
[URI for ISO 3166-2, country codes] + "en" for EnglishThis latter is similar to what we often do in MARC records, which is to record a code as a string and indicate what controlled list that string is from.
Obviously, if this was all there was to DCAM, we'd all be all over it by now. What happens next is that we start trying to apply this in a metadata world that is not as neat as we would like. For example, what do we do with an ISBN -- is it a structured value? yes. Is it a member of a controlled vocabulary? sometimes, yes, because there is a finite list of ISBNs and each one of them is known (at least to Bowker and other ISBN agencies). So, is it a typed literal, or a nonliteral?
In the end, however, perhaps it doesn't matter that this definition of "nonliteral" leaves us with some ambiguity. Perhaps what really matters is that we distinguish between these three kinds of values:
- Plain strings. These will be necessary for values that simply cannot be controlled or are not being controlled.
- Structured strings. In these, the range of values can be great, and is not being housed in a finite list, but because of their structure they can often be acted on for functions like quality control, transformations, etc.
- Identified values. An identified value is the essence of the semantic web. It allows algorithms to make connections between values even though those algorithms are not machine intelligences, are not understanding the human meaning behind the value.
I welcome discussion, criticism, additions... whatever comes to your mind. Really.
* NOTE: I'd give you a link to the OpenURL standard, but NISO has gone with one of those content management systems that produce unusably long links. So you'll have to go to the NISO site and look for it under Standards.