Monday, September 08, 2008

Metadata Affordances

In my last post, I promised to spend some time thinking about metadata affordances -- that is, a view of metadata based on what you can do with it. My hope is that this will inform a metadata model that serves our needs (whoever "we" are, but admittedly this will tend toward the metadata needs of the library community). Here are the categories that I have come up with, all open to comment, discussion, correction, etc., so please comment freely.

None (opaque text)

Some metadata will necessarily be of this category, with no particular affordances inherent in the contents. At times plain text is used because that is the nature of the particular metadata element, such as recording the first paragraph of a text or transcribing a title from the piece. At other times plain text is used because the metadata community has chosen not to exercise control over the particular metadata element. An example of this is user-input tags. Although human intelligence may be applied to plain text fields, it requires knowledge that is not inherent in the metadata structure itself.

Structure and rules (typed strings)

Typed strings are things like formatted dates (YYYYMMDD) and currency formats ($9,999.99). There are other possible formatted strings, such as common identifiers like ISBN and ISSN. The affordance of these strings is that you can exercise control over their input, forcing the values to be consistent. With consistent values you can perform accurate operations, like adding up a set of figures, sorting or searching by date, etc. Some controlled list values may also have structure: the standard format for personal names used by libraries includes structural rules ("family name followed by comma, then forenames") that facilitate the use of alphabetically ordered lists of names.
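To make the idea of control over typed strings concrete, here is a minimal sketch in Python. The function names (parse_yyyymmdd, parse_dollars, is_valid_isbn13) are my own inventions for illustration, not part of any metadata standard, and the currency pattern assumes only the US dollar format shown above.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Hypothetical pattern: a dollar sign, digits with optional thousands
# separators, and exactly two decimal places, e.g. "$9,999.99".
_DOLLARS = re.compile(r"\$([0-9]{1,3}(?:,[0-9]{3})*|[0-9]+)\.([0-9]{2})")


def parse_yyyymmdd(value: str) -> date:
    """Return a date for a YYYYMMDD string; raise ValueError otherwise."""
    if not re.fullmatch(r"[0-9]{8}", value):
        raise ValueError(f"not a YYYYMMDD string: {value!r}")
    # strptime rejects impossible dates such as month 13.
    return datetime.strptime(value, "%Y%m%d").date()


def parse_dollars(value: str) -> Decimal:
    """Return an exact Decimal for a string like "$9,999.99"."""
    match = _DOLLARS.fullmatch(value)
    if match is None:
        raise ValueError(f"not a dollar amount: {value!r}")
    return Decimal(match.group(1).replace(",", "") + "." + match.group(2))


def is_valid_isbn13(value: str) -> bool:
    """Check the form and check digit of an ISBN-13; hyphens are ignored."""
    digits = value.replace("-", "")
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... and the weighted sum must end in 0.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```

Once values pass checks like these, the operations mentioned above become safe: parsed amounts can be added with sum(), and date strings can be sorted with sorted(values, key=parse_yyyymmdd).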

List membership/vocabulary control

One way to ensure consistency in metadata is to require that the metadata value be selected from a fixed list of values, rather than being open to free text. This tends to take the form of a list of like terms: languages of text, country names, colors, physical formats.

Although it provides consistency, list membership alone does not provide much in terms of capabilities for data processing. Other information is needed to provide affordances for list members:
  • access to display and indexing forms of the term
  • access to alternate forms, including other languages
  • access to definitions of terms

The information that is needed, therefore, for any list and its members is:
  • list identifier
  • member identifier
  • location of services relating to this list/member, and what services are available

If there are no automated services, then a system will need to provide its own, which is what we generally do today by creating a copy of the list within the system and serving display forms and other features from that internal list. In a web-enabled environment, however, one could imagine lists with web services interfaces that can be queried as needed.
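As a rough sketch of what that internal copy might look like, the Python below models a list with a list identifier, member identifiers, and the display, alternate, and definition information named above. All class and field names, the list identifier "example:languages", and the sample entries are assumptions made for illustration; service_url is only a placeholder for where a web service could be recorded, and nothing here calls one.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ListMember:
    member_id: str                       # identifier of the member within its list
    display: str                         # default display form
    index_form: str                      # form used for indexing and sorting
    alternates: Dict[str, str] = field(default_factory=dict)  # language -> label
    definition: str = ""


@dataclass
class ControlledList:
    list_id: str                         # identifier of the list itself
    service_url: Optional[str] = None    # None means no remote service is known
    members: Dict[str, ListMember] = field(default_factory=dict)

    def add(self, member: ListMember) -> None:
        self.members[member.member_id] = member

    def contains(self, member_id: str) -> bool:
        """List membership test: is this value allowed at all?"""
        return member_id in self.members

    def display_form(self, member_id: str, lang: Optional[str] = None) -> str:
        """Return the label for a member, preferring the requested language."""
        member = self.members[member_id]  # KeyError if not a list member
        if lang is not None and lang in member.alternates:
            return member.alternates[lang]
        return member.display


# Illustrative local copy of a small language list.
languages = ControlledList(list_id="example:languages")
languages.add(ListMember("eng", "English", "english", {"fr": "anglais"}))
languages.add(ListMember("fre", "French", "french", {"fr": "français"}))
```

The stored metadata value would then be only the pair of list identifier and member identifier; everything else is looked up, whether from this local copy or from a service.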

Inter- and intra-metadata links

There is a need to create functional links within metadata segments to other metadata segments or records. For example, the use of name and subject authority records implies a link between those records and the bibliographic metadata records that contain the names and subjects as values. There are also links needed between bibliographic records themselves. These latter represent a number of different relationships, which have been articulated in the FRBR documentation. Some examples are: work-work relationships, work-expression relationships, and part-whole relationships (chapters within books, articles within journals).

There may be other kinds of links that are needed as well, but I think that the main need is to distinguish between identifiers and links. Some identifiers, like ISBNs, can be used to retrieve metadata in a variety of situations, but those should be seen as searches, not links. Searching is appropriate in some circumstances, but the ability to create stable links is a separate affordance and should be treated as such.
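The difference between a search and a link can be sketched in a few lines of Python. The Record and RecordStore classes, the record identifiers, and the titles are all hypothetical; the point is only that following a link resolves to exactly one record (or fails visibly), while a search on an identifier such as an ISBN may return none, one, or several.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    record_id: str                      # stable identifier used as a link target
    title: str
    isbns: List[str] = field(default_factory=list)
    part_of: Optional[str] = None       # link: record_id of the containing record


class RecordStore:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self._records[record.record_id] = record

    def follow(self, record_id: str) -> Record:
        """Follow a link: exactly one record, or KeyError if the link is broken."""
        return self._records[record_id]

    def search_isbn(self, isbn: str) -> List[Record]:
        """Search: zero, one, or many records may carry the same ISBN."""
        return [r for r in self._records.values() if isbn in r.isbns]


store = RecordStore()
store.add(Record("rec1", "An Example Book", isbns=["978-0-306-40615-7"]))
store.add(Record("rec2", "An Example Book (duplicate record)",
                 isbns=["978-0-306-40615-7"]))
store.add(Record("rec3", "Chapter One", part_of="rec1"))  # part-whole link

chapter = store.follow("rec3")
whole = store.follow(chapter.part_of)            # resolves to a single record
matches = store.search_isbn("978-0-306-40615-7")  # may match several records
```

In this sketch the chapter's part_of link always lands on one specific record, while the ISBN search returns both records that carry that ISBN, which is why the two should not be treated as the same affordance.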

Note: These categories of affordances are not mutually exclusive. Some metadata values will provide more than one type of affordance. Each should be clearly and separately articulated, however, and we should think about the advantages and disadvantages of having metadata values serve multiple functions.
