Wednesday, September 17, 2008

Functional Requirements for App Profiles

In preparation for DC2008 (9/22-26, Berlin), I've been thinking about application profiles. The DC folks have developed a structure for application profiles which I have attempted to use for the DC-RDA work. I ran into some difficulties, in part because the library community has its own particular needs. So I thought it would be a good idea to articulate these needs in preparation for discussions I hope we will have next week.

I'm going to use some terminology from FRBR, and some from the DC work. Mainly, I'll use "entity" in the FRBR sense. I'll use "property" in the sense that it is used in the DCAM and in RDF.

Here's my first pass at what we need to express in an application profile for the library community:

entities


We need to define the entities that will be in our metadata environment. It would be ideal to be able to re-use entities where possible. So if two APs can use the same Person entity, they just need to be able to identify it. At the same time, it must be possible to create a different person entity and to give it a new identifier.

relationships between entities

In some cases it will be desirable to constrain the relationships that can exist between entities. Both RDA and FRBR constrain which Group 2 entities have relationships with a Work as opposed to an Expression. This is an area of some disagreement among sub-communities, so there will be some APs that will define the relationships differently.
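To make the two points above concrete, here is a minimal Python sketch of entities that are shared by identifier and of per-AP relationship constraints. Everything in it is an assumption for illustration only: the `Entity` and `ApplicationProfile` classes, the `example.org` identifiers, and the `createdBy` relationship name are hypothetical and are not taken from RDA, FRBR, or the DC application profile structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    identifier: str  # hypothetical URI; any unique identifier would do
    label: str


@dataclass
class ApplicationProfile:
    name: str
    entities: dict = field(default_factory=dict)
    allowed: set = field(default_factory=set)  # (subject id, relationship, object id)

    def use_entity(self, entity: Entity) -> None:
        """Register an entity; re-use means referring to the same identifier."""
        self.entities[entity.identifier] = entity

    def allow(self, subject: Entity, relationship: str, obj: Entity) -> None:
        """Declare that a relationship is valid between two entities in this AP."""
        self.allowed.add((subject.identifier, relationship, obj.identifier))

    def permits(self, subject: Entity, relationship: str, obj: Entity) -> bool:
        return (subject.identifier, relationship, obj.identifier) in self.allowed


# Illustrative entities shared by both profiles.
person = Entity("http://example.org/entity/Person", "Person")
work = Entity("http://example.org/entity/Work", "Work")
expression = Entity("http://example.org/entity/Expression", "Expression")

ap_one = ApplicationProfile("AP one")
ap_two = ApplicationProfile("AP two")
for ap in (ap_one, ap_two):
    for entity in (person, work, expression):
        ap.use_entity(entity)

# The two profiles disagree about which entities a Person may be linked to.
ap_one.allow(work, "createdBy", person)
ap_two.allow(work, "createdBy", person)
ap_two.allow(expression, "createdBy", person)

print(ap_one.permits(expression, "createdBy", person))  # False
print(ap_two.permits(expression, "createdBy", person))  # True
```

Both profiles refer to the same Person entity by its identifier, while each keeps its own list of permitted relationships. A profile that needs a different person entity would simply create one with a new identifier.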

properties of entities

Entities have properties. These are metadata elements that have been defined outside of the AP. Each property must have a unique identifier. It is the detailed information about the properties that will make up the bulk of the AP. Here is a first list of what that information needs to be:
  • property identifier
  • property is mandatory/optional
  • property is repeatable/not (within entity)
  • properties are cumulative/mutually exclusive -- a way to say that you can use property A or B or C, or that you can use any combination of A, B, C.
  • property value is controlled/uncontrolled -- this distinguishes between free text (e.g. an abstract, user tags) and a constrained set of values (authority list, or a designated format). If controlled, then there needs to be a way to give some information on the type of control: URI of a list of terms; URI of a standard format for the data (e.g. date type format, or AACR2 name heading format).
  • property value is transcribed/supplied -- transcribed data are taken directly from the resource itself; supplied means the source of the information is not the resource. (Title can be transcribed; subject headings are supplied.)
  • for controlled property values that use a set list of values, it has to be possible to state the vocabularies that are valid, and whether they are mandatory or optional. It may also be necessary to define whether one can extend the vocabulary in the metadata (e.g. use an unlisted value if a new value is needed). It needs to be stated whether the entire vocabulary is to be used. If not, the AP needs to define which values from the full vocabulary are valid. It also needs to be possible to create a list of values within the AP for any element. In this case there is no external controlled list.
Other?
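
One way to picture the list above is as a small record of constraints per property, plus a check against the values found in an entity. The following Python is only a sketch under stated assumptions: the `PropertyConstraint` fields, the `check` and `check_exclusive` helpers, and the `example.org` identifiers and sample values are all hypothetical names invented for illustration, not part of any existing AP specification.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class PropertyConstraint:
    property_id: str                      # identifier of a property defined outside the AP
    mandatory: bool = False
    repeatable: bool = True               # repeatable within one entity
    controlled: bool = False              # False means free text
    vocabulary_uri: Optional[str] = None  # external list or format, if any
    allowed_values: Optional[FrozenSet[str]] = None  # subset of a vocabulary, or an AP-local list
    extensible: bool = False              # may unlisted values be used?
    transcribed: bool = False             # False means supplied


def check(constraint: PropertyConstraint, values: List[str]) -> List[str]:
    """Return a list of problems found for one property of one entity."""
    problems = []
    if constraint.mandatory and not values:
        problems.append("missing mandatory property")
    if not constraint.repeatable and len(values) > 1:
        problems.append("property is not repeatable")
    if (constraint.controlled and constraint.allowed_values is not None
            and not constraint.extensible):
        for value in values:
            if value not in constraint.allowed_values:
                problems.append(f"value not in list: {value}")
    return problems


def check_exclusive(property_ids: List[str], record: Dict[str, List[str]]) -> bool:
    """True if at most one of the mutually exclusive properties is present."""
    present = [p for p in property_ids if record.get(p)]
    return len(present) <= 1


# Hypothetical constraints for two properties.
title = PropertyConstraint("http://example.org/prop/title",
                           mandatory=True, repeatable=False, transcribed=True)
carrier = PropertyConstraint("http://example.org/prop/carrier",
                             controlled=True,
                             allowed_values=frozenset({"volume", "online resource"}))

print(check(title, []))            # ['missing mandatory property']
print(check(carrier, ["scroll"]))  # ['value not in list: scroll']
```

Setting `extensible=True` on a controlled property would let an unlisted value through, which corresponds to the case where the vocabulary can be extended in the metadata.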

I'm musing over whether we need to be able to define a "record," mainly to say what the minimum is that someone could expect to receive.

I'm also considering the need to define relationships between records -- like the FRBR work/work and work/part relationships. As I said in my post on linking, I see a difference between dependent and independent links, and these, in my mind, would be independent links, and may point beyond a particular database or system. I'll think more about this, and welcome comments.

2 comments:

Anonymous said...

Do you envision the elimination of or a replacement for notes that indicate the source of transcribed values like titles? For example, there are the notes "Title from cover" or "Cover title," which I think are a waste of time and characters. I think the same way about brackets around cataloger-supplied information. I am not so much against the idea of it as I am against the way that it's handled. This is the kind of thing that could be indicated somewhere else behind the scenes in a more formal way, perhaps like the second indicator value in 246, if we think in MARC terms, or maybe even not at all. I say save those characters for a controlled value where it makes more of a difference. What do you think?

Karen Coyle said...

I hadn't gotten to the point of thinking about the notes relating to transcribed values, but I think you are right. My main concern is that there is nothing in a machine-readable format that tells you what elements are transcribed and what elements are "provided." If we think that transcription is a particular "privileged" type of data, then we need to code it as such for machine processing. In such an environment, I can imagine that there could still be reasons why a cataloger would want to indicate where the transcribed title came from as information to other catalogers (most users will not care) -- but that would be information for fellow (human) catalogers, not for any machine operations. Those notes and the brackets are not suitable ways to encode aspects of the data in a machine-actionable world, so if we can find a better way to convey meaning we should use it.

I agree with your statement: "I am not so much against the idea of it as I am against the way that it's handled." It's a matter of finding a better way to carry some of the messages that we have in traditional catalog data, and to encode them in ways that let us make more use of them in computer systems.