Friday, April 06, 2012

If not RDF, then what?

There's no question that the data format known as RDF is darned difficult. Let's suppose that we in the library world decide not to hitch our wagon to RDF, but would still like to create a new bibliographic framework. After all, if MARC simply won't work for the creation of RDA records, we still need something besides MARC that we can use to create data. And even if (although this is unlikely) we should decide not to move to RDA, our records still need some upgrading to fit better into current data processing models. We still need to:
  • define our entities
  • use data wherever possible, not text
  • use identifiers for things
  • relate attributes to entities (that is, say things about some thing)
  • use a mainstream serialization

Should we do this, the mainstream serialization could be anything from JSON to XML to RDF. In fact, it could be all of those if we play our cards right and define our data in a format-neutral way. RDA does some of this for us, but not all. In particular, RDA does not distinguish between data and text, and although it allows for the use of identifiers, it doesn't give any guidance on how to use them. RDA is probably fine as a set of rules to guide decision-making, but it needs a corresponding data definition before it becomes useful as data. Having that data definition could help to clarify some ambiguities in RDA. We have to expect that there will need to be some iteration between RDA and a data definition. (I will post shortly on a problem that I have run into.)
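To make "format neutral" a little more concrete, here is a minimal sketch in Python of one entity defined once and then written out as both JSON and XML. Everything in it is an assumption for illustration only: the `Entity` and `Attribute` classes, the `is_identifier` flag that separates data from text, the element and key names, and the example.org identifiers. None of it comes from RDA or from any existing library schema.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str                    # hypothetical property name, e.g. "title"
    value: str                   # either literal text or an identifier
    is_identifier: bool = False  # True when value points at a thing, not text


@dataclass
class Entity:
    identifier: str              # identifier for the thing being described
    entity_type: str             # hypothetical entity name, e.g. "Work"
    attributes: list = field(default_factory=list)


def to_json(entity):
    """Serialize the neutral definition as JSON."""
    return json.dumps(
        {
            "id": entity.identifier,
            "type": entity.entity_type,
            "attributes": [
                {"name": a.name, "value": a.value, "is_identifier": a.is_identifier}
                for a in entity.attributes
            ],
        },
        sort_keys=True,
    )


def to_xml(entity):
    """Serialize the same neutral definition as XML."""
    root = ET.Element("entity", {"id": entity.identifier, "type": entity.entity_type})
    for a in entity.attributes:
        child = ET.SubElement(root, a.name)
        if a.is_identifier:
            child.set("ref", a.value)   # identifiers become references
        else:
            child.text = a.value        # text stays text
    return ET.tostring(root, encoding="unicode")


# Illustrative record; the identifiers are made up.
work = Entity(
    identifier="http://example.org/work/1",
    entity_type="Work",
    attributes=[
        Attribute("title", "Example Title"),
        Attribute("creator", "http://example.org/person/1", is_identifier=True),
    ],
)

print(to_json(work))
print(to_xml(work))
```

The point of the sketch is only that the entity, its identifier, and the data-versus-text distinction live in the definition, so neither serializer has to invent them.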

It also seems to me that we have everything to gain by beginning our work on a data format with no particular serialization in mind. We could go from RDA to RDA-as-data and then on to RDA-as-RDF. I see some dangers in skipping the middle step, mainly that we could end up making some decisions that fit RDA into RDF but that are problematic for other serializations.
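As a rough illustration of that middle step, the sketch below (in Python) starts from statements held in a serialization-neutral form and only afterwards maps them to RDF, written as N-Triples-style lines. The function name `to_ntriples`, the tuple layout, the `http://example.org/terms/` property base, and the sample identifiers are all hypothetical; real RDA properties and a full RDF library would look different.

```python
def to_ntriples(subject, statements, property_base):
    """Map neutral (name, value, is_identifier) statements to N-Triples-style lines.

    subject       -- identifier (URI) of the entity being described
    statements    -- list of (name, value, is_identifier) tuples
    property_base -- hypothetical namespace prepended to each property name
    """
    lines = []
    for name, value, is_identifier in statements:
        predicate = f"<{property_base}{name}>"
        if is_identifier:
            # An identifier becomes a URI node.
            obj = f"<{value}>"
        else:
            # Text becomes a quoted literal; escape backslashes, quotes, newlines.
            escaped = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> {predicate} {obj} .")
    return "\n".join(lines)


# Illustrative "RDA-as-data" statements; names and identifiers are made up.
statements = [
    ("title", "Example Title", False),
    ("creator", "http://example.org/person/1", True),
]

triples = to_ntriples(
    "http://example.org/work/1", statements, "http://example.org/terms/"
)
print(triples)
```

Because the neutral statements already say which values are identifiers and which are text, the RDF mapping is a mechanical last step, and the same statements remain available to other serializations.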

4 comments:

Stephen Bounds said...

Hi Karen,

You seem to be referring to RDF/XML here rather than RDF specifically, since RDF is just a model without any specific representation (or schema restrictions, for that matter).

Have you looked at any of the alternate coding schemes, such as Notation 3 or RDFa?

Karen Coyle said...

No, I do mean RDF, although I may not have worded it well. I believe I mean what TimBL means when he has one of the stars be "Use RDF standards." I interpret "Use RDF standards" to mean that you define your properties using RDF (or OWL). It is my observation that the RDF standards are still in development and that there are still many open questions about how to do things. Therefore it might be better to follow some general best practices that would translate easily to RDF in the future.

Ben Companjen said...

Hi Karen,

Seeing that ontologies are already being published in the Open Metadata Registry, I think chances that RDF will not be used are small.

On the other hand, having abstract models (entity definitions) that can be translated to RDF models and to other models and formats could make for better interoperability.
The RDF standards are not in development anymore, as far as I know, but the RDA vocabularies in RDF and best practices for e.g. provenance are, so having something that supports all goals of RDA seems useful.

Karen Coyle said...

Ben,

Although RDA has been published in RDF in the Open Metadata Registry, LC did not participate in this version of RDF and it has not been acknowledged by LC. LC also has not reached out to the OMR to participate in the work on the new framework, to my knowledge. Sadly, I fully expect LC to create its own version of RDA in RDF.

As for the development of RDF, although the underlying standard appears to be stable, the standard suite seems to me to still be in progress. OWL, provenance, named graphs... are all being actively worked on, and I suspect that there will be many more standards added to the suite in the future. The Library of Congress has already had some difficulty applying SKOS to its data and resorted to creating its own RDF-based standard for KO data, called MADS in RDF. Other attempts to define library data in RDF (FRBR, ISBD) have revealed difficulties in applying the standards. In part this is because library data has some inherent assumptions that are very "closed world" and hard to translate to the more open world of RDF. I believe that an error is being made: wishing to re-create library data in RDF without making any changes, and therefore without acknowledging that there are some deep contradictions between the library metadata tradition and the RDF approach.