Tuesday, May 24, 2011

From MARC to Principled Metadata

The Library of Congress has announced its intention to "review the bibliographic framework to better accommodate future needs." In plain English, they are (finally!) thinking about replacing the MARC format with something more modern. This is something that desperately needs to be done.

I want to encourage LC and the entire library community to build its future bibliographic data on solid principles. Among these principles would be:

  • Use data, not text. Wherever possible, the stuff of bibliographic description should be computable data, not human-interpretable text. Any part of your metadata that cannot be used in machine algorithms is of limited utility in user services.
  • Give your things identifiers, not language tags. Identification allows you to share meaning without language barriers. Anything that has been identified can be displayed in language terms to users in any language of your (or the user's) choice.
  • Adopt mainstream metadata standards. This applies not only to the data formats but also to the data itself. If other metadata creators are using a particular standard list of languages or of geographic names, use those same terms. If there are metadata elements for common things like colors or sizes or places or [whatever], use those. Work with international communities to extend metadata if necessary, but do not create library-specific versions.
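To make the first two principles a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the record layout, the `LANGUAGE_LABELS` table, and the `display_language` and `is_short` functions are invented for this example. The language codes "fre" and "ger" are the ISO 639-2 bibliographic codes that MARC already uses; a real system would draw its labels from a shared, mainstream vocabulary rather than a hand-built table. The point is only that a record holding an identifier and a number can be displayed in any language and fed into an algorithm, while a text note like "Text in French; 300 p." can do neither.

```python
# Hypothetical sketch: the record stores identifiers and numbers,
# and human-readable labels are looked up only at display time.

# Hypothetical label table: language identifier -> display language -> label.
# A real system would use a shared standard vocabulary, not this dict.
LANGUAGE_LABELS = {
    "fre": {"en": "French", "fr": "français", "de": "Französisch"},
    "ger": {"en": "German", "fr": "allemand", "de": "Deutsch"},
}

# Computable data: a language identifier and an integer page count,
# instead of a free-text note such as "Text in French; 300 p."
record = {"title": "Exemple", "language": "fre", "pages": 300}


def display_language(rec, user_lang):
    """Render the record's language label in the user's chosen language."""
    labels = LANGUAGE_LABELS[rec["language"]]
    # Fall back to English when no label exists for the user's language.
    return labels.get(user_lang, labels["en"])


def is_short(rec, max_pages=100):
    """A trivial algorithm that only works because 'pages' is a number."""
    return rec["pages"] <= max_pages


print(display_language(record, "de"))  # Französisch
print(display_language(record, "es"))  # French (English fallback)
print(is_short(record))                # False
```

The same identifier serves a German-speaking user and an English-speaking user without anyone re-cataloging the item; only the label table changes.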

There is much more to be said, and fortunately a great deal of it is being included in the report of the W3C Incubator Group on Library Linked Data. The report is still in draft form, but you can see the current state of the group's recommendations, many of which address the transition that LC appears to be about to embark on. A version of the report for comments will be available later this summer.

The existence of this W3C group, however, is proof of something very important that the Library of Congress must embrace: bibliographic data is not solely of interest to libraries, and the future of library data should be created not as a library standard but as an information standard. This means that its development must include collaboration with the broader information community, and that collaboration will only succeed if libraries are willing to compromise in order to be part of the greater info-sphere. That's the biggest challenge we face.

