The declaration by the Library of Congress that the time has
come to make the long overdue change to a new data format has rocked the
library world. A common reaction is: "How can we do that, when we have 1)
thousands of library systems that are designed primarily to
work with MARC records, 2) no money to pay for a major change, and 3) no
clear idea what we should be heading towards?" This fear is heightened by
the fact that so far there is little public evidence of activity on this
project.
A change of this nature is huge. It's not quite Europe
converting to the Euro, but within the library world it is a change of the
magnitude of converting the Internet from IPv4 to IPv6. We've made other big
changes in the past, in particular the change from the card catalog to the OPAC. That effort required us to purchase new systems and to convert the whole
of the printed card catalog to the MARC format. Amazingly, it took only about a
decade to complete that transition.
However, here is the key difference: that change was entirely internal to
the library community. This next one has an additional complication brought on
by the fact that the target environment for the future of library data is the
Web. This new framework will need to be integrated with that massively complex
environment of networked information. This adds unknowns ("How will Web
users interact with library data?"), but it also affords some
possibilities that we didn't have with the change to MARC. Mainly, it allows us
to make use of existing Web technology and the Web community for help in both designing
and implementing the change.
What this means to me is that this is not a
"library-only" activity that we are embarking on. Unknown numbers of
users and systems will want or need to make use of library data on the Web. At
least we hope so. Right now, Web services in need of bibliographic data often point
to Amazon. Others rely on "crowd-sourced" solutions like Mendeley or
Zotero. What will make library data most useful and usable to the larger
community? This isn't a question we should be asking of libraries, but of
potential users of the data.
There are other important questions we should be thinking
about. How will we test whether the new framework is well designed for system
functionality and efficiency? How will we convert from what we have to this new
framework? What structures must we put in place to maintain and extend the
framework over time?
It seems very unlikely that the Library of Congress can
address all of this in the 18-month period allotted for this work
(of which perhaps 12 months remain), because a) its focus is, understandably, primarily on the needs of its
own organization, and b) this effort, to be successful, must have input from
organizations external to the library community. That's a very tall
order for such a short time span.
None of this should be taken to imply that LC doesn't have
smart, skilled staff to work on this -- they do. But if you've ever taken on a
large project in your institution, you know that the staff working on the
project are also doing much of the day-to-day work that occupied them before the
project began. Few are able to dedicate 100% of their time to a new effort.
The question therefore becomes: How can a larger community help LC with this
project, taking on appropriate task areas?
I have in mind a set of tasks that could be worked on in
parallel, by a number of different interested constituencies and with some good
coordination. More details are needed, but the big picture is something like this:
The Web track is an obvious one, especially given that the
W3C has already shown an interest in facilitating the entry of library data onto
the Semantic Web. There is also a growing realization in the library community that
we need to fairly quickly begin to build on the foundation standards developed
by the W3C Semantic Web activity. There appears to be a similar awareness in the Semantic Web
community that library data presents interesting challenges. For example, library
authority data has revealed a need for capabilities that SKOS (Simple Knowledge
Organization System) cannot currently provide. Discussions on lists
that focus on the Semantic Web make it clear that our early efforts in defining
library data in RDF are helping to inform the thinking of the experts in linked
data creation.
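To make "defining library data in RDF" a bit more concrete, here is a minimal sketch in Python using the rdflib library. The namespace, the record identifier, and the VIAF and LCSH identifier values are invented placeholders, and the Dublin Core terms simply stand in for whatever element set the new framework adopts.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

# Hypothetical namespace for a library's published bibliographic descriptions.
BIB = Namespace("http://example.org/bib/")

g = Graph()
g.bind("dcterms", DCTERMS)

record = BIB["record123"]  # invented identifier

# A few descriptive statements. The creator and subject point at external
# identifiers (VIAF, id.loc.gov) rather than repeating local text strings;
# the identifier values below are placeholders, not real entries.
g.add((record, DCTERMS.title, Literal("An Example Title")))
g.add((record, DCTERMS.creator, URIRef("http://viaf.org/viaf/0000000")))
g.add((record, DCTERMS.subject,
       URIRef("http://id.loc.gov/authorities/subjects/sh0000000")))

print(g.serialize(format="turtle"))
```

The point is only that creator and subject become links to shared Web identifiers rather than locally stored strings, which is what makes the data usable by the wider linked data community.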
The bibliographic description track is of course the key one
for libraries. This, to me, is solid ground for LC and its community
partners: determining the semantics of the data that libraries will use to
describe their resources and to provide access for users. RDA already does a
great deal of this, but the task ahead is to make sure that those concepts can
be expressed in a new data format. There should also be an analysis of the
cataloging workflow and even of the expected functionality of a cataloging
interface. The requirements arising out of this track will inform the work of
the Web and IT tracks as they help the Library convert these requirements to
implementable structures and applications.
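As a rough illustration of what expressing those concepts in a new data format might involve, the following sketch (Python, rdflib) declares a single descriptive element as an RDF property. The namespace, the element name, and its definition text are invented for the example; the real work would be deciding which RDA elements to declare and how.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace for the element vocabulary the new framework defines.
ELEM = Namespace("http://example.org/elements/")

g = Graph()
g.bind("rdfs", RDFS)

title_proper = ELEM["titleProper"]

# Declaring the element itself -- its identity, label, and documentation --
# is what turns a cataloging rule into a reusable piece of Web vocabulary.
# The definition wording below is illustrative only.
g.add((title_proper, RDF.type, RDF.Property))
g.add((title_proper, RDFS.label, Literal("title proper", lang="en")))
g.add((title_proper, RDFS.comment,
       Literal("The chief name of a resource, as defined by the cataloging rules.",
               lang="en")))

print(g.serialize(format="turtle"))
```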
The IT track is absolutely essential: How do we assure that
we have data structures that work well with the entire gamut of library systems
functions, from acquisitions through circulation? One question I have in
particular is about the efficiency of a large bibliographic database structured
around the FRBR entities. Efficiency must cover more than just the creation
of bibliographic data; retrieval and display of that data must be efficient as
well. The report on the Future of Bibliographic Control
recommended testing of FRBR. RDA has served to test many of the FRBR concepts,
but as yet there is no proof of concept of a data structure that uses the FRBR
entities. This track, as I see it, would involve library systems vendors as
well as some computer professionals who work with "big data" and
semantic web technologies.
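To picture the efficiency question, here is a deliberately naive sketch, in Python, of what a store built around the FRBR Group 1 entities implies. The class and field names are my own invention, not a proposed schema; the point is simply that a single catalog display has to assemble data from several linked entities rather than from one flat record.

```python
from dataclasses import dataclass

# Invented, minimal stand-ins for the FRBR Group 1 entities; the field
# names are illustrative only.
@dataclass
class Work:
    id: str
    title: str

@dataclass
class Expression:
    id: str
    work_id: str           # realizes a Work
    language: str

@dataclass
class Manifestation:
    id: str
    expression_id: str     # embodies an Expression
    publisher: str
    date: str

@dataclass
class Item:
    id: str
    manifestation_id: str  # exemplifies a Manifestation
    barcode: str

def display_line(item, manifestations, expressions, works):
    """Building one catalog display line means walking three links
    (Item -> Manifestation -> Expression -> Work) -- exactly the kind of
    retrieval cost a proof of concept would need to measure at scale."""
    m = manifestations[item.manifestation_id]
    e = expressions[m.expression_id]
    w = works[e.work_id]
    return f"{w.title} ({e.language}). {m.publisher}, {m.date}. Barcode {item.barcode}"
```

A real proof of concept would, of course, need to test structures like this against millions of records and the full range of library system functions, not a toy example.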
The management track is very important but in LC's plan it
might be relegated to a later phase since it appears to deal mainly with future
activities such as maintenance and modification of the standard. This, however,
would be a mistake because the standards for maintenance and extension must be
in place from the very beginning of the new framework. I would even say that
the new framework should be developed from the beginning with a core and
extensions. This eliminates the need to have on opening day a standard that is
"everything for everybody," and could allow for a phased
implementation of the framework. Note that there are some immediate issues in
RDA that require a maintenance standard, such as how to handle open-ended
controlled lists in a way that would be compatible with Semantic Web standards.
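To illustrate what an open-ended controlled list could look like under Semantic Web standards, here is a small rdflib sketch that publishes such a list as a SKOS concept scheme. The namespace and the terms are invented; the point is that extending the list means minting a new URI and label, not revising a closed enumeration baked into every record.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Hypothetical namespace for a library-maintained value vocabulary.
VOCAB = Namespace("http://example.org/vocab/carrier/")

g = Graph()
g.bind("skos", SKOS)

scheme = VOCAB["scheme"]
g.add((scheme, RDF.type, SKOS.ConceptScheme))

def add_term(local_name, label):
    """Adding a term to an open-ended list: mint a URI, type it, label it,
    and attach it to the scheme. Existing data is untouched."""
    concept = VOCAB[local_name]
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal(label, lang="en")))
    g.add((concept, SKOS.inScheme, scheme))
    return concept

add_term("volume", "volume")
add_term("onlineResource", "online resource")
# A later extension simply adds another concept:
add_term("microfiche", "microfiche")

print(g.serialize(format="turtle"))
```

A maintenance standard would then be largely a matter of governing who may mint those URIs and how additions and changes are announced.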
A critical part of this is the coordination of all of
these activities. Such a coordinating role, however, is not unusual in a large IT project
where work is spread across groups with intersecting milestones.
It seems to me that a division of
this nature (and not necessarily exactly how I have described it here) would
relieve LC of some of the work that it is undoubtedly currently considering
taking on; it could increase the speed with which the full design could be
completed; and in my opinion it has the potential to produce a higher quality
solution than could be achieved by a single organization. Logical participants
include NISO (both in its role as the standards body that maintains Z39.2, the
record structure standard underlying MARC, and in its role as a focus for the library technology community), the W3C's Semantic Web community, the Dublin Core Metadata Initiative (which is working on standards for application profiles in RDF), and IFLA (which now has a
Semantic Web interest group). There is also some possible synergy with projects
like the Internet Archive, the Digital Public Library of America, schema.org, and
the Zotero community. Clearly funding would be needed, and that's also not a simple task.
My concern is that if we don't organize ourselves in this
way, then come January 2013 we will not be anywhere near having the ability to
create bibliographic data in a new framework. RDA will be implemented
inadequately in MARC and, as that solution is the path of least resistance,
work to create a new framework will slow to a crawl. If we don't step up to
this task, for many years to come we will continue to see library data housed
in frameworks and silos that are invisible to most information seekers. That would indeed be very unfortunate.
Note: A session planned for ELAG2012, to be led by Lukas Koster, takes a very similar approach, with the intention of delivering ideas to LC for the new framework.
Karen,
Good overview of things to think of. Thanks for mentioning my ELAG 2012 workshop. I'll use this post as input for that.
One other thing: I'm also involved in a new Linked Open Data Special Interest Working Group in IGeLU, the International Group of Ex Libris Users, trying to work with Ex Libris to add LOD features to their systems, including new BibFramework stuff. See our Manifesto
Karen,
This is a great post. It would be in everyone's best interests if libraries can help. One way to sell this to administrators at a given library would be to point out that while helping LC, you're also building local expertise and ensuring that you hear about new developments before they get a chance to bite you.
Is LC open to this approach? They mention collaboration in their press release, but it sounds pretty unstructured, and focused on "give us your feedback" rather than "help us get this done."
Karen,
could you give us some details on that?
SKOS is today widely used in the library field for thesauri, authority data, and terminology registries (like metadataregistry.org) ...
gb - I think you are asking for information about the problems with using SKOS. LCSH turns out to be a prime example -- LCSH is a pre-coordinated subject list in which subject terms are combined with facets. It goes something like:
topic -- geographic division -- time division -- genre division
There isn't a way in SKOS to indicate the facets and their meanings, so LCSH terms are presented as simple strings, e.g.
Arabic literature--Palestine
Bulgarian literature--To 1762
Religious literature--Distribution
losing the typed meaning of the facets. In the library machine-readable record, each facet type is coded differently so you know if you have a subtopic, a geographic facet, etc.
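To show what that flattening looks like in practice, here is a small Python/rdflib sketch. The concept URI is invented (real LCSH concepts are published at id.loc.gov), and the comments note where the MARC facet coding has no SKOS counterpart.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Illustrative namespace; real LCSH concepts live under id.loc.gov.
EX = Namespace("http://example.org/lcsh/")

g = Graph()
g.bind("skos", SKOS)

concept = EX["arabicLiteraturePalestine"]
g.add((concept, RDF.type, SKOS.Concept))

# In plain SKOS the whole pre-coordinated heading is one opaque string:
g.add((concept, SKOS.prefLabel,
       Literal("Arabic literature--Palestine", lang="en")))

# There is no standard SKOS property that says "Palestine" is a geographic
# facet the way MARC coding does (e.g. $a topical term, $z geographic
# subdivision, $y chronological subdivision, $v form subdivision), so that
# typed meaning is lost in the conversion.
print(g.serialize(format="turtle"))
```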