Thursday, August 25, 2011

Bibliographic Framework Transition Initiative

The Internet began as a U.S.-sponsored technology initiative that went global while still under U.S. government control. The transition of the Internet to a world-wide communication facility is essentially complete, and few would argue that U.S. control of key aspects of the network is appropriate today. It is, however, hard for those once in control to give it up, and we see that in ICANN, the body charged with making decisions about the name and numbering system that is key to Internet functioning. ICANN is under criticism from a number of quarters for continuing to be U.S.-centric in its decision-making. Letting go is hard, and being truly international is a huge challenge.

I see a parallel here with Library of Congress and MARC. While there is no question that MARC was originally developed by the Library of Congress, and has been maintained by that body for over 40 years, it is equally true that the format is now used throughout the world and in ways never anticipated by its original developers. Yet LC retains a certain ownership of the format, in spite of its now global nature, and it is surely time for that control to pass to a more representative body.

Some Background

MARC began in the mid-1960s as an LC project, at a time when the flow of bibliographic data was from LC to U.S. libraries in the form of card sets. MARC arrived at a key moment, when some U.S. libraries were themselves thinking of making use of bibliographic data in machine-readable form. It was the right idea at the right time.

In the following years numerous libraries throughout the world adopted MARC or adapted MARC to their own needs. By 1977 there had been so much diverse development in this area that libraries used the organizing capabilities of IFLA to create a unified standard called UNIMARC. Other versions of the machine-readable format continued to be created, however.

The tower of Babel that MARC originally spawned has now begun to consolidate around the latest version of the format, MARC21. The reasons for this are manifold. First there are economic reasons: library system vendors have had to support this cacophony of data formats for decades, which increases their costs and decreases their efficiency. Having more libraries on a single standard means that a vendor has fewer code bases to develop and maintain. The second reason is the increased sharing of metadata between libraries: it is much easier to exchange bibliographic data between institutions using the same data format.

Today, MARC records, or at least MARC-like records, abound in the library sphere, and pass from one library system to another like packets over the Internet. OCLC has a database of about 200 million records in MARC format, with data received from some 70,000 libraries, admittedly not all of which use MARC in their own systems. The Library of Congress has contributed approximately 12 million of those. Within the U.S., the various cooperative cataloging programs have distributed the effort of original cataloging among hundreds of institutions. Many national libraries freely exchange their data with their counterparts in other countries as a way to reduce cataloging costs for everyone. The flow of bibliographic data is no longer from LC outward to other libraries; it is a many-to-many web of data creation and exchange.
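It is worth remembering just how much of a pre-web design the MARC carrier is. A record is a single character stream: a 24-character leader, a directory of fixed-width entries, then the variable fields, all separated by control characters. The sketch below is a minimal illustration of that structure (not a full ISO 2709 implementation); the sample record and its lone 245 title field are invented for the example.

```python
# A minimal sketch of the MARC21 physical structure (ISO 2709): a
# 24-character leader, a directory of 12-character entries, then the
# variable fields. The sample record below is hand-built and invented
# for illustration; real records carry many more fields.
FT, RT, SF = "\x1e", "\x1d", "\x1f"  # field, record, subfield delimiters

def parse_marc(record: str) -> dict:
    """Return {tag: raw field data} for a minimal MARC/ISO 2709 record."""
    leader = record[:24]
    base = int(leader[12:17])                  # base address of data
    directory = record[24:record.index(FT)]
    fields = {}
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]               # 3-char field tag
        length = int(directory[i + 3:i + 7])   # 4-char field length
        start = int(directory[i + 7:i + 12])   # 5-char starting offset
        fields[tag] = record[base + start:base + start + length].rstrip(FT)
    return fields

# One invented title field (245), indicators "10", subfield $a:
sample = ("00054nam a2200037 a 4500"   # leader: record length 54, base 37
          "245001600000" + FT          # directory: tag 245, 16 chars, offset 0
          + "10" + SF + "aTest title." + FT + RT)
print(parse_marc(sample))              # {'245': '10\x1faTest title.'}
```

Every offset in the directory has to be computed by hand (or by a dedicated library), which is part of why MARC data stays inside library systems rather than flowing through general-purpose tools.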

Yet, much like ICANN and the Internet, LC remains the controlling agency for the MARC standard. The MARC Advisory Committee, which oversees changes to the format, has grown and now includes members from Library and Archives Canada, the British Library, and the Deutsche Nationalbibliothek. The standard, however, is still primarily maintained and issued by LC.

Bibliographic Framework Transition Initiative

LC recently announced the Bibliographic Framework Transition initiative to "determine a transition path for the MARC21 exchange format."
"This work will be carried out in consultation with the format's formal partners -- Library and Archives Canada and the British Library -- and informal partners -- the Deutsche Nationalbibliothek and other national libraries, the agencies that provide library services and products, the many MARC user institutions, and the MARC advisory committees such as the MARBI committee of ALA, the Canadian Committee on MARC, and the BIC Bibliographic Standards Group in the UK."
In September we should see the issuance of their 18-month plan.

Not included in LC's plan as announced are the publishers, whose data should feed into library systems and already feeds into bibliographic systems like online bookstores. Archives and museums create metadata that could and should interact well with library data, and they should be included in this effort. Also not included are the academic users of bibliographic data, users so frustrated with library data that they have developed numerous standards of their own, such as BIBO (the Bibliographic Ontology), BibJSON (a JSON format for bibliographic data), and FaBiO (the FRBR-Aligned Bibliographic Ontology). Nor are there representatives of online sites like Wikipedia and Google Books, which have an interest in using bibliographic data as well as a willingness to link back to libraries where that is possible. Media organizations, like the BBC and the U.S. public broadcasting community, have developed metadata for their video and sound resources, many of which find their way into library collections. And I almost forgot: library systems vendors. Although they have some representation on the MARC Advisory Committee, they need a strong voice given their experience with library data and their knowledge of its costs and affordances.
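It's easy to see why developers outside libraries reach for formats like BibJSON instead of MARC. Here is a rough sketch of a BibJSON-style record; the field names follow the BibJSON convention (BibTeX-like keys in plain JSON), and the record itself is invented for illustration.

```python
import json

# A rough sketch of a BibJSON-style record. Field names follow the
# BibJSON convention (BibTeX-like keys in plain JSON); the record
# itself is constructed for illustration.
record = {
    "type": "book",
    "title": "On the Origin of Species",
    "author": [{"name": "Darwin, Charles"}],
    "year": "1859",
    "publisher": "John Murray",
}
print(json.dumps(record, indent=2))
```

Any JSON-aware tool can consume this without special knowledge, whereas a MARC record requires a dedicated parser before a developer can even see the field names.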

Issues and Concerns

There is one group in particular that is missing from the LC project as announced: information technology (IT) professionals. In normal IT development the users do not design their own system. A small group of technical experts designs the system structure, including the metadata schema, based on requirements derived from a study of the users' needs. This is exactly how the original MARC format was developed: LC hired a computer scientist to study the library's needs and develop a data format for its cataloging. We were all extremely fortunate that LC hired someone who was attentive and brilliant. The format was developed in a short period of time, underwent testing and cost analysis, and was integrated with workflows.

It is obvious to me that standards for bibliographic data exchange should not be designed by a single constituency, and should definitely not be led by a small number of institutions that have their own interests to defend. Consultation with other, similar institutions is not enough to make this a truly open effort. While there may be some element of not wanting to give up control of this key standard, it is also not obvious to whom LC could turn to take on this task. LC is to be commended for committing to this effort, which will be huge and undoubtedly costly. But this solution is imperfect at best, and at worst could result in a data standard that does not benefit the many users of bibliographic information.

The next data carrier for libraries needs to be developed as a truly open effort. It should be led by a neutral organization (possibly ad hoc) that can bring together the wide range of interested parties and make sure that all voices are heard. Technical development should be done by computer professionals with expertise in metadata design. The resulting system should be rigorous yet flexible enough to allow growth and specialization. Libraries would determine the content of their metadata, but ongoing technical oversight would prevent the introduction of implementation errors such as those that have plagued the MARC format as it has evolved. And all users of bibliographic data would have the capability of metadata exchange with libraries.


Orangeaurochs said...

And I think a similar inclusiveness of more parties, including the wider metadata community and publishers, as well as greater openness, would have done wonders for RDA too.

jm said...

I'm glad to see you raising these governance issues. I'm not as concerned as you are regarding the presence of IT professionals within the bibframe effort. All of the libraries mentioned in the announcement are fully stocked with extremely competent IT staff. I suppose that's not a guarantee that the IT folks will be involved in the effort, but I can't imagine the institutions participating being so foolish as to not include some people who are up to speed on what's technologically feasible. And I'd rather rely on tech people who are involved in libraries, honestly. I've always advocated for participatory design to the students in my metadata class; standards work better when those who have to live with the result every day drive the design process. And participatory design is only possible with IT developers who really know the environment in which the technology will be used.

I'm with you on trying to get stakeholders involved who aren't librarians, but as someone who's been involved in a couple of standards efforts, trying to achieve anything like true representation of separate, large and diverse communities in standards design is hellishly difficult. And to the extent you succeed, you will inevitably slow down the design process, and if the communities don't agree, you can kill it outright. So, yes to getting non-librarians involved, but we also should be maintaining a healthy skepticism towards our own ability to be fully inclusive of people who'll be influenced by the results of this process.

My big worry here is the long-term institutional home for maintenance of the results of this effort. We've watched many, many new standards efforts struggle badly with this over the past decade. METS, EAD, OAI and others all faced the problem that it's not that difficult to pull people together to create a standard, but pulling together resources on a permanent basis to provide for its further maintenance and development is incredibly difficult. There's a reason LC has ended up as maintenance agency for so many metadata standards - no one else had both the legitimacy in the community's eyes and the cash to pull it off. So, as you correctly point out, having LC in control of MARC has seriously problematic aspects, but there's no other organization out there up to the task at the moment that's any less problematic, and creating a new organization that has the representativeness we'd like to see is going to involve getting many existing organizations to fork over resources they'd probably rather see stay in their own coffers. Tricky at the best of times, and with budgets the way they are now, it certainly isn't the best of times.

Karen Coyle said...

jm - I agree with your worries about maintenance, but I am optimistic that a standard as important to libraries as this will result in a solution. METS, EAD and OAI have smaller constituencies, and that makes finding a stable host more difficult.

As for your concern about getting everyone together and agreeing, that's the topic of my next post, so hang on for that.

There are two things I want to say about IT folks and libraries. The first is in response to your statement: "...I can't imagine the institutions participating being so foolish as to not include some people who are up to speed on what's technologically feasible." Unfortunately, that is exactly what happened with RDA, and we now have a new cataloging code with no way to create the data. What were they thinking?!

The other thing is that although libraries have IT folks, there aren't many of them with a broad range of experience outside of the library environment. Many have learned on the job and only know library processes. Many have worked only in libraries, and libraries rarely have the $$ to be working on cutting edge technologies. If there are folks with both broad IT experience and library knowledge, they will be ideal for this task. Even so, I would really like to see some input from folks with a totally different point of view. I think that could keep us from staying in the deep ruts we have worn over the centuries of library data creation.

Renée Register said...

Great post! Having worked with metadata across library and publishers environments for many years, I'm convinced that we must have collaboration across all the communities that rely on metadata to describe and provide access to content.

In the digital world metadata really is the message. The ability to easily share, remix, and reuse metadata created across multiple communities of dedicated professionals creates remarkable efficiencies and cost-savings for all parties.

To a large extent, libraries missed the boat beginning with the rise of online access to content during the '90s. As metadata became more and more important in the wider web-based marketplace, library metadata formats and standards kept library data in silos. This continues to restrict the exposure and use of library data in the larger web environment and makes it very difficult for libraries to take advantage of metadata created outside the silo.

There is tremendous duplication of effort across library and publisher supply chain metadata activities. This inflates the cost of metadata creation and maintenance at a time when neither community can afford it.

Let's keep working to bring multiple minds and perspectives to bear on this. Libraries can't afford to miss the boat again. We should move (and quickly) toward a carrier that is flexible, interoperable, and that leverages multiple types of metadata for the benefit of all.

Diane Hillmann said...

I certainly agree that developing RDA starting with the guidelines for content creation was probably a terrible idea, but that's hindsight speaking. Despite that, I think we do have a start on data creation with the RDA Vocabularies. What we've most missed, certainly, is the tool development that would have helped us bridge the gap to useful data creation, but the delay in 'publishing' the vocabularies has clearly been to blame for much of that.

I suppose what I'm most interested in seeing in the LC announcements to come is how they see the development of this 'infrastructure', either as 'theirs' or 'ours' (defined as developed with participation and 'ownership' by the larger community).

My hope is that these plans will include a push towards more experimentation and tool building--something that has been in short supply until now. I'm not so convinced that we need much consensus to move forward--as you note, there wasn't much for MARC either, at the beginning, and there's unlikely to be much now. From my point of view, the only important consensus is that which is necessary for data that can be created, managed, and shared in a variety of ways, but with underpinnings that allow consuming systems and services to understand and use the data in useful and innovative ways.

Tess said...

After reading your blog article I thought, "Maybe I should drop my cataloging course!"