Thursday, July 12, 2007

FoBC Meeting 3, Detailed Notes

Speaker: Deanna Marcum, Associate Librarian for Library Services, Library of Congress

This is the 3rd public session. Comments can still be sent to the committee or via the web site until the end of July.

The question is turning out to cover more than bibliographic control. Instead the broader question is: what is librarianship about in the web world?

When MARC was introduced, libraries were concerned that using MARC would have implications for their own local cataloging, and weren't sure they wanted to use this standard for their own local cataloging. Conforming to the standard meant giving up local practice. But we have gotten many benefits.

In the web world, users have the opportunity to use their own language for searching, and they are being successful. So what contributions can users make, and what will make things more effective for our users?

The theme today is economics and organization. Many librarians believe that cataloging should not be an economic issue. In "this" world, it is not possible for us to ignore the economic implications of cataloging.

The Library of Congress provides cataloging as a service, and that helps other libraries economically. But the Library of Congress has no budget line for that service.

Speaker: José-Marie Griffiths, Chair, Working Group, University of North Carolina at Chapel Hill

This is the third of three meetings, each with a different theme.
1. Who uses bibliographic data produced by libraries, and what are the needs of users?
The meeting showed that there is a wide variety of users and uses.
2. Standards and structures
One issue that came out at that meeting is whether the process serves the needs of the community.
3. Economics and Organization
One study that the speaker has conducted was to determine the actual costs of "free" services.

Speaker: Judith Nadler, Working Group Member, University of Chicago Library
Judy described the meetings as being about Who, What, and How. We are now at the How.

Setting the Stage
Rick Lugg, Partner, R2 Consulting

He used to always say that there is no such thing as a bibliographic emergency. However, in the past few years he has found himself working as a bibliographic trauma specialist. As consultants, R2 gets called in to see things that aren't working. In the cataloging area, he has seen huge backlogs that are so well-established they have sophisticated inventory systems. With hard copy backlogs you can go into a storage room and see the huge amount of material there. In the digital world you can't see the backlogs. Broken links aren't visible. You don't know what isn't getting done. We don't have a measure of how far behind we are in the digital world.

He said that the cost of bibliographic control is disproportionate to benefits. [kc: It would be great to have a way to measure that, or at least to measure what parts of the bibliographic record produce the greatest benefits.]

The MARC record for a basic monograph is a commodity. It is estimated that the creation of the MARC record costs $150-$200. The book is cataloged once, and the cataloging is used many times. Libraries have contained costs by using different levels of staff for copy cataloging. But there are still a lot of duplicative costs in the system.

We have a cult of perfection with the following beliefs:
1 – bibliographic perfection is attainable
2 – cataloging is still about the arrangement of print books on the shelf

Bibliographic Perfection
One of the main barriers to cost savings is the desire to create the perfect record: people change bibliographic records, or at least check all of the details. They change call numbers and use custom Cuttering schemes. Many still write the call number in pencil on the verso of the title page. Some check the reported size with rulers. We focus on the record itself rather than what the record is for.
We have a narrow view of quality – we see quality as being about the record, but not about timeliness. (Thus, the backlogs.)
What is good enough? The question should be: does this error impede access?
We need to take advantage of work done elsewhere in the supply chain.

Shelf arrangement still influences cataloging, but many items are in storage where shelf order doesn't matter. We still create unique call numbers, but duplicate call numbers don't prevent access. We need to think about browsing online, not just on the shelf.

We also need to consider the total cost of bibliographic control. There are the initial costs, but we also need to consider full lifecycle cost. Records are changed at various points, for example as we move items offsite, or move a book out of reference. Most of these changes are done manually. In serials, as we move from print to electronic and end or modify print subscriptions, records have to be updated. Much of this is inventory control, but still means record changes.

There are opportunity costs: What are we not doing that we should be doing? Answer: special collections cataloging, cataloging unique materials, and rare books, manuscripts and archives.

Another opportunity cost: we have no capacity for non-MARC metadata – no one has time to learn MODS, METS, DC. The cost is a delay in moving in new directions.
We are involved in mass digitization, but we haven't started working on discovery of full text.
Catalogers are not involved in systems development early on, which affects how systems are developed.
How can we collaborate with others (not just other libraries) to create a richer bibliographic record?

Q: I asked: To what extent is complexity of MARC an issue? His answer was rather vague, so I think he hadn't really thought about this in detail. It would be interesting to know how much time is spent on things like fixed fields, or figuring out subfielding. It would also be interesting to do more experimentation with interfaces. Later speakers brought up the idea of using systems better to help catalogers work faster.

Speaker: Lizanne Payne - Library Consortium
Executive Director, Washington Research Library Consortium

Lizanne Payne talked about how consortia can affect costs. Their main role is often providing joint licensing of digital materials, but they are also involved in ERMs and ILL workflow. They usually share a common OPAC to facilitate borrowing, and sometimes have a common ILS. The latter allows them to share the cost of IT staff by centralizing systems. If they don't share an ILS, there is duplication between the local catalogs and the union catalog. You need 3 levels of bibliographic control: 1 – master record 2 – individual library records (e.g. for special subject control) 3 – holdings, shelving, etc.

Where libraries share a storage facility, searching for duplicates before sending to storage is very expensive.
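
To make the duplicate-search problem concrete, here is a minimal sketch of how a shared facility might pre-screen incoming items with a normalized match key. This is not something Payne described; the function names, the record layout (title, author, year), and the normalization rules are all assumptions for illustration, and a real workflow would also have to deal with editions, volumes, and identifiers.

```python
import re
import unicodedata


def match_key(title, author, year):
    """Build a crude normalized key for spotting likely duplicate records.

    Hypothetical helper: the choice of fields and rules is an assumption.
    """
    def norm(text):
        # Strip accents, lowercase, drop punctuation, collapse whitespace.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
        return " ".join(text.split())

    return "|".join([norm(title), norm(author), year.strip()])


def find_duplicates(candidates, stored):
    """Return candidate records whose key already exists in storage."""
    stored_keys = {match_key(**rec) for rec in stored}
    return [rec for rec in candidates if match_key(**rec) in stored_keys]


stored = [{"title": "Les Misérables", "author": "Hugo, Victor", "year": "1862"}]
candidates = [
    {"title": "Les miserables.", "author": "Hugo, Victor", "year": "1862 "},
    {"title": "Moby Dick", "author": "Melville, Herman", "year": "1851"},
]
duplicates = find_duplicates(candidates, stored)
```

A key like this only flags likely matches cheaply; the expensive part Payne points to is the human checking that still follows.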

[This talk brought up some interesting thoughts about duplication – of materials and of catalog records. Duplication keeps coming up for me in various projects I am working on, and it seems to have cost implications at a lot of levels. This is especially so in areas where duplication is not desirable in the user view, even though duplication that exists in the real world also serves users where access is concerned.]

I also learned from Payne's talk that MFHD is pronounced "muffhead."

Speaker: Mary Catherine Little - Public Library
Director, Technical Services Department, Queens Borough Public Library

Little gave some good arguments for matching your cataloging to your actual need. She manages a huge and active public library with 65 different languages represented in the collection. She doesn't have the ability to produce cataloging in all of those languages so she relies on vendor-supplied copy and doesn't augment it. Her bottom line is to know what the library owns and give users access to it. She asks herself: am I creating data I'm not likely to use? Am I creating enough data for the ILS to function today? Tomorrow?

And, would this item be replaced if lost? (Many of her books are popular reading that are used for a few years then discarded when the item is worn out.) She even has some un-cataloged collections that are accessed at the shelf only. But fewer users today are in the library. [Note: there were various mentions that digital materials require more and better metadata, but no one really connected this to the fact that our collections are increasingly digital.]
She called for more sharing of vendor data – which of course means a change on the part of vendors.

Speaker: Susan Fifer Canby - Special Library
Vice-President, Library and Information Services, National Geographic Society

The Special Library case was quite different from either public or academic libraries.
Some special libraries hold proprietary data that cannot be shared. They are focused on service to their organizations and often have considerable collections of archival and organizational records. They may have responsibility for all or part of the organization's web site. They may also use their collection for e-commerce, as is the case with the National Geographic Society's photo archives.

On the other hand, an organization can require that internal data providers attach certain metadata (like subject headings) to items they store.

The special library is not seen as a general good by the organization. It is a cost center, therefore has to produce value. Bibliographic control is not a major activity for them.

Questions and comments
Q: There seems to be a distinction being made between bibliographic control v. inventory control
Lugg: That starts way back in the chain. For vendors it's about inventory and sales. In systems, the overhead of using MARC as a transaction vehicle is too much, so the transaction areas of systems tend to keep less data and match it up to MARC when needed. However, libraries often see transaction data as part of the MARC record (because they display together). There are different needs within the system, and the MARC record shouldn't change when items circulate.
Q: The committee has done some thinking about atomizing the MARC record, removing some complexity and creating different structures for the different functions.
Payne: MARC was designed for transmittal, not for daily use. And there's no standardization for how it is broken apart and used in our systems, which makes system upgrades difficult. There are lots of areas of our systems that we haven't standardized.
Lugg: This really shows up in the holdings area. Libraries make different choices as to how that is structured and stored and displayed. Some of this is showing up as libraries try to go to WorldCat Local.
Lorcan Dempsey (OCLC): The problem is not MARC, but the fact that we want to do more sharing, so all of these local options are showing up more as problems. It isn't the technology but the social way that we decide what goes into records (which are often designed for a single application, but which we now want to reuse for a different application). Think of data as something that applications use rather than people.
Q: There are greater expectations for the sophistication of access. How much of that is part of shared bibliographic control and how much is local?
Little: Social tagging can represent the cultural aspects of language – the social spin on things.

The Stakeholders' Perspective
Speaker: Bob Nardini - The Vendor
Group Director, Client Integration and Head Bibliographer, Coutts Information Services

It is good that vendors are included in discussions of bibliographic control. Vendors produce a lot of bibliographic information. Coutts employs catalogers and is providing 280,000 bibliographic records this year. Other vendors are even larger. 63% of libraries obtain records from book vendors (based on a survey).

He spoke of the CIP program as one where vendors contribute data. Publishers produce metadata for their audience; for example, publishers are very aware of the metadata needs of Amazon, since that translates to sales. He said that he would like to see more use of the metadata record in a marketing role. (I'm not sure what that means for libraries.)

Speaker: Mechael Charbonneau - PCC and Large Research Library
Director of Technical Services and Head, Cataloging Division, Indiana University, Bloomington

Cataloging is seen as a high-cost activity, thus the Program for Cooperative Cataloging is a way to save labor. PCC is an international coalition coordinated by the Library of Congress, and a major stakeholder in the bibliographic data future. It relies on voluntary cooperation between libraries. Today, about 35-45% of shared records are being produced outside of the Library of Congress.
She mentioned a need to include non-MARC metadata (but didn't say which ones). She also talked about the need to internationalize authority files, and mentioned the Virtual International Authority File project at OCLC.

Speaker: Linda Beebe - Abstracting and Indexing Services
Senior Director of PsycINFO, American Psychological Association

A&I services create metadata for discovery. There is little emphasis on description in the library cataloging sense. There are particular needs in the different subject areas.

She suggested that we need to look at the "meeting points" of linked systems to see if there is a way we can simplify workflow. [She didn't give any detail, but I have thought that we need to define what our linking elements will be so that we can concentrate on those, and maybe skip non-linking data in some instances.]

One of the problems they are running into is the increase in supplemental audio visual files that need to be linked to the print resource.

She talked about the difference between customers and librarians. Librarians like controlled vocabulary, but users simply want to search on the terms they know. This means that systems need to handle lots of synonyms. We have to discard the notion that it takes special knowledge to find things in the literature. This isn't dumbing down, but making our systems work harder.
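
One way to picture "making our systems work harder" is query expansion: the user types the term they know, and the system quietly searches the synonyms too. The sketch below is only an illustration of that idea, not a description of how PsycINFO works; the synonym table, the function names, and the naive substring matching are all assumptions.

```python
# Hypothetical synonym table; a real system would draw on a maintained thesaurus.
SYNONYMS = {
    "teenagers": {"adolescents", "youth"},
    "heart attack": {"myocardial infarction"},
}


def expand_query(query):
    """Return the user's term plus every term in the same synonym group."""
    term = query.strip().lower()
    expanded = {term}
    for preferred, variants in SYNONYMS.items():
        group = {preferred} | variants
        if term in group:
            expanded |= group
    return expanded


def search(query, documents):
    """Return documents containing the query term or any of its synonyms."""
    terms = expand_query(query)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


titles = ["Depression in adolescents", "Cardiac care after myocardial infarction"]
hits = search("Teenagers", titles)
```

The controlled vocabulary is still doing work here; it has simply moved from the user's head into the system.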

Questions and Comments
Q: In the past, vendors have been reluctant to allow their records to be merged with other vendor records because they lost branding. Is this still an issue?
Beebe: This is becoming less of an issue.
Q: What about the different treatments of author names?
Beebe: Searching for author is the most complicated thing. There are author profiles that some are putting together to help this. Social tagging might also help here.
Todd Carpenter (NISO): An ISO group is working on an international standard name identifier. This is being driven by the publishing community because of their interest in tracking royalties.
A: Crossref is working on author identifiers (and is also looking at institutional identifiers).
Q: Vendors don't use LCSH. Vendors put in more marketing tags and readership levels, plus formats (e.g. textbooks). Maybe this is something that Library of Congress should stop putting in records, but should take from the vendor records.

Speaker: Karen Calhoun - OCLC
Vice-President, OCLC WorldCat and Metadata Services

Response to the Background Paper

She was speaking from the view of OCLC as a stakeholder.
There are 7 economic challenges: productivity, redundancy, value, scale, budgets, demography, collaboration
1. productivity
Fred Kilgour created a dramatic enhancement in productivity of cataloging.
2. redundancy
OCLC shared cataloging removed duplication of effort; the Internet and web make possible other efficiencies.
3. value
We talk about quality, but we all mean different things depending on our point of view. To the bibliographic control expert it means: adherence to rules. To the library decision-maker, quality has to do with stewardship of library funds and budgets, producing value for communities.
4. scale
Users look at and beyond individual library collections when seeking answers to questions. We must not narrow our scope to what we have done in the past.
5. budgets
Budget restrictions not surprising – especially as libraries move into new areas but have the same budgets.
6. demography
The famed "retirement wave" for a generation of bibliographic experts begins in 2010. We will have to change hiring practices.
7. collaboration
These challenges won't be met by libraries working alone.

She then outlined some future potential for OCLC to respond to these challenges.
Metadata is like money – it is a medium of exchange; it points to the value of things.
OCLC might build grid services along the supply chain for creation and augmentation of metadata. The publication supply chain could be an interdependent flow of reusable metadata on the grid.

Where does metadata come from? From bibliographic control experts, publishers, authors, reviewers, readers, selectors. Where could metadata come from? WorldCat is a large unexplored resource, as evidenced by its terminology services and WorldCat Identities. OCLC could run a contract cataloging service. OCLC might help libraries by incrementally moving selected technical services functions to the network. E.g. extend the ILL fee management service into the acquisitions area, creating a kind of PayPal for libraries. This could make libraries less dependent on local systems.

Speaker: Beacher Wiggins - Library of Congress
Director, Acquisitions and Bibliographic Access, Library of Congress

The Library of Congress has explored the use of bibliographic data from a number of sources. PCC is the largest and most successful of these operations. They have increased their cataloging output at the same time that their staffing in the cataloging area has been cut. In the current congressional climate, they must do more without any increased funding for staff.

He mentioned the precipitating event of the Library of Congress dropping authority control for series entries as an example of how they have to cut back. They are re-organizing their cataloging staff such that technicians will do all descriptive cataloging and librarians will do authorities and subject analysis. They are shifting costs, not reducing costs.

He also mentioned the problem of not being able to share vendor data. Apparently there was a rather nasty incident between Library of Congress and Casalini Libri over the reuse of Casalini's supplied bibliographic records. (Something that no one talked about was the systems issues: identifying which records you cannot share. That itself must have some overhead.)

Questions and Comments
Q: There are many small libraries that cannot afford to be in OCLC. How can they be included if OCLC expands its services?
Calhoun: We're looking into that.
Q: What is the cost of leadership, such as standards maintenance?
Q: What is being done to increase training/continuing education?
Wiggins: ALCTS, Library of Congress and ALA are organizing continuing education in this area.

Speaker: Diane McCutcheon for NLM
Ideas on how to improve cost-effectiveness

NLM does both cataloging and A&I indexing.

She agrees that cataloging is a public good, but that service has costs. The institution has a particular obligation to create cataloging in a cost-effective way. How? Fully utilize descriptive metadata that is available electronically, mainly from publishers and vendors. Basic descriptive data. Eliminate rote keying tasks. NLM uses metadata from journal publishers rather than re-keying, and has realized cost savings. Publishers supply data in a standard format because they want to be in Medline. Need to convince publishers that it is to their advantage to be cited in catalogs.
Getting metadata earlier in the chain. Can't use MARC – need to use an XML format (ONIX) – but library systems can't handle non-MARC data. Use crosswalks instead. There is a need for those crosswalks to be available to others.
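
For readers who haven't seen one, a crosswalk is essentially a mapping table from one format's elements to another's. The sketch below is a toy version, not NLM's crosswalk: the sample record is invented, the element names are a simplified subset modeled on ONIX 2.1 reference tags, and the output is a list of (tag, subfield, value) tuples rather than a real MARC record.

```python
import xml.etree.ElementTree as ET

# Invented sample record using a simplified subset of ONIX-style elements.
ONIX_SAMPLE = """
<Product>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000002</IDValue>
  </ProductIdentifier>
  <Title><TitleText>An Example Title</TitleText></Title>
  <Contributor><PersonNameInverted>Doe, Jane</PersonNameInverted></Contributor>
  <Publisher><PublisherName>Example Press</PublisherName></Publisher>
</Product>
"""

# Hypothetical mapping: (ONIX element path, MARC tag, MARC subfield code).
CROSSWALK = [
    ("ProductIdentifier/IDValue", "020", "a"),       # ISBN
    ("Contributor/PersonNameInverted", "100", "a"),  # main entry, personal name
    ("Title/TitleText", "245", "a"),                 # title
    ("Publisher/PublisherName", "260", "b"),         # publisher name
]


def onix_to_marc_fields(onix_xml):
    """Map a simplified ONIX product to (tag, subfield, value) tuples."""
    product = ET.fromstring(onix_xml.strip())
    fields = []
    for path, tag, code in CROSSWALK:
        for element in product.findall(path):
            value = (element.text or "").strip()
            if value:
                fields.append((tag, code, value))
    return fields


marc_fields = onix_to_marc_fields(ONIX_SAMPLE)
```

Keeping the mapping as data rather than burying it in code is what would make a crosswalk easy to share with others, which is the need McCutcheon raised.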

Making more use of automated data. Current cataloging is like hand crafting furniture or clothing. Need to move into mass production. Some materials may not have electronic data, but we should take advantage of those that do. Need to make more use of machine assistance. Catalogers are often working in subject areas where they aren't expert, so machines can help with subject heading and classification assignment. They've been working with an automated system that suggests MeSH terms.

New economic model: libraries create data, then share for the cost of sharing it via OCLC. Libraries and vendors have little incentive to do original cataloging.

Need faster standards development. Can't take 2-5 years.

Speaker: Chris Cole for the National Agricultural Library

NAL also acts as both library and A&I publisher. Indexing uses basic metadata supplied by the publisher. This saves a considerable amount of cost. Quality has not suffered. Use of publisher data is both possible and necessary. Metadata should be created from data supplied by publishers, with libraries adding value.

NAL contributes to CIP on agriculture related titles. Many libraries use the CIP record because they aren't connected to the network.

Current process isn't economically feasible. We can also get data from the music and sound recording industry. If we can move from transcription to adding value, we can tap those resources. This is especially true for digital files, which cannot be discovered without metadata.

Focus of RDA is on traditional materials and traditional procedures, unfortunately. RDA is not recommending an abandonment of standards but a transformation. Do not focus on the record but on a clear set of data elements that can be used by libraries, vendors and others, and that can be reassembled as needed for different uses.

Lorcan Dempsey: the majority of records in WorldCat are not from Library of Congress, but the majority of holdings are on Library of Congress-produced records.

Q: cost and value of creation of thesauri and classification
NAL: we have a thesaurus, and have found that others want to use it and offer to help (in the sciences)
NLM: authors should identify themselves; publishers aren't in our discussions, and they need to be here.

We put too little value on our work. It costs $130 to catalog a book but we sell it for 6 cents. What do we offer to people to make it worth their while to contribute?

Regina Reynolds (LC): one economic model is bartering. Could we barter our data in trade for expertise?

Dan Chudnov: [after some in the audience rejected the idea of "non-expert" social tagging] The user, in social tagging, becomes an access point. Hard to reconcile with privacy, but somehow we have to do that. LibraryThing has social tagging around Library of Congress data. Also, we need an involvement of technology folks in the discussion about bibliographic control.

Speaker from U Penn on subject analysis: the issue is how to make it more efficient. Not the creation of the string, but the aboutness of the work. There is no way to contribute actual subject headings (in cooperative cataloging) in the same way as name authority files. Social tagging: "expert" tagging defeats the purpose. There's a value in letting users decide; people tag for various reasons, have points of view.

Wiggins: We are looking at the pre-coordination of Library of Congress subject headings. Will issue a report looking at simplification.

Speaker from Folger Shakespeare library: How do we know when we have accomplished our goals? What are our evaluation mechanisms?

Speaker from Library of Congress, education office: Most speakers seem to say that a librarian manages, a user finds. The problem is that we don't use our own products.

Library of Congress staff member: was viewing the meeting on the webcast and came up to say something. We keep talking about incrementally changing how we process bibliographic records so we can create more of them. Library of Congress and OCLC are metadata repositories. We should think more radically about what kind of metadata repository we want and need. Create a repository for all of the ONIX data that publishers are creating that allows a way to use that data. Let libraries download the information they need. Do this rather than item-by-item work flow.

Karen Calhoun: OCLC is exploring a way to use ONIX data, enrich it and send it back to publishers, and then create MARC records, and let users add enrichment.

Summary of the Day
Speaker: Robert Wolven, Working Group Member, Columbia University Library

Themes of the day:
  • We have focused on today and the near-term future.
  • We've thought mainly about trade monographs and efficiencies there.
  • But opportunity costs are about less standard areas of collection. Economics there are more local and individual; less opportunity with collaboration. Are we looking at economic shift to areas where we won't have large scale economies?
  • We look carefully at individual records we are creating, then we go off and load hundreds of thousands of records in sets.
  • Different approach to names authorities in cataloging and A&I databases.
  • We need to think about lifecycle of resource. We tend to think about initial process, not later changes. Some life-cycles are short, like public reading, others longer term, like making decisions about off-site storage.
  • The MARC record is a commodity; we need appropriate distributions of costs. How to compensate vendors for cataloging; vs "free riders" in the chain. Do we recoup the costs from those who benefit? Or do some bear the costs?
  • We don't want to pay for metadata we get but talk about getting value for our metadata. This implies retaining control.
  • We propose that value is tied to its use, yet a lot of our effort goes into metadata that isn't used much. Do we focus on area where we have the most sharing, or on the long tail? With long tail less ability to share costs.
Education as an economic factor: how we educate and re-educate staff. But we also expect our users to learn. Education as a barrier.

Digital backlogs – we don't have ways to understand what they are and how they are treated. We don't have measures of this.

Final Thoughts
Speaker: Deanna Marcum

What will they say a hundred years from now talking about the choices we had in 2007? The choices we make at the Library of Congress will make a difference. Library of Congress has focused on cataloging those materials that will be most used by other institutions. Of 130 million items at Library of Congress, only 30 million have records in the catalog. Many are set up as mediated collections. Many are unique or rare materials. Not sharable like books and journals. Users now expect to get access to these materials.

Library of Congress is going to identify performance measures that are quantitative (as much as possible). We have to report back to Congress on benefits and who has benefited. This is a much more detailed report than Library of Congress has ever had to do before.

What do we all have in common? That we are the institutions in which society has placed its trust that we will figure out: what should be saved, how will it be saved, how will we make it available over time.


Anonymous said...

Re my first point ... MARC may very well be an issue. I think though that people focus on MARC as if by changing or doing away with MARC a lot of problems will be solved. Some things may be improved; but others will not.

Anonymous said...

Thank you, Karen, for putting this info up.

Right, Lorcan. Yes, rejuvenating our data structure would help (with no small degree of pain), but the fundamental question of how much/how accurate/how effective metadata to create will not be solved by replacing MARC. And processing whacking great quantities of XML is not turning out to be a picnic, either.

I agree that as a community we need to make better use of metadata from "upstream". However, I do wonder about the ultimate costs. If the library community becomes reliant on vendors, publishers, and others to supply the information underlying our core services, we place ourselves at their mercy in terms of prices, quality, and restrictions on redistribution. If each library pays those costs and keeps the records to themselves, where is our cooperative then? Or does OCLC serve as our broker among record providers?

Jonathan Rochkind said...

Great summary, valuable service to have this in text on the web, thanks a lot.

Very interesting discussions. At all three meetings, I am impressed by the generally sophisticated level of discussion (by which I mean people are talking about what I think they should be, naturally :) ). It will be interesting to see what the final report looks like--and how it can be made into an action plan.