Tuesday, April 01, 2008

Copyright and MARC

OCLC (well, OCLC/RLG) has released a summary report on copyright investigation in academic institutions. The basic results are obvious, but it's good that someone has done even a small amount of fact-gathering on this topic. The report concludes that determining copyright status of library materials is hard work. That's no surprise. Two statements in the report, however, give me special satisfaction:

Institutions are working with MARC data from their local catalog. This information allows them to rule out high risk materials, and work with a narrower pool of items. Beyond this, institutions need to examine items for information not contained in MARC record. (p.3)

Information is not present in MARC records both because local MARC record sometimes doesn't contain information it should, and also because cataloging rules don't call for including some information that is necessary for this purpose. (p. 4)
I want to point out that a field for copyright information has recently been approved for addition to the MARC21 record, and that I authored the proposal that went to the MARC standards committee. There were many people who opposed the proposal on the grounds that catalogers should not be determining the copyright status of items. I continue to point out that the field (based on the copyrightMD schema developed for the California Digital Library) does not determine copyright status. It provides certain factual information, such as recording the copyright statement that is on the work, something that I honestly think should be included in descriptive cataloging. It also provides a way to record the assessment made by the library or archive. Much of the time this assessment will be "unknown status," but the few times that it can be determined that an item is in the public domain, that information is highly valuable to the potential users of the material.

CopyrightMD was mainly motivated by the needs of archives undergoing digitization projects. I was struck by the fact that libraries and archives are making materials available online and then either 1) saying nothing about the copyright status of the work, or 2) making statements about use permissions that have no basis in law. For a while I was gathering up the more egregious statements by institutions about rights in materials that they had digitized and made available for open access online. Basically, libraries and archives are often making statements that amount to actual permissions, such as "You can make up to 2 copies of this for personal use." The problem with that statement is that only the copyright holder can give such permissions; that's what copyright is all about. It is also fairly simple to find cases where copyright is being claimed on works that are in the public domain; you can find Federal documents in Google Books that are only available in snippet view or that have "Copyrighted material" watermarked on each page. You and I may be able to scoff at this absurdity, but shouldn't we be doing better by our users? Shouldn't we be providing them with information to counter this mis-information?

The information that we provide should include everything that we do know about the basic facts that are used to determine copyright:
  • who is the creator?
  • who is the copyright holder?
  • what copyright statement is on the work?
  • when was this created/published?
  • if a copyright holder is known, what is the appropriate contact information?
  • etc. (see the documentation for all the fields)
Some of this is part of library cataloging, but certain key information (like a copyright holder statement) is not included. There also is renewed attention to data points like creator death dates, which feature in copyright determination but were not being provided by the US cataloging community until recently. Mainly, however, most archival materials are not being given full cataloging. Brief metadata records are being created as items are digitized. My argument is that for any item that is digitized and made accessible, the data that informs copyright status is absolutely essential in that metadata.

It is equally important to state what you do not know along with what you do know about the item. If the archive has no idea who created the work, or has a name but not enough information to determine a death date, then it would be good to tell the user that up front. Often, archives include a statement like "Contact XYZ archive for more information." When I asked them what usually happened when a person did contact them, they said that most of the time they had no further information for the user. And they knew that at the time that they digitized the item and created the metadata for it. Can you see the problem here?
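
To make this concrete, here is a rough sketch in Python of a brief metadata record that carries the basic facts listed above and says plainly which ones are unknown. The class and field names are my own illustrative assumptions; they are not the actual element names in copyrightMD or in the MARC field, and the sample values are made up.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

UNKNOWN = "unknown"  # default assessment when status cannot be determined


@dataclass
class CopyrightFacts:
    """Hypothetical record of the facts that inform a copyright assessment.

    Field names are illustrative only, not copyrightMD or MARC names.
    A value of None means "we do not know this".
    """
    creator: Optional[str] = None
    creator_death_year: Optional[int] = None
    copyright_holder: Optional[str] = None
    holder_contact: Optional[str] = None
    statement_on_work: Optional[str] = None
    creation_year: Optional[int] = None
    date_is_estimated: bool = False
    assessed_status: str = UNKNOWN

    def unknown_facts(self) -> List[str]:
        """Return the names of the facts that have not been recorded."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def user_note(self) -> str:
        """Build a note that tells the user up front what is not known."""
        missing = self.unknown_facts()
        if not missing:
            return f"Assessed status: {self.assessed_status}. All basic facts are recorded."
        return (
            f"Assessed status: {self.assessed_status}. "
            f"Not known to this archive: {', '.join(missing)}."
        )


# Made-up example: a creator name and an estimated date, nothing else.
record = CopyrightFacts(creator="Jane Doe", creation_year=1923, date_is_estimated=True)
print(record.user_note())
```

The point of the sketch is only that "unknown" is recorded as data at the time of digitization, so the note to the user can be generated from what the archive actually knows rather than sending them to ask for information that does not exist.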

The OCLC/RLG report shows that there are two primary situations where librarians are spending time making copyright determinations: 1) when deciding if they can provide access to materials, which means digital access (since hardcopy access doesn't have copyright issues) and 2) when the institution needs to engage in re-use of materials, mainly in archives and special collections. Each time that one of these assessments is done, the data that is gathered to support the assessment should be recorded so this work does not need to be repeated at another time. It seems obvious to me that part of the assessment process should be the recording of the data that isn't in the catalog record today. I'm still stunned that we haven't begun doing that on a regular basis.


Merrilee Proffitt said...

Karen, thanks for the review of the RLG Programs report, and for the additional comments. The addition of the MARC code is certainly useful, but it will be a while before it will be implemented in local systems (no?).

For those of us who work with archival collections, the MARC record would give the collection- and perhaps series-level information. In the case of most manuscript collections, the creators of items within the collection will be quite diverse, ranging into the hundreds of creators. It is quite typical for individual creators to be unidentifiable, for the date of creation to be unclear, etc. Given the practice of treating a collection as a collection (and not as a bunch of items), this information on items is not available at the get-go of a digitization project.

I think you're right to suggest that we should be giving as much information to end users as possible, even if that is "we don't know, and here are reasonable steps you can take to explore more." The SAA Intellectual Property Working Group is working up such guidelines now.

Karen Coyle said...

Merrilee - thanks for the pointer to the SAA group. At the CDL, we were really looking at this as a user service issue, which seems to be SAA's direction. I always felt like we were dangling tantalizing stuff in front of users and then not giving them a clue as to what they could do with it, nor how to find out.

And, yes, the MARC field will take a while to be implemented. But if you are creating metadata in XML, the copyrightMD schema is available for use right now. I'd be very interested to hear how it could be made usable at the collection level. It may need some new fields that address issues like "many creators". It does address "unknown" and "date estimated" kinds of issues, but there isn't a way to specify that those apply to a collection rather than the item the user is looking at.