There's nothing like inter-continental travel to provide you with those sleepless nights that are ideal for a re-reading of the FRBR document. I am not a cataloger, so I assume that I am missing some or much of the importance of the FRBR analysis in relation to how catalogers view their activity. My reading of FRBR is that it is a rather unrevolutionary macro analysis of what cataloging already is. As a theoretical framework, it gives the cataloging community a new way to talk about what they do and why they do it. From what I've understood of the RDA work, FRBR has brought clarity to that discussion, and that's a Good Thing.
Then I hear about people FRBR-izing their catalogs, and I have to say that I can find nothing in the FRBR analysis that would support or encourage that activity. FRBR is not about catalogs, it's not even about creating cataloging records, and it definitely does not advocate the clustering of works for user displays. I'm not sure where FRBR-izing came from, but it definitely didn't come from FRBR. FRBR defines something called the Work, but does not tell you what to do with it. In addition, the Work is not a new idea (see section 25.2 of AACR2 where it describes the use of Uniform Titles).
I think that those of us in the systems design arena have confused FRBR, or perhaps co-opted it, to solve two pressing problems of our own: 1) the need to provide a better user interface to the minority of prolific works, that is, the Shakespeares and the oft-translated works; and 2) the need to manage works that appear in many physical formats, such as a printed journal and the microform copy of that journal, or an article that is available in both HTML and PDF. We can find elements of FRBR that help us communicate about these issues; we can talk about Works (in the FRBR sense) and Manifestations. But solving these problems is not a FRBR-ization of the catalog.
The first problem, that of prolific works, had at least a partial solution in the card catalog: the Uniform Title. It was that title that brought together all of the Hamlets, or all of the translations of Mann's "Zauberberg." While RDA may in the future define Work somewhat differently from AACR2 and may expand the breadth of the groupings of bibliographic records, this isn't a new concept. Interestingly, I find that Uniform Titles are often not assigned in catalog records, which limits their usefulness. So here we are hailing FRBR when we aren't making use of a mechanism (UTs) we already have. In any case, we are finally trying to cluster works in a way that should have already been part of our online catalog. Although the definition of Work may have changed, the idea of grouping by work is not new.
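As a rough illustration of grouping by Uniform Title, and of what happens when one is not assigned, here is a minimal Python sketch. The dictionary keys (`title`, `uniform_title`), the function name, and the sample records are hypothetical; this is not a MARC parser and not how any particular catalog system does it.

```python
from collections import defaultdict


def cluster_by_uniform_title(records):
    """Group titles under their uniform title; fall back to the title itself."""
    clusters = defaultdict(list)
    for record in records:
        # Hypothetical rule: a record with no uniform title is keyed on its own title.
        key = record.get("uniform_title") or record["title"]
        clusters[key.strip().lower()].append(record["title"])
    return dict(clusters)


# Invented sample records for illustration only.
records = [
    {"title": "The Magic Mountain", "uniform_title": "Zauberberg"},
    {"title": "La montagne magique", "uniform_title": "Zauberberg"},
    {"title": "Der Zauberberg", "uniform_title": None},  # no uniform title assigned
]

clusters = cluster_by_uniform_title(records)
for key, titles in clusters.items():
    print(key, titles)
```

The two translations land in one cluster, while the record without a uniform title ends up in a cluster of its own, which is the practical cost of not assigning them.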
The next problem is one I hoped would be addressed in RDA but it appears that it isn't (well, I can't find it in the drafts): should we catalog different physical formats as separate items, or could we have a hierarchical view of our catalog entry that would allow different physical formats to be listed as a single item? Physical formats are important because the format can determine the user's ability to make use of the item. This is conceptually a cataloging question, but it's also a systems design issue, which is one of the reasons why I would like to see some work between the RDA committees and a group of systems designers. From this latter point of view, my preference would be to create a multi-level record that allows for manifestation and copy-level information to be carried with (or linked to) the bibliographic data. The MARC21 Holdings Format gives us one model for a solution, but in my experience it needs a make-over (and another level of hierarchy) in order to play this role.
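To make the multi-level idea a little more concrete, here is a minimal Python sketch of a record that carries manifestation and copy-level information beneath the shared bibliographic data. The class names, field names, and sample values are illustrative assumptions; they are not drawn from RDA, FRBR, or the MARC21 Holdings Format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Copy:
    """Copy-level information for one held item (hypothetical fields)."""
    location: str
    call_number: str


@dataclass
class Manifestation:
    """One physical format, with the copies held in that format."""
    physical_format: str
    copies: List[Copy] = field(default_factory=list)


@dataclass
class BibRecord:
    """Bibliographic data shared by every format, with formats nested beneath it."""
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)

    def formats(self) -> List[str]:
        """List the formats a user could choose from within this single entry."""
        return [m.physical_format for m in self.manifestations]


# Invented sample data: one journal held in print and on microform.
journal = BibRecord(
    title="Example Journal",
    manifestations=[
        Manifestation("print", [Copy("Main Stacks", "Z671 .E9")]),
        Manifestation("microform", [Copy("Microforms Room", "MFILM 1234")]),
    ],
)
print(journal.formats())  # ['print', 'microform']
```

The point of the nesting is that the user sees one entry for the journal and then chooses a format, rather than facing two unrelated catalog records.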
I'd like to see discussion of both of these issues and their possible solutions. It is clear to me that post-processing of current catalog records is not sufficient to create the kind of user services that we want to provide. We are going to have to talk about what we want our data to look like in order to serve users of our catalogs.
Wednesday, October 25, 2006
Tuesday, October 24, 2006
Google Book Search is NOT a Library Backup
I have seen various quotes from library managers that the Google Book Search program, which is digitizing books from about a dozen large research libraries, now provides a backup to the library itself. This is simply not the case. Google is, or at least began as, a keyword search capability for books, not a preservation project. This means that "good enough" is good enough for users to discover a book by the keywords. A few key facts about GBS:
1) it uses uncorrected OCR. This means that there are many errors that remain in the extracted text. A glaring example is that all hyphenated words that break across a line are treated as separate words, e.g. re-create is in the text as "re" and "create". And the OCR has particular trouble with title pages and tables of contents:
Copyright, 18w,
B@ DODD, MEAD AND COMPANY,
411 r@h @umieS
@n(Wr@ft@ @rr@
5 OHN WILSON AND SON, CAMBRIDGE, U. S. A.
Here's the table of contents page:
(@t'
@ 1@ -r: @
@Je@ @3(
CONTENTS
CHAPTER PAGS
I. MATERIAL AND METHOD . . 7
II. TIME AND PLACE 20
III. MEDITATION AND IMAGINATION 34
IV. THE FIRST DELIGHT . . . 51
V. THE FEELING FOR LITERATURE 63
VI. THE BOOKS OF LIFE . . . 74
Vii. FROM THE BOOK TO THE READER 8@
VIII. BY WAY OF ILLUSTRATION . 95
IX. PERSONALITY 109
X. LIBERATION THROUGH IDEAS . 121
XI. THE LOGIC OF FREE LIFE. . 132
XII. THE IMAGINATION 143
XIII. BREADTH OF LIFE 154
XIV. RACIAL EXPRESSION . . . i65
XV. FRESHNESS OF FEELING. . . 174
2) it will not digitize all items from the libraries. Some will be considered too delicate for the scanning process; others will present problems because of size or layout. It isn't clear how they will deal with items that are off the shelf when that shelf is being digitized.
3) quality control is generally low. I have heard that some of the libraries are trying to work with Google on this, but the effort by the library to QC each digitized book would be extremely costly. People have reported blurred or missing pages, but my favorite is:
"Venice in Sweden"
Search isbn:030681286X (Stones of Venice, by Ruskin)
Click on the link and you see a page of Stones of Venice. Click on the Table of Contents and you're at page two or so of a guidebook on Sweden. Click forward and backward and move seamlessly from Venice to Sweden and back again. Two! Two! Two books in one! (I reported this to G months ago.)
4) the downloaded books aren't always identical to the book available online (which in turn may be different to the actual physical book due to scanning abnormalities). Look at this version of "Old Friends" both online and after downloading, and you'll see that most of the plates are missing from the downloaded version. Not necessarily a back-up problem, but it doesn't instill confidence that copies made from their originals will be complete.
Note that these examples may not affect the usefulness of the search function provided by Google, but they do affect the assumption that these books back up the library.
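The hyphenation problem in point 1 is easy to reproduce. The following Python sketch is not Google's pipeline, which is not described here; the function names and the simple rejoining rule are assumptions made only to show how a word broken across a line ends up as two search terms, and what a naive correction might look like.

```python
import re


def naive_tokens(text):
    """Split text into runs of letters, the way a simple keyword indexer might."""
    return re.findall(r"[A-Za-z]+", text)


def rejoin_line_break_hyphens(text):
    """Join word halves separated by a hyphen at the end of a line.

    Caveat: this cannot tell a typesetter's hyphen from a real one,
    so "re-create" broken across lines comes back as "recreate".
    """
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)


page_text = "to re-\ncreate the text"
print(naive_tokens(page_text))
print(naive_tokens(rejoin_line_break_hyphens(page_text)))
```

Without the rejoining step the index holds "re" and "create" as separate words; with it the index holds "recreate", which is closer but still not the hyphenated word that was printed. Getting this right takes correction, and that is exactly what uncorrected OCR lacks.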
Monday, October 23, 2006
Internet Filters and Strange Bedfellows
In the legal battle against the Children's Internet Protection Act (CIPA), the government's position was to mandate filters on library computers as a way to protect children. The ALA and the ACLU argued that such filters were unconstitutional because they blocked speech protected by the First Amendment, and also that the filters were ineffective for their intended purpose, letting some "inappropriate" material through. Judith Krug testified, saying: "Even the filtering manufacturers admit it is impossible to block all undesirable material." The government, of course, argued for filters.
Now, the Child Online Protection Act (COPA) from 1998 is going to court. This law requires that Internet sites that carry material that may be harmful to children use some method (such as requiring a credit card number) to prevent children from accessing the material, or face criminal charges if children access their site. In this case, the ACLU is expected to argue that filters (which have evolved since the CIPA days) are a better way to prevent children from seeing the offending material, and the government will argue that filters are ineffective because a fair amount of pornography slips through them.
*sigh* It's the absurdity of it all that gets to me. That and my paranoid fear that it's all a plot to engage our limited resources while our rights erode on so many fronts.
Thursday, October 05, 2006
Hiatus
For the next two weeks I will be in Venice, Italy, where I intend to contemplate the thing called the "book" and other wonders of the world. I am readying some posts that I may not be able to complete before leaving, one on DRM in particular.
In the meantime, if you are interested in the future of the library catalog and the related future of the MARC record, I invite you to add your thoughts, in draft form, to the futurelib wiki. The password is dewey76. You can add new pages, or add information to the pages that are there. Many pages have sections labeled "cooked" and "raw." The raw sections are places where you can put any ideas, even if they are neither well formed nor coordinated with other text there. Consider it a storage box for ideas we don't want to lose. There are also places for bibliographies, if anyone wants to fill those in. And if you work on a system that is innovative (note the small "i"), add it to the list of examples.