Coyle's InFormation: 01/01/2008

Friday, January 25, 2008

Books as Social Vectors

Ursula Le Guin has a fabulous article in Harper's (Feb. 2008, v. 316, n. 1893) responding to the NEA report on Reading at Risk. That report states that there has been a sharp decline in the reading of books of "literature" (which I couldn't find a definition for in the report).

Le Guin's article is called "Staying Awake," which comes from one person's statement "I just get sleepy when I read." As Le Guin points out, there are "people who read wide awake," but the corporate culture of today's publishing isn't interested in cultivating anything except the "best seller" product. (Some books are art: "And the relationship of art to capitalism is, to put it mildly, vexed.")

She talks about what reading has meant to culture ("Books are social vectors..."), from the early use of books to spread a uniform view of religion, to the late 19th century serial books that had everyone discussing what would happen next. It is this aspect of books as social vectors that I think we in the library world need to come to grips with.

The public library of the 19th century was about bringing book culture to the masses. (See Dee Garrison's book Apostles of Culture for a good account.) Somewhere in the 20th century we swung the pendulum in the opposite direction and began aiming for maximum neutrality. But people don't respond well to neutrality. In fact, they are... well, neutral on it. It takes a certain interest, perhaps even passion, to stay awake.

There are obvious issues for libraries (many of them government agencies) should they become instigators of passion for books. However, I see a somewhat less problematic possibility, which is allowing the library to itself be a "social vector" by connecting the library, and in particular its catalog, to the world of social networking. This is starting to happen in a small way, such as links from web sites or social bookmarking tools to WorldCat, but I think it's time to really ratchet up our efforts in this area.

Friday, January 18, 2008

Being Careful

The Library of Congress has put some great collections of photos up on Flickr. Take a look at the group on the 1930's and 40's. There are some great photos of "Rosie the Riveter" women building various implements of war, especially aircraft. (Note: the photos do look staged, or at least they gave the women a chance to freshen their lipstick before the shoot.)

Type the word "careful" into the search box at the top and you'll get an idea of that era's nervousness about having women work on technology, and the qualities that women were seen as bringing to the job. (Hint: it's not innate mechanical ability.) These women really did show that "we can do it." Training, schmaining -- just give me a power tool and turn me loose!

Friday, January 11, 2008

ALCTS CCS Discussion of RDA Draft

Committee on Cataloging: Description and Access (also known as CC:DA)

This is an informal discussion on the draft of RDA. Focuses on sections 2 & 3; the remainder will be covered on Monday.

Overall Comments

Discussing:
Chapter 2. Identifying manifestations and items
Chapter 3. Describing carriers

This was a discussion of RDA by the ALCTS cataloging group. Presumably the comments here become the ALA comments to the JSC.

It's hard to characterize this discussion. It ranged from comments about the need to improve the definitions and the problems with the structure of the document to statements like: What we have here is a crisis of confidence; pedantic adherence to structural hierarchy; Why is it that the rules from AACR2 do not inspire confidence in this new setting?

Here are some of the more interesting issues that came up (filtered, obviously, through my viewpoint).

- Some of the text is just AACR2, pulled into the RDA document. This is considered by some to be a step backward -- that this "new" code doesn't take advantage of the opportunity to make changes in these areas.

- There seems to be a great deal of confusion on what the final RDA product will actually be. Some see it as the final cataloging code that they will use daily in their cataloging. Others (possibly the members of JSC) see RDA as being a basis for cataloging, but a neutral background for the creation of actual cataloging rules. This is particularly odd when we consider that the RDA text is being transferred to an online system which will be the primary product allowing people to access RDA.

- This also means that there is tension between creating a general code and getting all of the special rules in for music, law, cartographic materials, etc. This tension does not seem to be resolved, and there are people with different expectations.

- The whole RDA direction seems to be in incredible flux. You may know that they recently announced a restructuring of the document to bring it more in line with FRBR. They also seem to be attempting to do some redesign of concepts. For example, there is no longer any reference to authority records -- it is assumed that in the future, those will not exist as they do today, although the same information will be carried somewhere. This is a major change, at least in thinking, to happen just months before the full draft is due to be available.

- There was a fair amount of dissent over the format of the text and the fact that it will be thousands of pages in length when created. There's a deep contradiction in the process, because the RDA folks say that they are writing the online version, but they are creating this as a print document. As some members of the audience pointed out, they are really creating neither -- what they have doesn't work as a print document, and the web document will undoubtedly look quite different. So... what is it that we are looking at now?

- Although there is quite a bit of dissent in the US over RDA, there is great enthusiasm among the non-US members of the Joint Steering Committee. We don't have any explanation as to why we have these polar opposites, and it would be very interesting to hear WHY they think it's so good, since here it seems to be almost universally disliked.

- There was a fascinating, but not quite coherent, discussion of persons and personal names: are we identifying persons, or are we identifying names? If the same person uses more than one name, how many identities is that? It was said that we are now treating persons like corporate bodies: a difference in naming is a different entity. This has some practical elements, of course, but it also seems to be deeply philosophical and something that we have to be very clear on if we are going to exchange data with communities who emphasize persons over named identities.

ALA Friday "Big Heads"

Big heads meeting

(For those who don't know, this is the "heads of technical service departments of large libraries" meeting that takes place each year and is an update on all things tech services.)

RDA update - John Attig

1. Collaboration of RDA developers and DC to define element set and vocabularies. A working group has been set up. We now have startup funding. There is a meeting tomorrow to formulate a startup plan.

2. Gordon Dunsire is doing an element set for FRBR with IFLA support, using the NSDL vocabularies registry.

3. There is a vendor for development set (Cognolore?). JSC is also working with ALA Publishing: Nannette Naught will be consultant for the standards software. The first component is an authoring system. They will soon load the current version of all of the text and will use this to maintain the text.

4. Work is also beginning on the RDA online product. It's not live yet. A new prototype will be created between now and Annual, and we will be able to see it there.

Content development:

JSC met for a week in Chicago. There is now a new reorganization of RDA content. This has a better relationship and better alignment with FRBR, and is less constrained by present implementations.

Previously there was a primary division between bibliographic and authority work. Now it has all of the FRBR group elements together, including subjects (Group 3). Subjects will have a place in the document, but won't be filled in initially.

They are aiming at a relational/object-oriented scheme in which the description of each entity exists as an object within the data structure and relationships are shown between them with URIs. This is not supported by our current systems or by MARC -- this is the future. There is less emphasis on controlling the form of textual entities and more on describing attributes.

Some drafts are still out for review and will have to be reconciled in this new structure. The committee is still working on specific comments on chapters that have been reviewed. They still need to do appendices and examples. They will meet for two weeks in April.

The draft of the general introduction will be discussed at the April meeting. They expect to have a complete draft in July for public review.

Implementation

National libraries associated with the JSC are committed to coordinated implementation of RDA. The Program for Cooperative Cataloging is looking at rule interpretations and any needs for implementation guidelines. An ALCTS task force has been formed to look at implementation of RDA and will report out at ALA Annual.

There is a report to MARBI this conference outlining an initial implementation in MARC21. There are significant differences in the granularity of data elements and we need to figure out how much to align them.

We also need to begin work on a new carrier to succeed MARC.

WuGroFoBibCo

Bob Wolven (WG member)

It is hoped that this session will mainly be dedicated to discussion.

Some background: over the last 6 weeks, after putting out the final draft for comment, the group got 150 pages of comments. Some contradict each other. Many resulted in revision of text to clarify meaning in the report.

Areas of comment: clarification of meaning of particular sections, which they did. There was also a desire to be more specific about the 'how'. That wasn't for the group to do, and in any case there was not time for that kind of analysis. As an example, the call for a new carrier – this is not easy to do and this wasn't the right group, but hopefully more energy will be put to it, encouraging discussion in groups like this.

People also wanted to know exactly WHO would do certain tasks. Many cannot be delegated to a specific group; everyone needs to work on it. Also, some changes will not be controlled by libraries, but involve more participants.

Chris Cole, NAL (working group member)

This is not just a report to LC; recommendations are to many groups and to the community. In terms of the new carrier: we have to recognize that the primary users of data are/will be machines, so standards development done by people for people is not necessarily suited to this environment. There is an engineering component. We need real solid testing; not just philosophical agreement on the standard, but will it work?

Also, the realization of economic dependencies. We all are strapped - and that includes LC. We need to make decisions based on this reality.

The process: this was an outcome of the serials control decision mess. LC made lemonade out of lemons. LC stepped back to take a bigger look. The working group represents a wide range of interests. Chris is part of the indexing industry; Bob represents PCC. There were also representatives from Google and Microsoft. Note that we reached more than consensus; it was more like unanimity -- there is not a minority report.

Questions:

Q: What does carrier mean?

A: It means that we cannot modify MARC to be our future bibliographic record; we have to create something substantially different.

Q: What were comments?

A: The recommendation on RDA got the most comments. There were many comments about the statement that LC is not the national library. Also a considerable number of comments on the economics. Another set of comments was on "we're already doing that."

(added by Karen Coyle: there was a web site that gathered signatures asking for library data to be open, started by non-librarians. This shows that there are people outside of the library world who are interested in using library data.)

Q: There is also a similar economics question for RDA: ALA needs revenue from this process to fund it.

A: We need someone who knows economics to take a good look at this. Also, economics of standards development and maintenance hasn't been worked out. This is complex, and in the end we need to find ways to reduce costs.

Chris: more players than libraries and LC: OCLC, publishers, vendors who sell metadata.

Comment: The proposed work with Dublin Core pulls out the structure of a possible carrier, so this won't be part of RDA economic model.

Comment (UCLA): We shouldn't spend time perfecting FRBR.

Comment (Yale): RE: special collections and manuscripts – This was a good section, and we would have liked to have seen it go farther. There are large hidden collections, some printed (pamphlets). This is a cultural issue. We want to see more on priorities for LC and for all of us. E.g. LC provides expertise on non-western materials.

LC (Beacher): There is an LC internal group on the nature of bibliographic control at LC – it will now take on this report, analyze it and comment.

- The Bibliographic access group (Beacher's) will look at the report.

- The public services area will react.

The last two will come together to report to Deanna and the five directors who report to her. By ALA Annual they should have a plan of action to share. The Library is pledging that each recommendation will be addressed and will get a statement and reaction to explain why it is accepted or not accepted by LC and how it will be carried out. Some are immediate, some are already underway, some will be longer term. LC has not done a good job of sharing with the community what it has been doing, and there are many projects underway. They don't have a timetable today. Mid-spring is the target for the first LC response, and they will have something to discuss at Annual.

Tuesday, January 08, 2008

More on RDA and "literals"

I did spend time looking over the RDA Element Analysis for its use of "literal" and "non-literal." (I know that to some of you this will seem like nit-picking, but trust me that in the end it will matter.) Let me start by giving my definition of "literal" and "non-literal":

literal: an alphanumeric string representing a value.
example: "Moby Dick, or The whale"
example: "Herman Melville"
example: "ISBN:123456789x"

non-literal: a surrogate for the value itself.
example: uri:lccn: n 79006936 [identifies the LC name authorities record for Person Herman Melville]
example: http://authoritylists.info/uri/RDACarr/1052 [identifies RDA carrier type "volume"]

In programming, non-literals are those data elements that you give a name to. This means that in programming you are mainly working with non-literals:

mainTitle = 245ab

In data records, the values are often literals:

dc:title = "Moby Dick"

When we think about RDA and the literal/non-literal difference, we have some choices. RDA could treat every value as a literal, and let another standard, the data standard, define some as non-literals. This would provide for the maximum flexibility for implementing RDA.

Another possibility is that RDA could define some elements, such as the vocabulary lists included in RDA, as non-literals. These are, indeed, defined as non-literals in the RDA Element Analysis. This would mean that any use of RDA would need to define vocabularies for those data elements, and would have to assign identifiers to those vocabularies and their entries.

There are data elements that might be a literal in one implementation, and a non-literal in another. Author names are an example of this. Today we actually embed the author name as a literal in our MARC records; in the future we could use an identifier to link to an author record, as shown in the example above. Whether or not something is a literal is often a matter of implementation.
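To make the author example concrete, here is a minimal Python sketch of one record carrying the author as a literal and another carrying it as a non-literal. Everything in it is an assumption for illustration: the `Literal` and `NonLiteral` classes, the `AUTHORITY_FILE` lookup table, and the space-free spelling of the identifier are invented here and do not come from RDA or MARC.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """A plain string that is the value itself."""
    text: str


@dataclass(frozen=True)
class NonLiteral:
    """An identifier that stands in for a value held elsewhere."""
    uri: str


Value = Union[Literal, NonLiteral]

# Hypothetical stand-in for an authority file: identifier -> display name.
AUTHORITY_FILE = {
    "uri:lccn:n79006936": "Herman Melville",
}


def display(value: Value) -> str:
    """Return a human-readable string for either kind of value."""
    if isinstance(value, Literal):
        return value.text
    # If the identifier cannot be resolved, show the identifier itself.
    return AUTHORITY_FILE.get(value.uri, value.uri)


# Today: the author name is embedded in the record as a string.
record_today = {
    "title": Literal("Moby Dick, or The whale"),
    "author": Literal("Herman Melville"),
}

# Possible future: the record links to an author record by identifier.
record_future = {
    "title": Literal("Moby Dick, or The whale"),
    "author": NonLiteral("uri:lccn:n79006936"),
}

print(display(record_today["author"]))
print(display(record_future["author"]))
```

Both records display the same name, but only the second can pick up a correction made once in the author record.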

There are some data elements, or pieces of information that we think of as single data elements, that could be a combination of a literal and a non-literal. We already have this in the MARC record in the date element in the 008 field. The date itself is a literal ("1984"); it's just a string of numbers. Included with the date is a code that tells you what kind of date it is (single, range, copyright date). The same could be true of the extent statement: "345 p." could consist of a literal ("345") and a code for the unit (pages or leaves or volumes, etc.).
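The extent case could be sketched like this in Python: a statement such as "345 p." is split into a literal count and a coded unit. The `UNIT_CODES` table and the `parse_extent` function are hypothetical; a real implementation would take its unit codes from a published vocabulary and would need to handle far more complex statements.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical mapping from transcribed unit text to a unit code.
UNIT_CODES = {
    "p.": "pages",
    "leaves": "leaves",
    "v.": "volumes",
}


class Extent(NamedTuple):
    count: str  # literal: the digits as they appear in the statement
    unit: str   # coded value standing in for the unit


def parse_extent(statement: str) -> Optional[Extent]:
    """Split a simple 'NUMBER UNIT' extent statement into its two parts.

    Returns None for anything that is not a single number followed by
    a unit listed in UNIT_CODES.
    """
    match = re.fullmatch(r"\s*(\d+)\s+(\S+)\s*", statement)
    if match is None:
        return None
    count, unit_text = match.groups()
    unit = UNIT_CODES.get(unit_text)
    if unit is None:
        return None
    return Extent(count, unit)


print(parse_extent("345 p."))
print(parse_extent("xii, 345 p."))  # too complex for this sketch
```

The count stays a string because it is transcribed, not calculated; only the unit is replaced by a code.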

There are data elements that are hard to think of as a non-literal, and in fact they may never be one. Titles, explanatory notes, values like numbers of pages or dates -- all of these are likely to be simple text values in a data record.

Conclusion

RDA, as a set of cataloging rules, should not pre-determine whether elements are transcribed as literals or whether they are represented with surrogates for the values.

A related step, in which RDA is defined as a set of data elements that can be encoded for processing, should allow literals for all data elements, but should be defined in such a way that non-literals could be used for any data element.

Another step, that encodes RDA as the library world's bibliographic record, should define non-literals for all vocabulary lists, and, where possible, for all units of measure or data element attributes (such as the type of publication date). It should also define optional non-literals for all authority-controlled elements. This would allow us to move increasingly in the direction of using non-literals.

Of course, our data elements themselves are (or should be) defined in such a way that they are identified with URIs, and therefore are non-literal values. This should be an obvious step in moving our data in the direction of the semantic web.
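As a rough Python illustration of that last point, each statement below names its data element by URI, while the value may be either a literal or a non-literal. The `http://example.org/` namespaces, the element names, and the `Statement` type are placeholders made up for this sketch; only the carrier type URI is taken from the example earlier in this post.

```python
from typing import List, NamedTuple

# Hypothetical namespaces; real element URIs would come from a registry.
ELEMENTS = "http://example.org/elements/"
BOOK = "http://example.org/book/1"


class Statement(NamedTuple):
    subject: str            # URI of the thing being described
    element: str            # URI of the data element (never a literal)
    value: str              # either a literal string or a URI
    value_is_literal: bool  # which of the two kinds the value is


statements = [
    Statement(BOOK, ELEMENTS + "title", "Moby Dick", True),
    Statement(
        BOOK,
        ELEMENTS + "carrierType",
        "http://authoritylists.info/uri/RDACarr/1052",
        False,
    ),
]


def literal_values(stmts: List[Statement]) -> List[str]:
    """Return only the values that are plain strings."""
    return [s.value for s in stmts if s.value_is_literal]


print(literal_values(statements))
```

The element is identified by URI in every statement, whatever kind of value it carries.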

OK, I've stuck my neck out here -- all comments welcome!