Sunday, November 26, 2006

Authorities and Authors

I was reading through some chapters of Joanna Russ's book "How to Suppress Women's Writing," when I had some ideas about authority control. Russ's book is one that I re-read often. It speaks to more than just women's writing -- it is a general description of how the accomplishments of a non-dominant group in any society can be ignored or devalued. Russ mentions many women authors who never make the "top 100" list, or the "Anthology of [whatever] Literature." She also states that a majority of writers in the 19th century were women.

I immediately thought how interesting it would be to use a large database, such as LoC's name authority file or WorldCat, to retrieve authors by gender or country of origin. It then occurred to me that this is information we do not include in authority records, even though it is probably available in a majority of cases. I also recall, although I cannot place it, a discussion about adding the names of all of an author's works to that author's authority record. In this sense, the authority record would be more than a controlled form of the author's name; it would actually contain information about the author that would be of interest to catalog users. There is talk of adding links from author names in catalog records to their Wikipedia entries. Those entries are surely of more interest to users than the authority record, which is just a list of variant forms of the name. For example, look at the Wikipedia entry for Joanna Russ, and compare that to the authority record for the same person.

So imagine an "author" record, either related to or in place of the authority record. It could help users understand who the author is, and to place the author in a historical period (even if the authoritative form of the name doesn't include dates). If coded well, a database of author records could provide some interesting information for various areas of study.
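To make "coded well" a little more concrete, here is a minimal sketch of what such an author record might look like and how it could be queried. Everything in it is an assumption made for illustration: the field names, the identifiers, and the placeholder authors are invented, and this is not an existing authority format. The descriptive fields are optional, since the information may be unknown or something the author would rather not have recorded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuthorRecord:
    """Hypothetical author record; every field name here is an assumption."""
    author_id: str                      # opaque identifier, independent of the name string
    authorized_name: str                # controlled form of the name
    variant_names: List[str] = field(default_factory=list)
    country: Optional[str] = None       # optional: may be unknown or withheld
    gender: Optional[str] = None        # optional: may be unknown or withheld
    active_from: Optional[int] = None   # first year of known activity
    active_to: Optional[int] = None     # last year of known activity


def active_in_century(record, century):
    """True if the author's known active period overlaps the given century."""
    if record.active_from is None or record.active_to is None:
        return False
    start, end = (century - 1) * 100 + 1, century * 100
    return record.active_from <= end and record.active_to >= start


def select(records, *, country=None, century=None):
    """Return the records that match every criterion that was supplied."""
    matches = []
    for record in records:
        if country is not None and record.country != country:
            continue
        if century is not None and not active_in_century(record, century):
            continue
        matches.append(record)
    return matches


# Placeholder data, not real authors.
records = [
    AuthorRecord("a1", "Example, Alice", country="GB", active_from=1840, active_to=1875),
    AuthorRecord("a2", "Sample, Beth", country="US", active_from=1905, active_to=1950),
    AuthorRecord("a3", "Nameless, C."),  # nothing coded beyond the name
]

for record in select(records, century=19):
    print(record.authorized_name)
```

Records with nothing coded simply drop out of the results, so a query like this is only as complete as the coding behind it.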

4 comments:

Anonymous said...

Hmm, if you add a list of all works to the authority record, you are _centralizing_ this function. The only way to make sure the works are listed properly is to get a new or missing work added to a centralized authority record.

Conversely, the way things are _meant_ to work now is that the authorized heading in the bib record serves as a 'foreign key' to the authority record. That information should all be there in the cataloging corpus, but it's kept as a 'foreign key', which is more in keeping with the principles of normalization in database design. This isn't exactly a relational database we're talking about, but the principles apply similarly. Now, to add a new book, you just need a new bib record to be present in the corpus, with the proper authorized heading 'foreign key'; you don't need to touch a centralized authority record.

This is far superior. If it worked. The user could still be shown the same information, so long as the semantic content is there--having the same authorized name 'foreign key' on all bib records _should_ be enough to record the semantic content "All works that belong to this identity." If it worked. So why isn't it working? That is left as an exercise to the reader. :)
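The 'foreign key' arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionaries stand in for bib and authority records, and the field names and headings are invented placeholders, not MARC structures.

```python
from collections import defaultdict

# Hypothetical, heavily simplified records. The authorized heading string
# is the only thing connecting a bib record to an authority record.
authority_records = {
    "Example, Alice, 1950-": {"variants": ["Example, A."]},
    "Sample, Beth": {"variants": []},
}

bib_records = [
    {"title": "First Title", "author_heading": "Example, Alice, 1950-"},
    {"title": "Second Title", "author_heading": "Example, Alice, 1950-"},
    {"title": "Other Title", "author_heading": "Sample, Beth"},
]


def works_by_heading(bibs):
    """Derive each author's list of works from the bib records alone."""
    index = defaultdict(list)
    for bib in bibs:
        index[bib["author_heading"]].append(bib["title"])
    return index


# Adding a book means adding one bib record; no authority record is touched.
bib_records.append({"title": "Third Title", "author_heading": "Example, Alice, 1950-"})

works = works_by_heading(bib_records)
for heading in authority_records:
    print(heading, "->", works[heading])
```

Note that the lookup is an exact string match, so a bib record whose heading differs by even one character would silently fall out of the author's list.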

But it is very important that we start thinking in terms of what _semantic content_ we desire/require to be included in our bibliographic corpus. Yes, this determination will be guided by what functionality we want to support. But there isn't a one-to-one mapping from functionality to the way our data structures look. The intermediary is semantic content. It's by determining what semantic content we want to store, and then determining the most flexible, reusable, non-duplicated way to store it (this is the point behind the principles of normalization in db design), that we create data structures that will in fact support functionality yet to be imagined (using the same semantic content), not just the functionality we imagine today.

Jonathan Rochkind

Karen Coyle said...

Jonathan,
I totally agree that the "author record" could work the way you describe. The astonishing fact is that we have no real link between the authority record and the bibliographic record EXCEPT the string in the 1XX field. I could go on and on about how hard it is to match between authority and bib records, especially if you want to link from all forms of the name in the authority record. Basically, in a machine-readable environment the authority record of today just doesn't work. So a new design, and you've got a good start here, could go a long way toward helping users disambiguate names in the catalog. Where two authors' names are similar, we should be able to say to the user "do you want Author A who has written these three children's books, or Author B who writes mainly articles on biodiversity?"
I believe there has been some work on disambiguating authors based on topical clusters of works. This could be used in the area of journal articles where names are often abbreviated and no one has time to do real author identification.
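As a rough sketch of the "Author A or Author B?" prompt, assume each bib record already carries some identifier that tells the two people apart (the hard part, and not solved here). The identifiers, names, and subjects below are placeholders invented for the example.

```python
from collections import Counter, defaultdict

# Placeholder bib records. "author_id" is a hypothetical identifier that
# distinguishes two people who share the same name string.
bib_records = [
    {"author_id": "A", "name": "Smith, J.", "subject": "Children's stories"},
    {"author_id": "A", "name": "Smith, J.", "subject": "Children's stories"},
    {"author_id": "A", "name": "Smith, J.", "subject": "Children's stories"},
    {"author_id": "B", "name": "Smith, J.", "subject": "Biodiversity"},
    {"author_id": "B", "name": "Smith, J.", "subject": "Biodiversity"},
    {"author_id": "B", "name": "Smith, J.", "subject": "Ecology"},
]


def summarize_candidates(name, bibs):
    """For each identity using this name, return (work count, most common subject)."""
    subjects = defaultdict(Counter)
    for bib in bibs:
        if bib["name"] == name:
            subjects[bib["author_id"]][bib["subject"]] += 1
    summary = {}
    for author_id, counts in subjects.items():
        top_subject, _ = counts.most_common(1)[0]
        summary[author_id] = (sum(counts.values()), top_subject)
    return summary


for author_id, (count, subject) in summarize_candidates("Smith, J.", bib_records).items():
    print(f"Author {author_id}: {count} works, mainly on {subject}")
```

Where no identifier exists, as with abbreviated names on journal articles, the grouping itself would have to be inferred from the subject clusters, which this sketch does not attempt.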

Anonymous said...

I agree with you in theory, but I can see how this information could be problematic. For example, I have an authority record and I'm transgendered, but that isn't terribly widespread knowledge. I am very happy with my unisex authority record and would hate to have that change!

Anonymous said...

A Celebration of Women Writers (edited by my wife, Mary Mark Ockerbloom) works somewhat like you describe. Obviously, all the writers in her database are women, and she includes variant names, countries they're associated with, and pointers to information about the authors.

It's linked to my own Online Books Page database via a shared identifier (not visible to most users), which then enables readers to see what books are online by the writers in question.

Mary's database isn't in sync with LC name authorities, but my bib database is syncing up with it (not all names are authority-controlled yet, but many are), so hers could be too via the shared identifier.
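Linking two databases through a shared identifier, as described above, might look something like the following. This is a hypothetical sketch only: the actual structure of either database is not described here, so the record layouts, identifiers, and entries are all invented placeholders.

```python
# Hypothetical writer database, keyed by a shared identifier.
writers = {
    "w001": {"name": "Example, Alice", "variants": ["Example, A."], "countries": ["GB"]},
    "w002": {"name": "Sample, Beth", "variants": [], "countries": ["US"]},
}

# Hypothetical book database; each record may carry the same identifier.
online_books = [
    {"title": "First Title", "writer_id": "w001"},
    {"title": "Second Title", "writer_id": "w001"},
    {"title": "Unlinked Title", "writer_id": None},  # not yet linked to any writer
]


def books_for_writer(writer_id, books):
    """Titles of the online books attributed to one writer."""
    return [book["title"] for book in books if book["writer_id"] == writer_id]


def writer_for_book(book, writer_db):
    """Writer entry for a book, or None if the book has no shared identifier."""
    return writer_db.get(book["writer_id"])


for writer_id, writer in writers.items():
    print(writer["name"], "->", books_for_writer(writer_id, online_books))
```

Because the link is the identifier rather than the name string, either database can change its preferred form of a name without breaking the connection.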