Monday, January 28, 2013

Wikipedia and bibliography

One thing that I have noticed on many Wikipedia pages is that you get references, and some external links, but not what I would call a useful bibliography. There are bibliography pages, but these are usually huge, comprehensive lists of works on broad topics.

What I appreciate about Wikipedia is that it is a great place to start when you are delving into a new topic. The links between Wikipedia pages within the text can be very helpful (albeit at times a bit too much of a distraction when your curiosity takes you a great distance from where you started). I would like for at least some Wikipedia pages to serve also as a beginning bibliography for the topic.

What constitutes a beginning bibliography is obviously not easy to define, but Wikipedia has never shied away from such difficulties. When I was in college we had a separate undergraduate library that had the basic books in each field. If you'd never thought about, say, anthropology, you could find the LC class number for that topic, go to the shelf, and you'd be looking at the books that most professors teaching an undergraduate course would consider "must reads" in the field. There was even a published list, called "Books for College Libraries" that listed the key books that college libraries of various sizes should have in their collections. This is now an online resource called "Resources for College Libraries" (behind a paywall) that has over 70,000 titles in 61 different subject areas. What this means is that doing something similar in Wikipedia is neither impossible nor radical.

I got to do a small experiment in this area yesterday at a local "Wikipedia edit-a-thon." I had brought with me some books that I thought would yield interesting explorations, one of which is a marvelous book called "Woman in Science," published in 1913. Although the author is listed as "H. J. Mozans," that turns out to be a pseudonym for John Augustine Zahm. Zahm does have a Wikipedia page, and it did list, within a paragraph, a number of books that he had written. Oddly enough, Woman in Science wasn't one of them. I added it, then decided that, since he had written a handful of books, I would add a bibliography to his page. The Wikipedia bibliography format is, well, you know, like so many Wikipedia structures, something less than friendly. But I discovered something that I probably should have known.

I went to the Open Library page for the book, and near the bottom found the list of export formats.

Clicking on "Wikipedia Citation" I got this:

which can be pasted directly into Wikipedia. If you are using it for an inline citation, you need to surround this code with <ref></ref>, which will then create a numbered reference and add it to the references at the bottom of the page.
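
For anyone who wants to generate this kind of code in bulk, here is a minimal Python sketch of the idea. It is not the Open Library export itself, and the exact text Open Library produces may differ. The function name `cite_book` is hypothetical, and I am assuming only the `author`, `title`, and `year` parameters of Wikipedia's {{cite book}} template.

```python
def cite_book(fields, inline=False):
    """Build a {{cite book}} template string from a dict of fields."""
    # Skip empty values so the template carries no blank parameters.
    parts = [f"{key}={value}" for key, value in fields.items() if value]
    citation = "{{cite book |" + " |".join(parts) + "}}"
    # Inline citations are wrapped in <ref> tags; bibliography entries are not.
    return f"<ref>{citation}</ref>" if inline else citation


# Details taken from the post; other fields (publisher, place) are left out.
book = {
    "author": "H. J. Mozans",
    "title": "Woman in Science",
    "year": "1913",
}

print(cite_book(book))               # for a bibliography list
print(cite_book(book, inline=True))  # for an inline, numbered reference
```

The first call gives the bare template for a bibliography section; the second wraps the same template in <ref></ref> for use inside the article text.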

Unfortunately, the Open Library code doesn't include a link to the full text, most likely because that isn't part of the Wikipedia format. To do that I added a link to the Internet Archive digitized version of the book after the citation. You can look on the Zahm page and see how that looks. (I'm still looking for a better way to format it so that the link to the full text stands out without looking ugly.)

Another way to add bibliographic data to Wikipedia is to click on the menu at the top of the edit window and select a citation type, which then gives you a form to fill out. But if you can find the item in the Open Library, you can avoid all of that typing.

Now that I have learned that it is easy to add bibliographic data to Wikipedia, I'm interested in exploring ways that Wikipedia pages can be starting places for essential reading on topics. It naturally makes sense to point to any existing digital materials, but a next logical step would be to find a way to point to libraries for more recent (and in-copyright) materials.

Wednesday, January 16, 2013

Hackers and heroes

Recent events have led me again to a contemplation of the equation of hackers and heroes. How is it that an essentially cerebral and sedentary activity gets equated with heroics? And why computing and not, say, bioscience?

If you've read your obligatory Joseph Campbell you know that the hero myth is ubiquitous in human cultures. Each culture adds its own flavorings and decorations, but the general story is the same: a usually young, solitary male goes through transformational trials, performs some task that makes a difference to the world, and is then declared a hero.

In the story-telling world, it ends there. You don't get the post-hero narrative, although, like love stories, there is an implicit "and they lived happily ever after." This makes it easy to forget that in real life "heroism" is a moment, not a lifetime. The fireman who saves the baby from the burning building, the batter who hits the World Series-winning home run: this is a moment of glory before the person goes back to being an ordinary Joe.

When Steven Levy wrote "Hackers: heroes of the computer revolution" in 1984, the hero myth was perhaps new to the computer world. By the early 1990's consumer computing magazines were full of hero and superhero images. This presents an odd contrast to the stereotype of the out-of-shape, asocial, code-writing computer geek. Since then graphics capabilities have allowed the hero myth to move to the screen in the form of first-person games in which anyone with the time and inclination can play at being a hero.

Levy's "hacker heroes" were in fact ordinary computer geeks, and not even the first. Levy focuses on a group of young men at MIT starting in the late 1950's, but they had been preceded in the 1940's and early 1950's by those who were truly the first computer programmers, many of whom were women. There was no attribution of heroics to those pioneers, either at the time or retrospectively. What changed?

Many of the studies of the decline of women's enrollment in computer science ask a similar question, which is: how did computer science become a male bastion when it had once seemed welcoming to women? And why did it take on a hyper-masculinized culture, with home brew, skateboarding the hallways, pizza delivered to midnight coding frenzies, and heroes?

I don't have, and have not encountered in my reading, an answer to that question. I do want to caution, however, that the hero aspiration has a down side that is played out as tragedy. It might be best to limit our heroes to the mythological realm and leave computing to mortals. It just might become a friendlier place for everybody.

Wednesday, January 02, 2013

OCLC Top 50

OCLC recently released a file of 1.2 million metadata records for the most widely held items in its catalog. These are all items with 250 library holdings or more. I created a list on WorldCat of the top 50, mostly out of curiosity. I was quite surprised at the results, however.

Here's how it breaks down:
  • 16 periodicals, with Time and Newsweek being numbers 1 and 2, respectively
  • 29 kid and YA books, four of which (and very high even in this small list) are from the Diary of a Wimpy Kid series
  • 5 adult books
The five adult books are:
  1. McCullough, D. G. (1992). Truman. New York: Simon & Schuster. 
  2. Brown, D. (2003). The Da Vinci code: A novel. New York: Doubleday.
  3. Johnson, S. (1998). Who moved my cheese?: An a-mazing way to deal with change in your work and in your life. New York: Putnam. 
  4. Haley, A. (1976). Roots. Garden City, N.Y: Doubleday.  
  5. Peters, T. J., & Waterman, R. H. (1982). In search of excellence: Lessons from America's best-run companies. New York: Harper & Row.
This small set gives me many ideas of things to investigate in the full set. First, the monographs in this set all have recent dates, with the oldest being 1976, and most after 2000.
I am hoping to graph the full set by date. What I expect is that the items will be overwhelmingly recent publications because libraries tend to hold what people read, and my guess is that readers are mainly reading new books. Also, libraries buy from the set of things that are in print, so even if they are buying a so-called classic (as they do every time yet another movie is made of Pride and Prejudice) they are buying a current edition which will have a recent date.
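
As a starting point for that graph, here is a minimal Python sketch that tallies publication years by decade. It reads one year at a time, so it should also work when the years come from a generator over a very large file. The function name `count_by_decade` is my own, and how the years would be pulled out of the OCLC records is left open, since I am not assuming anything about that file's layout. The sample input is just the five adult books listed above.

```python
from collections import Counter


def count_by_decade(years):
    """Tally publication years into decades, one record at a time."""
    counts = Counter()
    for year in years:  # any iterable works, including a generator
        counts[(year // 10) * 10] += 1
    return dict(sorted(counts.items()))


# Publication years of the five adult books in the top 50.
adult_years = [1992, 2003, 1998, 1976, 1982]

# A crude text chart: one '#' per title in each decade.
for decade, n in count_by_decade(adult_years).items():
    print(f"{decade}s {'#' * n}")
```

Because only the running counts are kept in memory, the size of the input file matters much less than it does in a spreadsheet.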

The next obvious bit of information would be correlation between holdings and date, which I expect to be high for the very reasons given above.
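
One way to put a number on that is the Pearson correlation coefficient between publication year and holdings count. The sketch below uses only the Python standard library. The function name `pearson` is my own, and the sample numbers are placeholders chosen to be perfectly linear, not OCLC figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only; these are NOT OCLC figures.
years = [1980, 1990, 2000, 2010]
holdings = [300, 400, 500, 600]
print(pearson(years, holdings))
```

A value near 1 would mean that newer items tend to have more holdings, a value near -1 the reverse, and a value near 0 little linear relationship. Keep in mind that the released file only includes items with 250 or more holdings, so any figure computed from it describes that truncated set rather than the whole catalog.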

The overall distribution of holdings is unsurprising, starting high (at almost 7000 holdings), dropping off dramatically, and creating a long tail. (I had managed to coax a chart out of ooCalc but it crashed before I captured it. Am now studying how to deal with large files and visualization. Advice gladly received.) Of course, the tail would be very, very long if you could chart the entire WorldCat database. (Anyone know how many items in WC are held by only one library? I can't find that in the available WC stats.)

I think it would be interesting to be able to analyze library holdings in correlation with the FRBR-ization that OCLC has done. In fact, I would really like to see the top 1% (or .5%) of FRBR-ized items. Related to FRBR I am mainly wondering if we can estimate how frequently FRBR might fulfill its promise of saving the time of the cataloger. But that's for another day.