Tuesday, July 30, 2013

Wikipedia as a learning experience

I have recently attended a few Wikipedia editing sessions and become interested in contributing more to Wikipedia. There is much editing to be done on pages relating to libraries and librarians; some of those pages are quite inadequate, and many have been flagged as such with the coded messages that Wikipedia uses to point out problems. The page for the LCCN is a stub, for example. Search on Sears Subject Headings and what you get is a pretty poor page for Minnie Earl Sears with some information about the subject headings. Lately I've been updating the page on the Dewey Decimal Classification, which had little background information and lacked appropriate citations. I hope to move from there to the rather strange page that "compares" DDC and the LC Classification.

I estimate that I spent between 20 and 40 hours doing the research for my updates to the DDC page. That is because the Wikipedia standard requires that all facts be sourced. Add to that the requirement for a neutral point of view (called NPOV in wiki-speak), and a good Wikipedia page is a set of sourced facts with some clear writing connecting them. (And, yes, there are a lot of not-good Wikipedia pages.)

It occurred to me that if I were a teacher I could use Wikipedia as a learning experience. Wanting your favorite topic to be well-represented in Wikipedia is a great motivator. Having to source all of your facts (and being pretty much limited to facts) means having to do research. Doing research becomes a good activity for discussing how to find sources and how to evaluate them.

Then I thought: wouldn't it be great to run a Wikipedia editing session in a library? What better place to have access to the sources? An editing session in a library with reference librarians on hand sounds like a Wikipedian's dream, and it could be used to teach people how to use the library.

Have you done this? I'd like to know.

Tuesday, July 23, 2013

Linked Data First Steps & Catch-21

Often when I am with groups of librarians talking about linked data, this question comes up:
"What can we do TODAY to get ready for linked data?"
It's not really a hard question, because, at least in my mind, there is an obvious starting point: identifiers. We can begin today to connect the textual data in our bibliographic records with identifiers for the same thing or concept.

What identifiers exist? Thanks to the Library of Congress we have identifiers for all of our authority controlled elements: names and subjects. (And if you are outside of the US, look to your national library for their work in this area, or connect to the Virtual International Authority File where you can.) LoC also provides identifiers for a number of the controlled lists used in MARC21 data.

The linked data standards require that identifiers be in the form of an HTTP-based URI. What this means is that your identifier looks like a URL. The identifier for me in the LC name authority file is:
Any bibliographic data with my name in a field should also contain this identifier. (OK, admittedly that's not a lot of bib data.) That brings us to "Catch-21" -- the MARC21 record. Although a control subfield was added to MARC21 for identifiers ($0), that subfield requires the identifier to be in a MARC21-specific format:
The control number or identifier is preceded by the appropriate MARC Organization code (for a related authority record) or the Standard Identifier source code (for a standard identifier scheme), enclosed in parentheses.
The example in the MARC21 documentation is:
100 1#$aBach, Johann Sebastian.$4aut$0(DE-101c)310008891
Modified to use LC name authorities that would be:
100 1#$aBach, Johann Sebastian,$d1685-1750$0(DLC)n79021425
The content of the $0, then, is not a linked data identifier, even in those instances where we have a proper linked data identifier for the name. Catch-21. I therefore suggest that, as an act of Catch-21 disobedience, we all declare that we will ignore the absurdity of having recently added an anti-linked-data identifier subfield to our standard, and use it instead for standard HTTP URIs:
100 1#$aBach, Johann Sebastian,$d1685-1750$0http://id.loc.gov/authorities/names/n79021425
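To show how mechanical this change is, here is a minimal Python sketch that rewrites MARC-style $0 values into id.loc.gov HTTP URIs. It is an illustration under stated assumptions, not a production converter: it treats `DLC` (the MARC organization code for the Library of Congress) as the source code, also accepts "LoC" as an informal variant, handles only LC name (`n...`) and subject (`sh...`) numbers, and uses a simplified LCCN normalization. The function names are hypothetical.

```python
import re
from typing import Optional

# Base URIs for two LC authority files published at id.loc.gov.
NAMES_BASE = "http://id.loc.gov/authorities/names/"
SUBJECTS_BASE = "http://id.loc.gov/authorities/subjects/"

# Organization codes treated as "Library of Congress". DLC is LC's MARC
# organization code; "LoC" is accepted here only as an informal variant.
LC_ORG_CODES = {"DLC", "LoC"}

# MARC21-style $0 value: "(ORG)identifier".
ZERO_PATTERN = re.compile(r"^\((?P<org>[^)]+)\)(?P<ident>.+)$")


def marc_zero_to_uri(value: str) -> Optional[str]:
    """Convert an LC "(DLC)n79021425"-style $0 value to an HTTP URI.

    Values that are already HTTP(S) URIs are returned unchanged; anything
    this sketch does not recognize returns None.
    """
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return value
    match = ZERO_PATTERN.match(value)
    if match is None or match.group("org") not in LC_ORG_CODES:
        return None
    # Simplified LCCN normalization: drop blanks, e.g. "n  79021425".
    ident = re.sub(r"\s+", "", match.group("ident"))
    if ident.startswith("sh"):
        return SUBJECTS_BASE + ident
    if ident.startswith("n"):
        return NAMES_BASE + ident
    return None


def rewrite_zero_subfields(field: str) -> str:
    """Rewrite every convertible $0 in a "$"-delimited display field."""
    head, *subfields = field.split("$")
    out = [head]
    for sub in subfields:
        if sub.startswith("0"):
            uri = marc_zero_to_uri(sub[1:])
            if uri is not None:
                sub = "0" + uri
        out.append(sub)
    return "$".join(out)


print(rewrite_zero_subfields(
    "100 1#$aBach, Johann Sebastian,$d1685-1750$0(DLC)n79021425"))
# → 100 1#$aBach, Johann Sebastian,$d1685-1750$0http://id.loc.gov/authorities/names/n79021425
```

Values the sketch does not recognize, such as the `(DE-101c)` example from the MARC21 documentation, are left exactly as they were, so running it over a file can only add links, never break existing data.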
Once we've gotten over this hurdle, we can begin to fill in identifiers for authority-controlled elements. Obviously we won't be doing this by hand, one record at a time. This should be part of a normal authority update service, or it may be feasible within systems that store and link national authority data to bibliographic records.
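An authority update service could do this matching in batch. The sketch below is a minimal, hypothetical version: `HEADING_TO_LCCN` stands in for a local copy of the LC name authority file (holding only the Bach heading used above), the heading normalization is deliberately crude, and only `$a` and `$d` are used for matching. A real service would match against full authority records and deal with cross-references and ambiguous headings.

```python
import re
from typing import Dict, List

NAMES_BASE = "http://id.loc.gov/authorities/names/"

# Hypothetical stand-in for a local copy of the LC name authority file,
# keyed by normalized heading. A real service would load the whole file.
HEADING_TO_LCCN: Dict[str, str] = {
    "bach johann sebastian 1685-1750": "n79021425",
}

# Name fields (main entry, subject, added entry) this sketch will touch.
NAME_TAGS = ("100", "600", "700")


def normalize_heading(text: str) -> str:
    """Crude normalization: lowercase, strip punctuation except hyphens."""
    text = re.sub(r"[^\w\s-]", " ", text.lower())
    return " ".join(text.split())


def add_name_identifier(field: str, lookup: Dict[str, str]) -> str:
    """Append $0 with an id.loc.gov URI if the heading matches the lookup."""
    if not field.startswith(NAME_TAGS):
        return field
    _tag, *subfields = field.split("$")
    if any(sub.startswith("0") for sub in subfields):
        return field  # already carries an identifier; leave it alone
    heading = " ".join(sub[1:] for sub in subfields if sub[:1] in ("a", "d"))
    lccn = lookup.get(normalize_heading(heading))
    if lccn is None:
        return field  # no match: add nothing rather than guess
    return field + "$0" + NAMES_BASE + lccn


def add_identifiers(record: List[str], lookup: Dict[str, str]) -> List[str]:
    """Process every field of a record (a list of display-form fields)."""
    return [add_name_identifier(field, lookup) for field in record]


record = [
    "100 1#$aBach, Johann Sebastian,$d1685-1750",
    "245 10$aMass in B minor.",
]
for field in add_identifiers(record, HEADING_TO_LCCN):
    print(field)
```

The design choice worth noting is the conservative default: a field that already has a $0, or whose heading does not match, passes through untouched, because a missing link is much cheaper to fix later than a wrong one.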

We should also insist that cataloging services that use the national authority files begin to include these subfields in bibliographic data as it is created/downloaded.

Note that because the linked data standard identifiers are HTTP URIs, aka URLs, by including these identifiers in your bibliographic data you have already created a link -- a link to the web page for that person or subject, and a link to a machine-readable form of the authority data in a variety of formats (MARCXML, JSON, RDF, and more). In the LC identifier service, the name authority data includes a link to the VIAF identifier for the person; the VIAF identifier for some persons is included in the Wikipedia page about the person; the Wikipedia identifier links you to DBpedia and the DBpedia identifier is used by Freebase ...

That's how it all gets started: one identifier becomes a kind of identifier snowball rolling downhill, collecting more and more links as it goes along.
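Following the snowball is essentially a graph walk: start from one identifier and keep following links you have not yet seen. Here is a minimal breadth-first sketch, assuming the links have already been harvested into a dictionary. The LC, Wikipedia, and DBpedia URIs follow the patterns those services use for Bach; `viaf:EXAMPLE-ID` and `freebase:EXAMPLE-ID` are placeholders, not real identifiers, and the direction of each link is simplified.

```python
from collections import deque
from typing import Dict, List

# Illustrative link data. The VIAF and Freebase entries are placeholders.
LINKS: Dict[str, List[str]] = {
    "http://id.loc.gov/authorities/names/n79021425": ["viaf:EXAMPLE-ID"],
    "viaf:EXAMPLE-ID": ["https://en.wikipedia.org/wiki/Johann_Sebastian_Bach"],
    "https://en.wikipedia.org/wiki/Johann_Sebastian_Bach": [
        "http://dbpedia.org/resource/Johann_Sebastian_Bach",
    ],
    "http://dbpedia.org/resource/Johann_Sebastian_Bach": [
        "https://en.wikipedia.org/wiki/Johann_Sebastian_Bach",  # link back
        "freebase:EXAMPLE-ID",
    ],
}


def snowball(start: str, links: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk returning every identifier reachable from start."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in links.get(current, []):
            if neighbor not in seen:  # skips cycles such as Wikipedia <-> DBpedia
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


for identifier in snowball("http://id.loc.gov/authorities/names/n79021425", LINKS):
    print(identifier)
```

Starting from the single LC identifier, the walk collects all five identifiers, and the back link from DBpedia to Wikipedia is visited only once.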

Pretty easy, eh?

Sunday, July 21, 2013

Librarians and the JK Rowling Effect

I'm sure that by now you've heard the story: a book by unknown author Robert Galbraith got good reviews but made only modest sales, until it was revealed that Galbraith was a pseudonym for JK Rowling. Within days it was "#1 with a bullet" on Amazon.

The book had reportedly sold only 500 copies in the US. The publisher most likely did not do a large print run. The hard copy has not yet made the New York Times best seller list, which is determined by sales. However, it is #1 on Amazon due to the infinite expandability of Kindle ebook copies.

This may be a kind of watershed moment for ebooks, proof that in this world of instant access the ebook is not only good for readers (as in humans who read) but also has definite advantages for publishers. The primary message here, though, is that reputation sells. This is something that advertisers have always known. I was therefore surprised to come upon a short article in The Nation magazine from 1897(*) that blames librarians for making authors famous by naming them, and that clearly disdains this fact.
"The role of the librarians in this country as critics of literature and arbiters of literary reputation is growingly apparent. 'Poole's Index to Periodical Literature' is of necessity selective, and the selection from each periodical embraced in the Index appertains to the particular librarian or library assistant specially charged with the care of that periodical. In most cases, the name of the author is ascertained and appended to the title, and so the aristocracy of current letters is called into being. Writers in this way come to be known for their range of subjects and interests; their weight is suggested by frequency of titles; editors and publishers will naturally apply to them as authorities."
Not only did librarians select literature for the index, they actually created recommended bibliographies!

"Not satisfied with this control at once of fame and research, the associated librarians got up a list of works recommendable for a library of five thousand volumes... Mr. Melvil Dewey... 'submitted to the librarians of the State [of New York] and others to obtain an expression of opinion respecting the best fifty books ... to be added to a village library."
The author of the piece (not named, by the way, at least not in the pages that I retrieved) saw this as "a pretty generous advertisement." My concern about the influence of libraries and librarians is quite different: access to knowledge determines future knowledge. Well, perhaps "determines" is too strong a word, but let's say that what we can produce as new knowledge depends greatly on which giants are available to us, in the Newtonian sense of "standing on the shoulders of giants."

Oftentimes, the library has a role in providing those giants. In a small library, what the library owns will necessarily be a subset of the knowledge on a topic. In a large library, where the number of documents on a topic is way beyond the capability of most researchers to absorb, the organization of the materials will determine what researchers discover. Even if collection development were a perfect process, with unlimited funds, unlimited space, and absolute neutrality, the library in some way has an effect on future knowledge.

In that sense, Amazon and other booksellers have it easy: all that matters for them is the simple measure of sales. They don't have to, and probably do not, wonder whether the world is a better place now that we have Fifty Shades of Dubious Personal Interaction and a new JK Rowling bestseller. Ironically, counting sales or hits or links is considered "neutral," while attempting to select the most important works in a subject area within a limited budget is looked at askance.


* The Nation. vol. 64, no. 1663, May 13, 1897, p. 860