Wednesday, June 19, 2013

Spying, the old-fashioned way

While the news debates the NSA's PRISM program, a massive collection of data points of electronic communication, the more human side of spying is being pushed to the background. Yet if you are fearful of privacy invasion, there is nothing more chilling than a reading of FBI files with accounts of informants and statements about "Communist leanings" and "pro-Russian" attitudes. You can get a taste for this in the FBI Vault, a public file of de-classified documents, most of which are revealed upon the death of the target. The A-Z list (which appears to be a selection, as other names can be found with searches) is full of famous names, from Al Capone to Al Gore, from George Burns to Marilyn Monroe, and from Helen Keller to Leon Trotsky. (The list is only marginally alphabetical, by first name, which is almost as shocking as the contents of the files.)

This is old-fashioned stuff for the most part. Type-written letters, lots of scribbled initials, and whole chunks of documents blacked out with what must be a special FBI-invented marker.


On a more modern note, who could resist adding their favorite FOIA file to Facebook?



Not everyone in the Vault is a potential "enemy of the state." Some are there because they were threatened, and the FBI was doing its "protect and serve" job. But coming to the attention of the FBI is often not a good thing. In the case of the writer Ray Bradbury, an informant tipped off the Bureau that Bradbury may have attended a writers' meeting in Cuba. This set off an investigation.
"Investigation conducted in the neighborhood of 10265 Cheviot Drive, Los Angeles, California, disclosed that a RAY DOUGLAS BRADBURY, date of birth 8/22/20, Waukegan, Illinois, resides at this address. He is a known writer and Los Angeles indices have numerous references on RAY DOUGLAS BRADBURY."
The report is itself a boon for any biographers (and Wikipedians). It gives his family history (back to 1630), location and occupation of all living family members, information about his spouse (including the location of the church where they were married), and of course his yearly income. Given that this report is from 1959, you can just imagine how much more information the FBI would have today. There are whole pages that would do a reference librarian proud: a list of his professional memberships, a complete bibliography, film credits. There is even some level of literary analysis:
"... BRADBURY was probably sympathetic with certain pro-Communist elements in the [Writers Guild of America, West]... [Informant] stated it has been his observation that some of the writers suspected of having Communist backgrounds have been writing in the field of science fiction and it appears that science fiction may be a lucrative field for the introduction of Communist ideologies."
Admittedly, this was high "red scare" time still, and Bradbury was working in and around Hollywood. However, the informant seems to have been unfamiliar with the work of L. Ron Hubbard.

Every file here has some gems worth reading. And don't forget to check the category "Unexplained Phenomenon".

Tuesday, May 14, 2013

BIBFRAME Authorities

There is a discussion taking place on the BIBFRAME listserv about the draft proposal for BIBFRAME Authorities. I've made some comments, but this is a topic that requires diagrams, and therefore doesn't work well in email. This blog post is an illustrated comment on the BIBFRAME Authorities proposal.

The way I read the proposal, this diagram represents the current thinking on BIBFRAME Authorities:

Here is an example of a BIBFRAME authority representation from the document:

<!--  BIBFRAME Authority -->
<Person id="http://bibframe/auth/person/franklin">
      <label>Franklin, Benjamin, 1706-1790</label>
      <hasIDLink resource="http://id.loc.gov/authorities/names/n79043402" />
      <hasVIAFLink resource="http://viaf.org/viaf/56609913" />
      <hasDNBLink resource="http://d-nb.info/gnd/118534912" />
</Person>
 
It is unclear to me what role or functionality the VIAF and DNB links are expected to have, so that is a question that I have. I don't know what "hasIDLink" means - whether that is specific to LCNA or means: "this is the authority file." If it does not mean that, then this does not link the BIBFRAME name display form specifically to the actual authority file that defined it. If it does mean that, then the three authority files are not treated consistently.

In addition, it does not appear that alternate name forms are included in the BIBFRAME Authority, so they are not available for indexing. That could just be something missing from the examples, however.

If a BIBFRAME authority is needed in the BIBFRAME structure, it would make more sense to me to make a few changes. First, the alternate name forms would be included in the BIBFRAME authority, primarily for indexing. The preferred form of the name is obviously there for the purposes of display and indexing. The alternate forms are not displayed, but should be used in retrieval.

Another possible change is to make a direct link from the BIBFRAME authority to the library authority entry, in this case LCNA. Without this, it isn't clear how the two will be kept in sync as the LCNA file is updated. Links to other library authority files would be from the authority of record, which is what they are "nearly equivalent" to:
Note that this still links the annotations to the BIBFRAME authority. Other libraries using the LCNA data would not necessarily have access to annotations linked directly to the BIBFRAME authority, but that depends on how those authorities are shared. The advantage of this is that it shelters the "true authority" from possibly inappropriate stuff that might be associated with the BIBFRAME authority.

<!--  BIBFRAME Authority -->
<Person id="http://bibframe/auth/person/franklin">
      <label>Franklin, Benjamin, 1706-1790</label>
      <altLabel>Franklin, V. (Venīamin), 1706-1790</altLabel>
       <authority resource="http://id.loc.gov/authorities/names/n79043402" />
</Person>
 
<!-- LC Name Authority -->
 
  <madsrdf:PersonalName rdf:about="http://id.loc.gov/authorities/names/n79043402">

      <madsrdf:authoritativeLabel xml:lang="en">Franklin, Benjamin, 1706-1790
        </madsrdf:authoritativeLabel>
      <madsrdf:variantLabel xml:lang="en">Franklin, V. (Venīamin), 1706-1790
        </madsrdf:variantLabel> 
      <hasVIAFLink resource="http://viaf.org/viaf/56609913" />
      <hasDNBLink resource="http://d-nb.info/gnd/118534912" /> 
  </madsrdf:PersonalName>

The last option that I can propose is that of simply using the library authority. I believe that the argument against this is that such data may not always be available for record displays, but as far as I know nothing prevents caching of high-use metadata statements ("triples" because it's all just triples after all), and refreshing these periodically to make sure one has the latest. In fact, it is probable that the linked data space will take a lesson from the Domain Name System, where a system of mirrors and backups distributes the DNS world-wide, syncs changes, and provides almost 100% availability. In that case, there would be no reason not to use one's stated authority, with similarly coded local data existing where the authority data does not exist for the occasional local need.
<!--  BIBFRAME Authority -->
<Person id="http://id.loc.gov/authorities/names/n79043402"></Person>
 
<!-- LC Name Authority -->
 
  <madsrdf:PersonalName rdf:about="http://id.loc.gov/authorities/names/n79043402">
      <madsrdf:authoritativeLabel xml:lang="en">Franklin, Benjamin, 1706-1790
        </madsrdf:authoritativeLabel>
      <madsrdf:variantLabel xml:lang="en">Franklin, V. (Venīamin), 1706-1790
        </madsrdf:variantLabel> 
      <hasVIAFLink resource="http://viaf.org/viaf/56609913" />
      <hasDNBLink resource="http://d-nb.info/gnd/118534912" /> 
  </madsrdf:PersonalName>

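The caching idea mentioned above (keep a local copy of high-use statements and refresh them periodically) could be sketched roughly as follows. This is only an illustration under assumptions: `LabelCache`, `fetch_label`, and the one-day refresh interval are hypothetical names and values, not part of BIBFRAME or of any id.loc.gov service, and the fetch function is a stand-in for whatever lookup a real system would use.

```python
import time
from typing import Callable, Dict, Tuple


class LabelCache:
    """Hypothetical local cache of display labels, keyed by authority URI."""

    def __init__(self, fetch_label: Callable[[str], str],
                 max_age_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch_label = fetch_label   # assumed remote lookup, supplied by the caller
        self._max_age = max_age_seconds   # assumed refresh interval (one day by default)
        self._clock = clock               # injectable clock, which makes testing easier
        self._entries: Dict[str, Tuple[str, float]] = {}

    def label_for(self, uri: str) -> str:
        """Return the cached label, refreshing it when it is missing or stale."""
        now = self._clock()
        cached = self._entries.get(uri)
        if cached is not None and now - cached[1] < self._max_age:
            return cached[0]
        try:
            label = self._fetch_label(uri)
        except Exception:
            # Authority unreachable: fall back to the stale copy if one exists.
            if cached is not None:
                return cached[0]
            raise
        self._entries[uri] = (label, now)
        return label
```

With something like this in place, a record display would ask the cache for the label of http://id.loc.gov/authorities/names/n79043402 and would only go back to the authority file when the local copy had aged out.
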
To be sure, I am making some assumptions that should be explicit.
  1. It's all triples. It's easy to forget this when looking at graphs.
  2. Availability is a technical issue for which there is an answer (or more than one answer).
  3. The main action in a linked data space is a query. This is not only for traditional discovery, but also for forming displays. To display a person's name will be a query for a "label" linked to a URI. It doesn't matter whether the URI is a BIBFRAME authority URI, an LCNA URI, or a DNB URI -- each of those is a triple in linked data space.
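
The third assumption, that displaying a name is just a query for a label linked to a URI, can be illustrated with a toy example. This is a sketch, not a real triple store: the in-memory list, the `display_label` function, and the shortened predicate names are assumptions made for illustration, with the URIs and labels taken from the examples in this post.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# A handful of statements; the predicates are abbreviated, hypothetical names.
TRIPLES: List[Triple] = [
    ("http://bibframe/auth/person/franklin", "label",
     "Franklin, Benjamin, 1706-1790"),
    ("http://id.loc.gov/authorities/names/n79043402", "madsrdf:authoritativeLabel",
     "Franklin, Benjamin, 1706-1790"),
    ("http://id.loc.gov/authorities/names/n79043402", "madsrdf:variantLabel",
     "Franklin, V. (Venīamin), 1706-1790"),
]

# Predicates treated as "the display label" in this sketch.
LABEL_PREDICATES = ("label", "madsrdf:authoritativeLabel")


def display_label(uri: str, triples: List[Triple] = TRIPLES) -> Optional[str]:
    """Return the first display label found for the URI, or None if there is none."""
    for subject, predicate, obj in triples:
        if subject == uri and predicate in LABEL_PREDICATES:
            return obj
    return None
```

The same function answers for the BIBFRAME authority URI and for the LCNA URI, which is the point: from the query's side there is no difference between them.
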

Friday, April 05, 2013

The "Mellen Mess" and the changing role of publishers

Reading about the "Mellen Mess" -- the case of the publisher that is suing a librarian who criticized the quality of the house's output -- I found the most interesting discussion to have taken place in the comments area of the original post (available via the Wayback Machine). One poster says:
On the other hand, I would say that few if any publishers do not publish a number of books that I would not buy.
To which Dale Askey replies:
The fact is, however, that libraries have to be able to trust presses to turn out good titles, or our work becomes impossible given the sheer global output of scholarship... libraries lack enough qualified subject expertise to make such judgments at the necessarily granular level, and the trend here is not encouraging. Subject librarianship is dismissed as a relic of a past age, and we now talk about “patron-driven” acquisition as if it were the Holy Grail. Having spent a brief but wonderful portion of my career as a focused subject librarian for an area where I have expertise, I know the benefit of reading substantive reviews and making intelligent choices about individual titles, but even that library no longer has the funds (or perhaps just lacks the will to commit the funds) for such esoteric enterprises.
What I think we see here is evidence of a substantial change in what it means to be a publisher in this age of "everyone can be a publisher." First, a little history.

Turin book fair, 2007
The first followers of Gutenberg were equal parts scholar, technician and businessman. There was never any question that producing print was a for-profit activity, and the same printers who turned out carefully edited classics also printed the first advertisements as well as a large number of indulgences to be sold to wealthy (but not well-behaved) Catholics. Well into the late 19th century, publishers were also printers, and often saw themselves as having a key role in scholarship and culture. The reputation of the publisher was what made the introduction of new, unknown authors possible.

Turin book fair, 2007
Although I am at my very core a "book person," I was unaware of the culture of publishers before visiting Europe and attending both bookstores and a few book fairs there. What struck me immediately was that the book covers represented the publisher more than the book itself. Near a university I found a bookstore that was entirely organized by publisher -- not by topic -- so that the only access other than "known item" was browsing by publisher.

By my own observation, by the 1950s the role of the publisher in the US was subordinated to the book, preferably a best-seller. We could all name key books (Catcher in the Rye, To Kill a Mockingbird, The Spy Who Came in from the Cold), but I doubt if many of us could name the publishing house that issued them.

As Epstein and Schiffrin explain (see Further Reading), the purchase of publishing houses in the late 20th century by companies with a primary interest in profits, unhindered by cultural concerns, has made the publishing house no more than another business. From scholar-printer-businessman, only the last role remains. If "best-selling" is your idea of quality, then these publishers can be considered consistent and trustworthy. If you are looking for greater cultural pursuits, you will probably be disappointed.

While that describes popular publishing, scholarly publishing has retained the publisher reputation... at least until very recently. While there still are known scholarly publishers whose output can be trusted sight un-seen (as Askey explains), there are many new entrants to this business area whose primary goal is income, not scholarship itself. This seems to be following a similar path to that of popular publishing, but with a twist: scholars must publish. The real culprit in this story is the "publish or perish" culture of academia. It matters not that there is no audience for a scholar's work; in fact, being actually read is rather icing on the cake. The main thing is that a scholar must get his or her work produced by someone acting as a publisher. It is therefore unremarkable that publishers have come on the scene to address this market.

The big "however" here is that while author fees may cover the cost (plus profit) of publishing an open access article, printed books still need to have some sales. Throughout the history of publishing, vanity books have been known as money-losers,* and some publishers have contracted with the authors to buy back any un-sold copies. This is more than an un-tenured faculty member can afford, however, so the business of publishing books by academics is one that wise investors would avoid.

The upshot of the story here is that we've gotten ourselves into an untenable position between the pressure to publish and the actual market for published works. Something has to give, and it has to give at both ends of the equation.

The next step, then, is improving the social media that the academic community uses so that the "post publication peer review" becomes the filter for quality and importance. 

---------------

* I ran into a great rant by a 19th c. Italian publisher about vanity publishing while doing research on Natale Battezzati. I unfortunately didn't mark it, but if I find it again I will link it here.


Further Reading

Epstein, Jason. Book Business: Publishing Past, Present, and Future. New York: W.W. Norton, 2001.

Schiffrin, André. The Business of Books: How International Conglomerates Took Over Publishing and Changed the Way We Read. London: Verso, 2000.

Saturday, March 30, 2013

By way of explanation

“Readers who are familiar with conventional logical semantics may find it useful to think of RDF as a version of existential binary relational logic in which relations are first-class entities in the universe of quantification. Such a logic can be obtained by encoding the relational atom R(a,b) into a conventional logical syntax, using a notional three-place relation Triple(a,R,b); the basic semantics described here can be reconstructed from this intuition by defining the extension of y as the set {<x,z> : Triple(x,y,z)} and noting that this would be precisely the denotation of R in the conventional Tarskian model theory of the original form R(a,b) of the relational atom. This construction can also be traced in the semantics of the Lbase axiomatic description.”
        From the RDF Semantics document

"Doubts about the ability to know the order of the world catalyzed a crucial change, away from taxonomic forms of information storage based on natural language and toward new ones based on a symbolic language of analytical abstraction. Mathematics promised a new vision of order for both the natural and the moral worlds, where confusion was resolved by jettisoning whatever could not be known with certainty."
     Hobart, Michael E. Information Ages: Literacy, Numeracy, and the Computer Revolution. Baltimore: Johns Hopkins University Press, 1998. p. 90

“We could try to feed it algorithms for everything. There are only slightly more of them than there are particles in the universe. It would be like building a heart muscle molecule by molecule. And we’d still have a hell of an indexing and retrieval problem at the end. Even then, talking to such a decision tree would be like talking to a shopping list. It’d never get any smarter than a low-ranking government bureaucrat.”
     Richard Powers, Galatea 2.2, 1st Perennial Ed., 1996. p. 78


Thursday, March 14, 2013

Battezzati's Cartollini

Beginning with the first edition of the Dewey Decimal System and Relativ Index, Melvil Dewey includes this intriguing acknowledgment:
Perhaps the most fruitful source of ideas was the Nuovo sistema di Catalogo Bibliografico Generale of Natale Battezzati, of Milan. Certainly he [Dewey] is indebted to this system adopted by the Italian publishers in 1871, though he has copied nothing from it.
It so happens that I did some research on this in the national library in Milan in the mid-1970s and never published what I learned. This blog post makes use of notes and photocopies from that time.

There are a number of puzzling things about Dewey's mention of Battezzati's system. One is that it had little or nothing to do with classification. It was, however, an ingenious card system. The story, in brief, goes like this:

Natale Battezzati was a printer/publisher in Milan from the mid-1800s onward. The publishers had a bi-monthly publication that carried information on new books in print. The publication was used by booksellers and customers to find books of interest. However, unless a bookseller had a perfect memory, looking for a specific book or a book on a specific topic meant combing through numerous back issues of the Bibliografia italiana. Battezzati, a member of the Associazione libreria italiana (the Italian Booksellers' Association), came up with the idea of using reprints of the title pages of books on card stock that could be kept and interfiled as a kind of "books in print" card catalog within each bookstore.

The genius of the card system was that each card had printed on each of three sides:
  1. the name of the publisher 
  2. the name of the author
  3. a subject classification based on Brunet
The cards could also have overprinted a table of contents or a summary of the contents. Each publication was also to be given, in the upper right corner, a number that could be used by the bookstores in ordering.

Thus, the bookseller would receive three copies of the card for each new book, and could create three card files. Battezzati's purpose was to increase sales by making it easier for a bookseller to satisfy the needs of the customer.

So much of this seems familiar to us today: a single "unit" card with multiple headings, a unique numbering system for books ... it's no wonder that Dewey was impressed, but it's still unclear why a reference to the system would be included in all fourteen editions of DDC that Dewey personally oversaw.

Of even greater mystery is a statement by Battezzati in one number of the association's journal that Dewey, sent by his government to the World Exposition in Vienna in 1873, saw the cards demonstrated there. This is probably a mis-interpretation by Battezzati of a letter sent to him by Dewey, since it is highly unlikely that Dewey, at the time a 22-year-old college student, would have been sent to represent the United States at such an event in Europe. It is more likely that Dewey saw the cards in the articles in the Bibliografia italiana, which was held by a few major libraries on the East coast, such as the Boston Athenaeum (Dewey was attending Amherst College in 1873). There were other misunderstandings on Battezzati's part, since he referred to Dewey as the secretary of the "Associazione dei Libraj d'America" -- that is, the Association of American Booksellers.


Monday, March 04, 2013

Sergey Brin's Masculinity

At first I thought it was a joke: "Speaking at the TED Conference today in Long Beach, Calif., Brin told the audience that smartphones are 'emasculating.' 'You're standing around and just rubbing this featureless piece of glass,' he said." Perhaps I didn't believe it was true because I first encountered it in the form of a BoingBoing parody for "Mandroid: Google's remasculating new operating system." Another one of those moments when reality and parody are just soooooo close.

The TED talk won't be available for a while, so I don't know if he said this with any hint of humor. (I rather hope so, but I fear not.) The talk was about the Google Glass product, which he was demonstrating and promoting. But even if he meant the statement as something of a joke, there are things that need to be said about the not-so-sub text.

1. Using "emasculating" to deride a competitor's product when neither product has anything to do with gender is just a cheap shot. It's like Coke saying that Pepsi is "emasculating."

2. The ongoing attempt to raise the testosterone levels of electronic equipment has gotten out of hand. Yet, unfortunately, products must make an appeal to identity in order to sell. Apple pushes an identity of design and sophistication that was once considered "un-manly" by early Mac reviewers. Brin's remark, albeit nonsensical, pushes back against Apple's more gender-neutral image.

3. It makes little sense to eliminate women from your market, and promoting a product as a kind of "technology viagra" is not going to win over female consumers. Brin's remark shows that he's more concerned with promoting a masculine image that he is comfortable with than with following good marketing practice.

Some reading:

Wikipedia Women in Computing
Gender codes: why women are leaving computing edited by Thomas Misa
How to market to women, by Carol Nelson (1994, so a little out of date, but still useful)

Thursday, February 21, 2013

Open to Creativity

The brilliance of Google's PageRank is not the computational methods behind it but the target of those methods: links created by people in the course of making something meaningful on the Web. Without that human input, Google (and Bing and Yahoo) would be simply counting up term frequencies and perhaps analyzing linguistic characteristics, but missing most of what makes Web searching work. The results would be at least as bad as the ranking on Google Books because it would be devoid of the human commitment of significant connections between the pages.

Although Google's mission statement is "... to organize the world's information and make it universally accessible," Google is not really organizing anything. It is reading the organization provided by the Web's population. Similarly, Facebook is reading the relationships between people that are made in the course of using its software, information that it would not have otherwise.

What has made the web the rich environment that it is is that anyone can link anything to anything else. That linking is an expression; and even though we might not be able to characterize it in a few words (what does it mean that page A links to page B?) Google has shown that we can make use of those patterns of linking to help people find stuff on the Web that meets their needs.

In this regard, the Semantic Web has a serious problem. Much of the focus of Semantic Web work is the creation of vocabularies of defined relationships between things, with the intention that these relationships can be traversed and manipulated by algorithms. That is fine in itself, but the Semantic Web enthusiasts are primarily creating pre-determined, fixed relationships between things, mainly based on defining each thing in a class/sub-class relationship with other things. The Semantic Web vocabularies also define requirements called "range" which limit what you can link to from a specific element of your vocabulary. [1]
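
To make the effect of a "range" concrete, here is a minimal sketch that uses the creator example from note [1]. It assumes range is applied as a strict check on what a property may point to, which is the reading used in this post and a simplification of how RDF tools actually behave. The `RANGES` table and the `allowed` function are hypothetical, invented only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical vocabulary: each property maps to the kind of value it accepts.
RANGES = {"creator": "uri"}


def is_uri(value: str) -> bool:
    """Rough test for an http(s) identifier, as opposed to a plain string."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def allowed(prop: str, value: str) -> bool:
    """True when the value satisfies the property's declared range (strict reading)."""
    required = RANGES.get(prop)
    if required == "uri":
        return is_uri(value)
    return True  # no declared range: anything can be linked


print(allowed("creator", "http://kcoyle.net/kcoyle.rdf"))  # an identified entity
print(allowed("creator", "Karen Coyle"))                   # just a string of characters
```

Under this reading the first call is accepted and the second is rejected, so a creator known only by name cannot be recorded at all.
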

The tendency to pre-define vocabularies with strict rules is a carry-over from the Artificial Intelligence (AI) environment from which the Semantic Web arose. If you expect to work with machines, and only machines, then you have to define for them exactly what they can and cannot do, and you must present all decisions as formulas that can be calculated.

For reasons that are unclear to me, the Semantic Web work seems to be unaware of the great success that Google has had in using human information sharing activity to create a meaningful web of links. There is little attention to the fact that establishment of relationships through linking can be done by humans, much less that the best and most useful linking will be done by humans. Human information linking is not definable a priori -- in fact, an a priori definition of allowed links essentially limits the future to re-running past concepts in new variations. It is an absolute barrier to creativity if you can only act on what has been pre-defined.

It's the difference between a prefab house, which can only become what it was designed to become (with some minor modifications that don't change its essence) and a box of blocks that can be used to create anything within the realm of physics (although superglue could extend the possibilities).

In part this is why I tend to speak of "linked data" rather than "Semantic Web." Linked data, as it has evolved as the description of metadata activity, carries less of the AI baggage than Semantic Web does.

To me what will be exciting about linked data is what people will do with it; what they will create, what they will experiment with, and both successes and failures. However, for people to do something with linked data they need tools -- tools that are as easy to use as those used to create Web pages and the links between them.

They also need a box of blocks to work with. These blocks need to be as free of predetermined rules as possible. [2] The terms need to be defined just enough to make them usable. This is also true of the relationships or links. Using our box of blocks metaphor, we want to be able to put the blocks in relationships like "above," "below," and "near" each other. If the square blocks are defined as always "below" the rectangular blocks, then that limits what we can create.

The Semantic Web as a machine environment guided with AI formalities appeals to some because it promises to be neat and unambiguous. It will, however, foster only a very constrained amount of creativity, and will not be able to satisfy the full range of human curiosity.

It is a shame that many Semantic Web enthusiasts have little faith (or little interest) in the human potential that linking and openness can unleash. I, for one, am looking for partners in the development of a messy, intelligent, quirky technology that can produce surprising results, created by people using linked data as a tool of expression. I am particularly excited by the fact that we don't know what forms that expression will take.



[1] For example, you can define a creator as having to be of type "Person" and require that the Person be expressed as a URI, like http://kcoyle.net/kcoyle.rdf. In that case, you can't have a creator that is "Karen Coyle" because "Karen Coyle" is just a string of characters, not an identified entity. This means that if you don't have identifiers for your creators, you can't create data about what they created.

[2] This is referred to as "minimum ontological commitment," introduced in Toward Principles for the Design of Ontologies Used for Knowledge Sharing, by T Gruber [http://www2.iath.virginia.edu/time/readings/ontology-semantics-metaphor/designing-ontologies.pdf]