Coyle's InFormation

Monday, March 04, 2013

Sergey Brin's Masculinity

At first I thought it was a joke: "Speaking at the TED Conference today in Long Beach, Calif., Brin told the audience that smartphones are 'emasculating.' 'You're standing around and just rubbing this featureless piece of glass,' he said." Perhaps I didn't believe it was true because I first encountered it in the form of a BoingBoing parody for "Mandroid: Google's remasculating new operating system." Another one of those moments when reality and parody are just soooooo close.

The TED talk won't be available for a while, so I don't know whether he said this with any hint of humor. (I rather hope so, but I fear not.) The talk was about the Google Glass product, which he was demonstrating and promoting. But even if he meant the statement as something of a joke, there are things that need to be said about the not-so-subtext.

1. Using "emasculating" to deride a competitor's product when neither product has anything to do with gender is just a cheap shot. It's like Coke saying that Pepsi is "emasculating."

2. The ongoing attempt to raise the testosterone levels of electronic equipment has gotten out of hand. Yet, unfortunately, products must make an appeal to identity in order to sell. Apple pushes an identity of design and sophistication that was once considered "un-manly" by early Mac reviewers. Brin's remark, albeit nonsensical, pushes back against Apple's more gender-neutral image.

3. It makes little sense to eliminate women from your market, and promoting a product as a kind of "technology viagra" is not going to win over female consumers. Brin's remark shows that he's more concerned with promoting a masculine image that he is comfortable with than with following good marketing practice.

Some reading:

Wikipedia: Women in computing
Gender codes: why women are leaving computing, edited by Thomas Misa
How to market to women, by Carol Nelson (1994, so a little out of date, but still useful)

Thursday, February 21, 2013

Open to Creativity

The brilliance of Google's PageRank lies not in the computational methods behind it but in the target of those methods: links created by people in the course of making something meaningful on the Web. Without that human input, Google (and Bing and Yahoo) would simply be counting term frequencies and perhaps analyzing linguistic characteristics, missing most of what makes Web searching work. The results would be at least as bad as the ranking on Google Books, because they would be devoid of the human commitment expressed in significant connections between pages.
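To see how little the computation itself knows, here is a minimal sketch assuming the textbook damped power-iteration form of PageRank (not Google's production ranking). Its only input is the set of links people made; the four-page link graph and the damping factor of 0.85 are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages using nothing but the links people made between them.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical links made by people writing four pages
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "c" comes out on top only because three other pages chose to link to it; remove those human-made links and the algorithm has nothing to work with.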

Although Google's mission statement is "... to organize the world's information and make it universally accessible," Google is not really organizing anything. It is reading the organization provided by the Web's population. Similarly, Facebook is reading the relationships between people that are made in the course of using its software, information that it would not have otherwise.

What has made the Web such a rich environment is that anyone can link anything to anything else. That linking is an expression, and even though we might not be able to characterize it in a few words (what does it mean that page A links to page B?), Google has shown that we can use those patterns of linking to help people find things on the Web that meet their needs.

In this regard, the Semantic Web has a serious problem. Much of the focus of Semantic Web work is the creation of vocabularies of defined relationships between things, with the intention that these relationships can be traversed and manipulated by algorithms. That is fine in itself, but the Semantic Web enthusiasts are primarily creating pre-determined, fixed relationships between things, mainly based on defining each thing in a class/sub-class relationship with other things. The Semantic Web vocabularies also define requirements called "range," which limit what you can link to from a specific element of your vocabulary. [1]

The tendency to pre-define vocabularies with strict rules is a carry-over from the Artificial Intelligence (AI) environment from which the Semantic Web arose. If you expect to work with machines, and only machines, then you have to define for them exactly what they can and cannot do, and you must present all decisions as formulas that can be calculated.

For reasons that are unclear to me, the Semantic Web work seems to be unaware of the great success that Google has had in using human information-sharing activity to create a meaningful web of links. There is little attention to the fact that relationships can be established through linking by humans, much less that the best and most useful linking will be done by humans. Human information linking is not definable a priori; in fact, an a priori definition of allowed links essentially limits the future to re-running past concepts in new variations. It is an absolute barrier to creativity if you can only act on what has been pre-defined.

It's the difference between a prefab house, which can only become what it was designed to become (with some minor modifications that don't change its essence) and a box of blocks that can be used to create anything within the realm of physics (although superglue could extend the possibilities).

In part this is why I tend to speak of "linked data" rather than "Semantic Web." Linked data, as the term has evolved to describe metadata activity, carries less of the AI baggage than Semantic Web does.

To me, what will be exciting about linked data is what people will do with it: what they will create, what they will experiment with, and both their successes and their failures. However, for people to do something with linked data they need tools, tools that are as easy to use as those used to create Web pages and the links between them.

They also need a box of blocks to work with. These blocks need to be as free of predetermined rules as possible. [2] The terms need to be defined just enough to make them usable. This is also true of the relationships or links. Using our box of blocks metaphor, we want to be able to put the blocks in relationships like "above," "below," and "near" each other. If the square blocks are defined as always "below" the rectangular blocks, then that limits what we can create.

The Semantic Web as a machine environment guided with AI formalities appeals to some because it promises to be neat and unambiguous. It will, however, foster only a very constrained amount of creativity, and will not be able to satisfy the full range of human curiosity.

It is a shame that many Semantic Web enthusiasts have little faith (or little interest) in the human potential that linking and openness can unleash. I, for one, am looking for partners in the development of a messy, intelligent, quirky technology that can produce surprising results, created by people using linked data as a tool of expression. I am particularly excited by the fact that we don't know what forms that expression will take.

[1] For example, you can define a creator as having to be of type "Person," and that Person must be expressed as a URI, like http://kcoyle.net/kcoyle.rdf. In that case, you can't have a creator that is "Karen Coyle," because "Karen Coyle" is just a string of characters, not an identified entity. This means that if you don't have identifiers for your creators, you can't create data about what they created.

[2] This is referred to as "minimal ontological commitment," introduced in "Toward Principles for the Design of Ontologies Used for Knowledge Sharing," by T. Gruber [http://www2.iath.virginia.edu/time/readings/ontology-semantics-metaphor/designing-ontologies.pdf]
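To make footnote [1] concrete, here is a minimal sketch of a range treated as a validation rule, which is how the footnote describes it. Plain Python strings stand in for RDF, and the property names, the crude URI test, and the book URI are hypothetical. (In RDFS itself, rdfs:range is used to infer types rather than to reject data, so this models the stricter reading.)

```python
# Hypothetical vocabulary: "creator" has a range of "Person",
# and a Person must be an identified thing (a URI), not a string.
RANGES = {"creator": "Person"}

def is_uri(value):
    # Crude check, for illustration only: treat http(s) strings as URIs.
    return isinstance(value, str) and value.startswith(("http://", "https://"))

def check_range(prop, obj, types):
    """Return True if obj may be linked to via prop.

    types maps identified things (URIs) to the class they belong to.
    """
    expected = RANGES.get(prop)
    if expected is None:
        return True  # no range declared, so anything can be linked
    return is_uri(obj) and types.get(obj) == expected

types = {"http://kcoyle.net/kcoyle.rdf": "Person"}

print(check_range("creator", "http://kcoyle.net/kcoyle.rdf", types))
print(check_range("creator", "Karen Coyle", types))
```

The first check passes and the second fails: the plain string "Karen Coyle" cannot be a creator, while a property with no declared range, like the hypothetical "title" here, accepts anything.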

Wednesday, February 06, 2013

Book people v. article people

I am definitely a book person. When I want to learn about something, I want to read hundreds of pages about it. I have a half dozen books on copyright, more than a dozen about the social "questions" around the Internet, a handful on the Semantic Web, two shelves of books on libraries (history, cataloging, theory of knowledge organization), and now four books on cognitive science and the theories around concepts.

I've done work with people who are definitely "article people." Mostly academics, these folks rarely delve into a book since their scholarly conversation takes place in articles published in journals. My guess is that once you reach the level of knowledge that these folks possess, the breadth of a book contains nothing new and all of the interesting stuff comes out in article-sized chunks.

I also like to follow up on my reading. When a bit of reading focuses on a place, like Bletchley Park or Los Alamos (as recent books on computer history do), I have to find it on a map. Concepts mentioned but not covered in detail require a visit to Wikipedia. I hunt down works cited in particularly intriguing passages. And it is in the midst of this last activity that I ran into a hint about my attraction to books.

Because I am "unaffiliated" with an institution of higher education, it is easier for me to obtain books than articles. Books are available used or new in an open marketplace, and I rarely come across a reference to a book that I cannot get at what seems to me a reasonable price. But when I look up an article that I might be interested in, I often hit a paywall.

Yes, Wiley asks for $29.95 for an article, and JSTOR asks $38.00. I have seen these prices on articles as short as six pages. These prices are for the download of a PDF file, not an offprint to be delivered by express mail. I can only assume that they have no desire to sell access to individual articles, because the pricing is so out of whack with retail publishing. Remember, these are academic articles that quite frankly haven't a large audience. But they already exist in PDF and are available to members of subscribing institutions. In a world of $.99 pop songs and $9.99 best-selling e-books, these prices are just absurd.

One of the books I am reading at the moment is a compilation of essays called "Concepts: core readings." At least five of the essays were previously published in journals and when I looked them up the download price was $39.95 each. That's about $200 for 100 pages of a 650-page book that retails for $55.
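Spelled out per page, using the figures above and treating "about 100 pages" as the rough estimate it is:

```latex
\begin{aligned}
5 \times \$39.95 &= \$199.75, \quad \$199.75 / 100 \approx \$2.00 \text{ per page (articles)}\\
\$55 / 650 &\approx \$0.085 \text{ per page (book)}\\
\$2.00 / \$0.085 &\approx 24
\end{aligned}
```

In other words, buying those essays as individual downloads costs roughly 24 times as much per page as buying the whole book.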

If we want "equal access to information," as we librarians often claim we do, then we need to do something about journal article pricing. I'd be quite willing to pay $2-$4 for an article, but the $30-$40 price range is ridiculous. I'm sure that these journal companies sell very few, if any, full-price articles. As we've seen with other media, when the price is right, paying becomes as convenient as going to the trouble of pirating the materials (which in my case means borrowing someone's academic identity). Surely selling zero articles at $39.95 isn't better than selling a handful of articles at $2 each.

It's great that JSTOR is now offering some articles for free (although I have yet to be able to create an account since their site just hangs when I try), and I wouldn't suggest that JSTOR should be providing an entirely free service, since they have expenses. But $38 for an article is not just too much, it is prohibitive, and it unnecessarily creates an inequality of access. Someone needs to do to the journal publishers what Apple did to the music industry: show them the money.

Monday, January 28, 2013

Wikipedia and bibliography

One thing that I have noticed on many Wikipedia pages is that you get references, and some external links, but not what I would call a useful bibliography. There are bibliography pages, but these are usually huge, comprehensive lists of works on broad topics.

What I appreciate about Wikipedia is that it is a great place to start when you are delving into a new topic. The links between Wikipedia pages within the text can be very helpful (albeit at times a bit too much of a distraction when your curiosity takes you a great distance from where you started). I would like for at least some Wikipedia pages to serve also as a beginning bibliography for the topic.

What constitutes a beginning bibliography is obviously not easy to define, but Wikipedia has never shied away from such difficulties. When I was in college we had a separate undergraduate library that had the basic books in each field. If you'd never thought about, say, anthropology, you could find the LC class number for that topic, go to the shelf, and you'd be looking at the books that most professors teaching an undergraduate course would consider "must reads" in the field. There was even a published list, called "Books for College Libraries" that listed the key books that college libraries of various sizes should have in their collections. This is now an online resource called "Resources for College Libraries" (behind a paywall) that has over 70,000 titles in 61 different subject areas. What this means is that doing something similar in Wikipedia is neither impossible nor radical.

I got to do a small experiment in this area yesterday at a local "Wikipedia edit-a-thon." I had brought with me some books that I thought would yield interesting explorations, one of which is a marvelous book called "Woman in Science," published in 1913. Although the author is listed as "H. J. Mozans," that turns out to be a pseudonym for John Augustine Zahm. Zahm does have a Wikipedia page, and it did list, within a paragraph, a number of books that he had written. Oddly enough, Woman in Science wasn't one of them. I added it, then decided that, since he had written a handful of books, I would add a bibliography to his page. The Wikipedia bibliography format is, well, you know, like so many Wikipedia structures, something less than friendly. But I discovered something that I probably should have known.

I went to the Open Library page for the book, and near the bottom found the list of export formats.

Clicking on "Wikipedia Citation" gave me a ready-made block of citation markup, which can be pasted directly into Wikipedia. If you are using it for an inline citation, you need to surround this code with <ref></ref>, which will create a numbered reference and add it to the references at the bottom of the page.
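For anyone who would rather generate such markup than type it, here is a minimal sketch that builds a {{cite book}} template from a few fields and optionally wraps it in <ref></ref> for inline use. The parameter names (last, first, title, year) are common Citation Style 1 parameters; this is not necessarily the exact markup the Open Library export produces, and the helper name cite_book is hypothetical.

```python
def cite_book(fields, inline=False):
    """Build Wikipedia {{cite book}} markup from a dict of parameters.

    Parameters with empty values are skipped. With inline=True the
    template is wrapped in <ref></ref> so it becomes a numbered footnote.
    """
    params = "".join(f" |{name}={value}" for name, value in fields.items() if value)
    template = "{{cite book" + params + "}}"
    return f"<ref>{template}</ref>" if inline else template

# Fields taken from the Zahm example above
book = {"last": "Mozans", "first": "H. J.", "title": "Woman in Science", "year": "1913"}
print(cite_book(book))               # for a bibliography list
print(cite_book(book, inline=True))  # for an inline citation
```

A link to a digitized copy could be passed as a "url" field in the same dict; it is left out here because the example should not invent an address.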

Unfortunately, the Open Library code doesn't include a link to the full text, most likely because that isn't part of the Wikipedia format. To do that I added a link to the Internet Archive digitized version of the book after the citation. You can look on the Zahm page and see how that looks. (I'm still looking for a better way to format it so that the link to the full text stands out without looking ugly.)

There is another way to add bibliographic data to Wikipedia which is to click on the menu at the top of the edit window and select a citation type, which then gives you a form to fill out. But if you can find the item in the Open Library, you can avoid all of that typing.

Now that I have learned that it is easy to add bibliographic data to Wikipedia I'm interested in exploring ways that Wikipedia pages can be starting places for essential reading on topics. It naturally makes sense to point to any existing digital materials, but a next logical step would be to find a way to point to libraries for more recent (and in copyright) materials.

Wednesday, January 16, 2013

Hackers and heroes

Recent events have led me again to a contemplation of the equation of hackers and heroes. How is it that an essentially cerebral and sedentary activity gets equated with heroics? And why computing and not, say, bioscience?

If you've read your obligatory Joseph Campbell, you know that the hero myth is ubiquitous in human cultures. Each culture adds its own flavorings and decorations, but the general story is the same: a usually young, solitary male goes through transformational trials, performs some task that makes a difference to the world, and is then declared a hero.

In the story-telling world, it ends there. You don't get the post-hero narrative, although, like love stories, there is an implicit "and they lived happily ever after." This makes it easy to forget that in real life "heroism" is a moment, not a lifetime. The fireman who saves the baby from the burning building, the batter who hits the World Series-winning home run: this is a moment of glory before the person goes back to being an ordinary Joe.

When Steven Levy wrote "Hackers, heroes of the computer revolution" in 1984, the hero myth was perhaps new to the computer world. By the early 1990's consumer computing magazines were full of hero and superhero images. This presents an odd contrast to the stereotype of the out-of-shape, asocial, code-writing computer geek. Since then graphics capabilities have allowed the hero myth to move to the screen in the form of first-person games in which anyone with the time and inclination can play at being a hero.

Levy's "hacker heroes" were in fact ordinary computer geeks, and not even the first. Levy focuses on a group of young men at MIT starting in the late 1950's, but they had been preceded in the 1940's and early 1950's by those who were truly the first computer programmers, many of whom were women. There was no attribution of heroics to those pioneers, neither at the time nor retrospectively. What changed?

Many of the studies of the decline of women's enrollment in computer science ask a similar question, which is: how did computer science become a male bastion when it had once seemed welcoming to women? And why did it take on a hyper-masculinized culture, with home brew, skateboarding the hallways, pizza delivered to midnight coding frenzies, and heroes?

I don't have, and have not encountered in my reading, an answer to that question. I do want to caution, however, that the hero aspiration has a downside that plays out as tragedy. It might be best to limit our heroes to the mythological realm and leave computing to mortals. It just might become a friendlier place for everybody.

Wednesday, January 02, 2013

OCLC Top 50

OCLC recently released a file of 1.2 million metadata records for the most widely held items in its catalog. These are all items with 250 library holdings or more. I created a list on WorldCat of the top 50, mostly out of curiosity. I was quite surprised at the results, however.

Here's how it breaks down:

16 periodicals, with Time and Newsweek being numbers 1 and 2, respectively
29 kid and YA books, four of which (ranking very high even within this small list) are from the Diary of a Wimpy Kid series
5 adult books

The five adult books are:

McCullough, D. G. (1992). Truman. New York: Simon & Schuster.
Brown, D. (2003). The Da Vinci code: A novel. New York: Doubleday.
Johnson, S. (1998). Who moved my cheese?: An a-mazing way to deal with change in your work and in your life. New York: Putnam.
Haley, A. (1976). Roots. Garden City, N.Y: Doubleday.
Peters, T. J., & Waterman, R. H. (1982). In search of excellence: Lessons from America's best-run companies. New York: Harper & Row.

This small set gives me many ideas of things to investigate in the full set. First, the monographs in this set all have recent dates, with the oldest from 1976 and most after 2000.

I am hoping to graph the full set by date. What I expect is that the items will be overwhelmingly recent publications because libraries tend to hold what people read, and my guess is that readers are mainly reading new books. Also, libraries buy from the set of things that are in print, so even if they are buying a so-called classic (as they do every time yet another movie is made of Pride and Prejudice) they are buying a current edition which will have a recent date.

The next obvious bit of information would be the correlation between holdings and date, which I expect to be high for the reasons given above.
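A minimal sketch of that calculation, assuming the records have first been reduced to (publication year, holdings) pairs. The numbers below are made up for illustration and are not taken from the OCLC file. Because holdings are so skewed, a rank-based measure such as Spearman's correlation may be a more robust check than the Pearson coefficient computed here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical (publication year, number of holding libraries) pairs
records = [(1976, 3100), (1982, 2900), (1992, 3500),
           (1998, 4200), (2003, 5000), (2008, 6100)]
years = [year for year, _ in records]
holdings = [count for _, count in records]
print(round(pearson(years, holdings), 2))
```

A value near 1 would support the expectation that more recent items are more widely held; a value near 0 would not.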

The overall distribution of holdings is unsurprising, starting high (at almost 7000 holdings), dropping off dramatically, and creating a long tail. (I had managed to coax a chart out of ooCalc but it crashed before I captured it. Am now studying how to deal with large files and visualization. Advice gladly received.) Of course, the tail would be very, very long if you could chart the entire WorldCat database. (Anyone know how many items in WC are held by only one library? I can't find that in the available WC stats.)
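One way around the spreadsheet crash is to stream the file and count records into holdings bins, then chart only those counts. This is a minimal sketch assuming the records have been converted to CSV with a column named "holdings"; that column name, the 250-wide bins, and the filename in the comment are assumptions, not the layout of the actual OCLC release.

```python
import csv
import io
from collections import Counter

def holdings_histogram(lines, bin_width=250):
    """Count records per holdings bin, reading one row at a time.

    lines can be an open file or any iterable of CSV lines, so the
    whole file never has to fit in memory at once.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        holdings = int(row["holdings"])
        counts[holdings // bin_width * bin_width] += 1
    return dict(sorted(counts.items()))

WIDTH = 250
# Tiny stand-in for a real export; with a real file you would pass
# open("records.csv", newline="") instead (hypothetical filename).
sample = io.StringIO("title,holdings\nA,6900\nB,410\nC,300\nD,260\nE,255\n")
for lower, n in holdings_histogram(sample, WIDTH).items():
    print(f"{lower}-{lower + WIDTH - 1}: {n}")
```

The resulting bin counts are small enough to chart in any tool, even when the underlying file has over a million rows.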

I think it would be interesting to be able to analyze library holdings in correlation with the FRBR-ization that OCLC has done. In fact, I would really like to see the top 1% (or .5%) of FRBR-ized items. With regard to FRBR, I am mainly wondering if we can estimate how frequently FRBR might fulfill its promise of saving the time of the cataloger. But that's for another day.

Friday, December 07, 2012

Invisible women 2: Cognitive Surplus

I've just read Clay Shirky's 2010 book Cognitive Surplus: creativity and generosity in a connected age. The short summary of the book is: since the 1950s people have had more and more leisure time. Until recently that leisure time was taken up with the passive activity of watching television. The Internet has given us the possibility to use our leisure time for social and creative activities, like creating Wikipedia, engaging in online discussion, and even creating lolcats.

Yet I have to ask: how could a smart, well-read professor write an entire book about what people do with their leisure time and not address the well-known and well-documented gender inequality in time available? The OECD did an entire report on what is called "unpaid labor:"

"Most unpaid work is cooking and cleaning – on average 2 hours 8 minutes work per day across the OECD – followed by care for household members at 26 minutes per day. Shopping takes up 23 minutes per day across the OECD on average"

[Chart of the OECD data, in minutes per day, for selected countries (not all countries are listed). The full data is available as an Excel file.]

The Economist put it more bluntly (and I do think this image is unnecessarily demeaning, but I haven't found one with the same message that is more neutral):

Had he read Shirky's book, the man in that image would have been sitting at his computer, updating Wikipedia pages or adding his cognitive surplus to a discussion group on health issues. But he still would have had more leisure time than a woman.

A social world based on "cognitive surplus" will be one that is not gender neutral. It will have more participation by males, and therefore will be socially skewed to the masculine, at least until we have gender parity in taking care of the home, the children, the elderly, etc. That is something that I would expect an intelligent observer of society to notice. Not only notice, but ponder: what does this tell us about the nature of the things being created with this cognitive surplus? Does this explain, in whole or in part, the masculine view of "hacking," the participation in Open Source, the gendered nature of games and gaming?

** I woke up this morning realizing something that is neither in this book nor in what I wrote above: the relationship between leisure time and income level. I don't have any figures on that right now, but will do some investigation. My assumption is that leisure is not evenly distributed, and that the working poor have much less leisure time than the middle and upper classes.