Coyle's InFormation: 02/01/2013

Thursday, February 21, 2013

Open to Creativity

The brilliance of Google's PageRank is not the computational methods behind it but the target of those methods: links created by people in the course of making something meaningful on the Web. Without that human input, Google (and Bing and Yahoo) would simply be counting up term frequencies and perhaps analyzing linguistic characteristics, but missing most of what makes Web searching work. The results would be at least as bad as the ranking on Google Books because they would be devoid of the human commitment of significant connections between the pages.

Although Google's mission statement is "... to organize the world's information and make it universally accessible," Google is not really organizing anything. It is reading the organization provided by the Web's population. Similarly, Facebook is reading the relationships between people that are made in the course of using its software, information that it would not have otherwise.

What has made the Web such a rich environment is that anyone can link anything to anything else. That linking is an expression, and even though we might not be able to characterize it in a few words (what does it mean that page A links to page B?), Google has shown that we can make use of those patterns of linking to help people find stuff on the Web that meets their needs.

In this regard, the Semantic Web has a serious problem. Much of the focus of Semantic Web work is the creation of vocabularies of defined relationships between things, with the intention that these relationships can be traversed and manipulated by algorithms. That is fine in itself, but the Semantic Web enthusiasts are primarily creating pre-determined, fixed relationships between things, mainly based on defining each thing in a class/sub-class relationship with other things. The Semantic Web vocabularies also define requirements called "range" which limit what you can link to from a specific element of your vocabulary. [1]
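To make the effect of a "range" concrete, here is a minimal sketch in plain Python. It assumes the range is enforced as a validation check, which is how this post describes it; the names URIRef, STRICT_VOCAB, OPEN_VOCAB and accepts are hypothetical and do not come from any real vocabulary or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """An identified entity, named by a URI (hypothetical helper type)."""
    uri: str


# Hypothetical vocabularies: property name -> must the value be an identified entity?
STRICT_VOCAB = {"creator": True}   # range limited to identified things
OPEN_VOCAB = {"creator": False}    # no range restriction


def accepts(vocab, prop, value):
    """Return True if `value` may be linked from `prop` under `vocab`."""
    requires_entity = vocab.get(prop, False)
    if requires_entity:
        # A bare string is not an identified entity, so it is rejected.
        return isinstance(value, URIRef)
    return True


karen_uri = URIRef("http://kcoyle.net/kcoyle.rdf")
print(accepts(STRICT_VOCAB, "creator", karen_uri))      # True
print(accepts(STRICT_VOCAB, "creator", "Karen Coyle"))  # False
print(accepts(OPEN_VOCAB, "creator", "Karen Coyle"))    # True
```

Under the strict vocabulary the plain name is turned away, so nothing can be said about a creator who has no identifier. The open vocabulary takes either form.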

The tendency to pre-define vocabularies with strict rules is a carry-over from the Artificial Intelligence (AI) environment from which the Semantic Web arose. If you expect to work with machines, and only machines, then you have to define for them exactly what they can and cannot do, and you must present all decisions as formulas that can be calculated.

For reasons that are unclear to me, the Semantic Web work seems to be unaware of the great success that Google has had in using human information sharing activity to create a meaningful web of links. There is little attention to the fact that establishment of relationships through linking can be done by humans, much less that the best and most useful linking will be done by humans. Human information linking is not definable a priori -- in fact, an a priori definition of allowed links essentially limits the future to re-running past concepts in new variations. It is an absolute barrier to creativity if you can only act on what has been pre-defined.

It's the difference between a prefab house, which can only become what it was designed to become (with some minor modifications that don't change its essence) and a box of blocks that can be used to create anything within the realm of physics (although superglue could extend the possibilities).

In part this is why I tend to speak of "linked data" rather than "Semantic Web." Linked data, as it has evolved as the description of metadata activity, carries less of the AI baggage than Semantic Web does.

To me, what will be exciting about linked data is what people will do with it: what they will create, what they will experiment with, and both their successes and their failures. However, for people to do something with linked data they need tools -- tools that are as easy to use as those used to create Web pages and the links between them.

They also need a box of blocks to work with. These blocks need to be as free of predetermined rules as possible. [2] The terms need to be defined just enough to make them usable. This is also true of the relationships or links. Using our box of blocks metaphor, we want to be able to put the blocks in relationships like "above," "below," or "near" each other. If the square blocks are defined as always "below" the rectangular blocks, then that limits what we can create.

The Semantic Web as a machine environment guided with AI formalities appeals to some because it promises to be neat and unambiguous. It will, however, foster only a very constrained amount of creativity, and will not be able to satisfy the full range of human curiosity.

It is a shame that many Semantic Web enthusiasts have little faith (or little interest) in the human potential that linking and openness can unleash. I, for one, am looking for partners in the development of a messy, intelligent, quirky technology that can produce surprising results, created by people using linked data as a tool of expression. I am particularly excited by the fact that we don't know what forms that expression will take.

[1] For example, you can define a creator as having to be of type "Person" and require that the Person be expressed as a URI, like http://kcoyle.net/kcoyle.rdf. In that case, you can't have a creator that is "Karen Coyle" because "Karen Coyle" is just a string of characters, not an identified entity. This means that if you don't have identifiers for your creators, you can't create data about what they created.

[2] This is referred to as "minimum ontological commitment," introduced in Toward Principles for the Design of Ontologies Used for Knowledge Sharing, by T. Gruber [http://www2.iath.virginia.edu/time/readings/ontology-semantics-metaphor/designing-ontologies.pdf]

Wednesday, February 06, 2013

Book people v. article people

I am definitely a book person. When I want to learn about something, I want to read hundreds of pages about it. I have a half dozen books on copyright, more than a dozen about the social "questions" around the Internet, a handful on the Semantic Web, two shelves of books on libraries (history, cataloging, theory of knowledge organization), and now four books on cognitive science and the theories around concepts.

I've done work with people who are definitely "article people." Mostly academics, these folks rarely delve into a book since their scholarly conversation takes place in articles published in journals. My guess is that once you reach the level of knowledge that these folks possess, the breadth of a book contains nothing new and all of the interesting stuff comes out in article-sized chunks.

I also like to follow up on my reading. When a bit of reading centers on a place, like Bletchley Park or Los Alamos (as recent books on computer history do), I have to find it on a map. Concepts mentioned but not covered in detail require a visit to Wikipedia. I hunt down works cited in particularly intriguing passages. And it is in the midst of this last activity that I run into perhaps a hint about my attraction to books.

Because I am "unaffiliated" with an institution of higher education, it is easier for me to obtain books than to obtain articles. Books are available used or new in an open marketplace, and it is rare that I find a reference to a book that I cannot get at what seems to me a reasonable price. But when I look up an article that I might be interested in, I often get something like:

Yes, Wiley asks for $29.95 for an article, and JSTOR asks $38.00. I have seen these prices on articles as short as six pages. These prices are for the download of a PDF file, not an offprint to be delivered by express mail. I can only assume that they have no desire to sell access to individual articles, because the pricing is so out of whack with retail publishing. Remember, these are academic articles that quite frankly haven't a large audience. But they already exist in PDF and are available to members of subscribing institutions. In a world of $.99 pop songs and $9.99 best-selling e-books, these prices are just absurd.

One of the books I am reading at the moment is a compilation of essays called "Concepts: core readings." At least five of the essays were previously published in journals and when I looked them up the download price was $39.95 each. That's about $200 for 100 pages of a 650-page book that retails for $55.
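The arithmetic behind that comparison can be checked with a short sketch. The prices and page counts are the ones given above, with "at least five" essays taken as exactly five and "100 pages" taken at face value; the variable names are illustrative.

```python
# Figures from the paragraph above; names are illustrative.
ARTICLE_PRICE = 39.95   # download price per essay, in dollars
ARTICLE_COUNT = 5       # "at least five" essays, taken as five
ARTICLE_PAGES = 100     # approximate combined length of those essays
BOOK_PRICE = 55.00      # retail price of the whole book
BOOK_PAGES = 650

articles_total = ARTICLE_PRICE * ARTICLE_COUNT
per_page_articles = articles_total / ARTICLE_PAGES
per_page_book = BOOK_PRICE / BOOK_PAGES

print(f"Articles: ${articles_total:.2f} total, ${per_page_articles:.2f} per page")
print(f"Book:     ${BOOK_PRICE:.2f} total, ${per_page_book:.2f} per page")
print(f"Per-page ratio: about {per_page_articles / per_page_book:.0f}x")
```

On those figures the downloaded essays cost roughly two dollars a page, against less than a dime a page for the book.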

If we want "equal access to information," as we often claim we librarians do, then we need to do something about journal article pricing. I'd be quite willing to pay $2-$4 for an article, but the $30-$40 price range is ridiculous. I'm sure that these journal companies sell very few, if any, full-price articles. As we've seen with other media, when the price is right, it becomes as convenient to pay the price as it is to bother to pirate the materials (which in my case means borrowing someone's academic identity). Surely selling zero articles at $39.95 isn't better than selling a handful of articles at $2 each.

It's great that JSTOR is now offering some articles for free (although I have yet to be able to create an account since their site just hangs when I try), and I wouldn't suggest that JSTOR should be providing an entirely free service, since they have expenses. But $38 for an article is not just too much, it is prohibitive, and it unnecessarily creates an inequality of access. Someone needs to do to the journal publishers what Apple did to the music industry: show them the money.