Thursday, February 21, 2013

Open to Creativity

The brilliance of Google's PageRank is not the computational methods behind it but the target of those methods: links created by people in the course of making something meaningful on the Web. Without that human input, Google (and Bing and Yahoo) would be simply counting up term frequencies and perhaps analyzing linguistic characteristics, but missing most of what makes Web searching work. The results would be at least as bad as the ranking on Google Books because they would be devoid of the human commitment of significant connections between the pages.

Although Google's mission statement is "... to organize the world's information and make it universally accessible," Google is not really organizing anything. It is reading the organization provided by the Web's population. Similarly, Facebook is reading the relationships between people that are made in the course of using its software, information that it would not have otherwise.

The Web is the rich environment that it is because anyone can link anything to anything else. That linking is an expression; and even though we might not be able to characterize it in a few words (what does it mean that page A links to page B?), Google has shown that we can make use of those patterns of linking to help people find stuff on the Web that meets their needs.
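To make "patterns of linking" concrete, here is a minimal sketch of link-based ranking in the spirit of the published PageRank idea. It is not Google's production method. The function name, the four-page graph, the damping value of 0.85 and the number of rounds are all assumptions chosen for illustration. The point is what the function takes as input: nothing but the links that people made.

```python
def rank_pages(links, damping=0.85, rounds=50):
    """Score pages by repeatedly passing each page's score along its links.

    `links` maps a page to the pages it links to (hypothetical data).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(rounds):
        # Every page keeps a small base score regardless of who links to it.
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            # A page with no outgoing links shares its score with every page.
            targets = links.get(page) or list(pages)
            share = damping * score[page] / len(targets)
            for t in targets:
                new[t] += share
        score = new
    return score

# Three people chose to link to page C; only C links to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = rank_pages(links)
best = max(scores, key=scores.get)
```

Nothing in the sketch says what any link means. The ranking comes entirely from the choices of the people who created the links.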

In this regard, the Semantic Web has a serious problem. Much of the focus of Semantic Web work is the creation of vocabularies of defined relationships between things, with the intention that these relationships can be traversed and manipulated by algorithms. That is fine in itself, but the Semantic Web enthusiasts are primarily creating pre-determined, fixed relationships between things, mainly based on defining each thing in a class/sub-class relationship with other things. The Semantic Web vocabularies also define requirements called "range" which limit what you can link to from a specific element of your vocabulary. [1]
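A small sketch of what such a range rule does when it is applied as a check on data, which is the reading used in this post. The vocabulary, the type registry and the example.org URI are hypothetical, invented only for illustration.

```python
# Hypothetical mini-vocabulary: each property names the class its values must belong to.
RANGES = {"creator": "Person"}

# Hypothetical type registry: only things identified by a URI have a known class.
TYPES = {"http://example.org/people/kc": "Person"}

def allowed(prop, value):
    """Return True if `value` satisfies the range declared for `prop`."""
    required = RANGES.get(prop)
    if required is None:
        return True  # no range declared: anything may be linked
    return TYPES.get(value) == required

uri_ok = allowed("creator", "http://example.org/people/kc")  # identified Person
string_ok = allowed("creator", "Karen Coyle")                # plain string, no identity
free_ok = allowed("relatedTo", "Karen Coyle")                # property with no range
```

Under these assumptions the URI is accepted as a creator, the plain string is rejected, and the property with no declared range accepts either one.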

The tendency to pre-define vocabularies with strict rules is a carry-over from the Artificial Intelligence (AI) environment from which the Semantic Web arose. If you expect to work with machines, and only machines, then you have to define for them exactly what they can and cannot do, and you must present all decisions as formulas that can be calculated.

For reasons that are unclear to me, the Semantic Web work seems to be unaware of the great success that Google has had in using human information sharing activity to create a meaningful web of links. There is little attention to the fact that establishment of relationships through linking can be done by humans, much less that the best and most useful linking will be done by humans. Human information linking is not definable a priori -- in fact, an a priori definition of allowed links essentially limits the future to re-running past concepts in new variations. It is an absolute barrier to creativity if you can only act on what has been pre-defined.

It's the difference between a prefab house, which can only become what it was designed to become (with some minor modifications that don't change its essence) and a box of blocks that can be used to create anything within the realm of physics (although superglue could extend the possibilities).

In part this is why I tend to speak of "linked data" rather than "Semantic Web." Linked data, as it has evolved as the description of metadata activity, carries less of the AI baggage than Semantic Web does.

To me what will be exciting about linked data is what people will do with it; what they will create, what they will experiment with, and both successes and failures. However, for people to do something with linked data they need tools -- tools that are as easy to use as those used to create Web pages and the links between them.

They also need a box of blocks to work with. These blocks need to be as free of predetermined rules as possible. [2] The terms need to be defined just enough to make them usable. This is also true of the relationships or links. Using our box of blocks metaphor, we want to be able to put the blocks in relationships like "above," "below," or "near" each other. If the square blocks are defined as always "below" the rectangular blocks, then that limits what we can create.

The Semantic Web as a machine environment guided with AI formalities appeals to some because it promises to be neat and unambiguous. It will, however, foster only a very constrained amount of creativity, and will not be able to satisfy the full range of human curiosity.

It is a shame that many Semantic Web enthusiasts have little faith (or little interest) in the human potential that linking and openness can unleash. I, for one, am looking for partners in the development of a messy, intelligent, quirky technology that can produce surprising results, created by people using linked data as a tool of expression. I am particularly excited by the fact that we don't know what forms that expression will take.

[1] For example, you can define a creator as having to be of type "Person" and require that the Person be expressed as a URI, like In that case, you can't have a creator that is "Karen Coyle" because "Karen Coyle" is just a string of characters, not an identified entity. This means that if you don't have identifiers for your creators, you can't create data about what they created.

[2] This is referred to as "minimum ontological commitment," introduced in Toward Principles for the Design of Ontologies Used for Knowledge Sharing, by T. Gruber.


Ben Companjen said...

Karen, if I may be pedantic... In your note [1] it looks like the URI for you (or a Person with your name) is also the URI of an RDF(/XML, presumably) document at that location. That makes it hard (at best) to distinguish the two.

Karen Coyle said...

Ben, this is the form suggested by "foaf-a-matic"

Ben Companjen said...

The Person described in the file has the same URI, but appended with "#me". So if your FOAF-a-matic FOAF description is at , the Person's URI is .

But enough pedantry. :)

Even though I consider myself a fan of Semantic(s on the) Web, I hadn't really considered there was a difference between Semantic Web and Linked Data.

As you know, I'd like to see Open Library expose data as Linked Data. Linked Data would allow me to say "I have in my collection", and the cataloguing application could look up .
Another application that I'd like to have is an event "agent" that finds interesting events based on speakers/musicians, location etc.

Thad McIlroy said...

Great post. I realize now that my attraction to the Semantic Web has been its presumed orderliness, without realizing that this is what will at the same time hobble its effectiveness. Bravo.