Sunday, July 04, 2010

Catching up: OCLC, GBS, LOD

Some short comments on recurring themes:

OCLC Record Use Policy

OCLC has finalized its record use policy. The content is substantially the same as it was in the previous draft, which I commented on. There is one important improvement, however: the text clarifies OCLC's claims to copyright.
While, on behalf of its members, OCLC claims copyright rights in WorldCat as a compilation, it does not claim copyright ownership of individual records.
Of course, claiming copyright and actually having the right are not the same thing, especially with databases. Here's what BitLaw says:
Databases as Compilations: Databases are generally protected by copyright law as compilations. Under the Copyright Act, a compilation is defined as a "collection and assembling of preexisting materials or of data that are selected in such a way that the resulting work as a whole constitutes an original work of authorship." 17 U.S.C. § 101.
Generally, carefully selected compilations may make the "original work of authorship" cut; I'm not convinced that a union catalog of library holdings does.

Google Books

We are still waiting to hear from the judge in the Google Books case. (Every time I write that, I check to see whether a decision has been released in the last hour.) Meanwhile, GBS continues to function in Internet time. Google has many publishers on board with its partners program, enough that GBS is becoming a serious rival to Amazon. It has even announced that it will begin selling ebooks. The opening screen is the exact opposite of the Google Search screen -- it loads many dozens of book covers and requires significant scrolling to reach the bottom. Google has added personalization options ("my library") and lets you create multiple "shelves" to organize your materials.

Google was first sued in 2005. Five years is a very long time where technology is concerned. In 2005 the ebook was considered dead; now, with the Kindle and the iPad, ebooks are alive and well and everyone is trying to get into that game. Since 2005, Google has pretty much shown the publishing industry that it can benefit from the online presence that Google is providing. The settlement reads like it was written in another era, trying to solve problems that may not really be considered problems today. The only issue remaining is that of orphan works, and if we could do a decent analysis of copyright holdings, I suspect that the number of orphan works would not be all that large.

Linked Library Data

At ALA there was a one-day preconference on linked data, and a half-day un-conference attended by about 50 people. There are notes from the un-conference, which broke out barcamp-style into six groups for discussion.

The World Wide Web Consortium has an incubator group on linked library data. This group is tasked with spending one year figuring out how to jump-start the creation of linked data in the library world.

There are ongoing efforts at the Library of Congress to produce vocabularies, and of course the RDA vocabularies are available (and almost finalized). Ross Singer has announced that some of the MARC codes are available (I presume on his own site). FRBR is being defined in linked data form by IFLA.

We've got just about everything but ... linked data. I'm thrilled that things are moving forward, but frustrated that I still can't see usable results. Deep breath; patience.


Eric said...

Let's grant, for a moment, that OCLC's copyright to the compilation is valid. What is such a copyright useful for? In what situations might the copyright be asserted? The more I think about it, the more I suspect that the copyright claim is completely irrelevant.

Karen Coyle said...

Eric, I basically agree that copyright in WorldCat has little utility. I can only imagine that OCLC could claim that some other party had replicated substantially all of WorldCat, and thus violated their copyright, but I can't see how one would prove such a claim (much less how/why someone would create a database that would trigger such an action). As in many things, I think it's meant to be a warning and a deterrent, rather than being something that OCLC thinks would actually stand up in a court of law. If nothing else, a huge, unselected gathering of library bibliographic data would probably not meet any criteria of "creativity" required by US law on compilations. So I read it as posturing -- not unlike many claims of copyrights and a whole bunch of patent claims, as well.

jm said...

Maybe it's just me, but the FRBRer model looks...odd. Creating multiple properties for a single property if that property can have multiple classes in its domain or range doesn't make a lot of sense to me. Do I really need two "is_realized_by" properties, one to use if I'm linking to a person and one to use if I'm linking to a corporate body?

Karen Coyle said...

jm -

I posted a similar "concern" on the RDA list (since RDA does the same thing) and got an answer that has since been retracted. I think the whole issue of what has to be tied to the FRBR entities and what can be used where needed is wide open and not at all well understood. The big issue is: how do we work this out, since the FRBR committees and the RDA committee work behind closed doors?

As far as RDA is concerned, every entity is specifically tied to a single FRBR entity. This has some interesting advantages, because it means that you can take a statement out of context without losing meaning. (I think I need to write something more extensive about this.) So there is a property called "Title of the Work", which means title of the work wherever it appears, and does not depend on being in a structure that looks like:

Title of Work

and is distinguished from "Title Proper" because the latter is solely used for the title of the Manifestation.

The obvious disadvantage is that there are hundreds and hundreds of properties, so interacting with the non-library community will have to use a different vocabulary, since no one else will be wedded to FRBR and this level of detail.
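
To make the trade-off concrete, here is a minimal sketch in Python, with plain tuples standing in for RDF triples. Every identifier in it (ex:w1, titleOfTheWork, titleProper, title) is a made-up placeholder, not a name taken from the RDA or FRBR vocabularies. It shows how an entity-specific statement could be rewritten for a community that uses one generic property plus an explicit class statement:

```python
# Hypothetical mapping: entity-specific property -> (generic property, FRBR class).
# These names are illustrative only; they are not real RDA or FRBR identifiers.
ENTITY_SPECIFIC = {
    "titleOfTheWork": ("title", "Work"),
    "titleProper": ("title", "Manifestation"),
}


def generalize(triple):
    """Rewrite one entity-specific triple as a generic triple plus a type triple."""
    subject, predicate, obj = triple
    generic_property, frbr_class = ENTITY_SPECIFIC[predicate]
    return [
        (subject, generic_property, obj),
        (subject, "rdf:type", frbr_class),
    ]


# The entity-specific statement stands alone: the predicate says it is about a Work.
statement = ("ex:w1", "titleOfTheWork", "Moby Dick")

# The generic form needs two statements to say the same thing.
for t in generalize(statement):
    print(t)
```

The single statement carries its meaning with it, which is the advantage described above; the cost is one mapping entry for every entity-specific property.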

jm said...

I'll have to think more about whether I'm being unreasonable here, but my first reaction is a combination of 'ew' and 'huh?' When would anyone care about a predicate, by itself, out of context? And if it's *in* context (i.e., part of an RDF triple), then you look at the subject and query what class of entity it is and you're done. That's the 'huh?' part. The 'ew' part is using a predicate to imply class membership on the part of a subject. Ugly in the extreme.
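
As a rough illustration of the "look at the subject and query its class" approach, here is a small Python sketch using plain tuples in place of a real RDF store. The identifiers (ex:w1, ex:m1, title) are hypothetical and come from no published vocabulary:

```python
# A tiny in-memory "graph" of (subject, predicate, object) triples.
# All identifiers are made up for illustration.
triples = [
    ("ex:w1", "rdf:type", "Work"),
    ("ex:w1", "title", "Moby Dick"),
    ("ex:m1", "rdf:type", "Manifestation"),
    ("ex:m1", "title", "Moby-Dick; or, The Whale"),
]


def classes_of(subject, graph):
    """Return the set of classes asserted for a subject via rdf:type."""
    return {o for s, p, o in graph if s == subject and p == "rdf:type"}


def titles_with_class(graph):
    """Pair each generic 'title' statement with the class of its subject."""
    return [
        (s, sorted(classes_of(s, graph)), o)
        for s, p, o in graph
        if p == "title"
    ]


for row in titles_with_class(triples):
    print(row)
```

One generic predicate serves both entities here, and the class comes from the subject's own type statement rather than from the predicate name. The lookup only works while the type statement travels with the title statement, which is the out-of-context case raised earlier in the thread.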

Karen Coyle said...

jm -

I totally agree as far as the predicates go. (I was referring to the subjects and objects above.) I see lots of problems with those, not the least of which is that it isn't clear to me when you have something like

is subject (manifestation) of

whether manifestation refers to the subject or the object of the predicate. And either way, you've got a redundancy here since the predicate relates a manifestation to ... something. I find the whole thing to be a mess, and would like to see some concerted discussion, since I doubt that it is usable in this form.