Coyle's InFormation: Bib data and the Semantic Web

Monday, May 03, 2010

Bib data and the Semantic Web

I know that I've gone on and on about transforming bibliographic data into a semantic web format. And whenever folks have asked me, "What will it look like?", I haven't had a good response. Now there is something to show you: Freebase.

Freebase is a database of interlinked semantic web "statements," essentially what the SemWeb types call "triples." The statements come from a variety of open data sources such as Wikipedia, TVDB.com, a science fiction fan database, and Open Library. By placing a user interface over these data, Freebase now offers a searchable, navigable site that can link books to movies to (theoretically) music to science to... well, anything where linked data is available.
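
To make the "triples" idea a bit more concrete, here is a minimal Python sketch of how a handful of statements can link an author to a book and a book to a film, and how a program can follow those links in either direction. The identifiers and predicate names (ex:authorOf, ex:adaptedFrom, and so on) are made up for illustration; they are not Freebase's actual schema or API.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All identifiers and predicates here are hypothetical, not Freebase's schema.
TRIPLES = [
    ("ex:LFrankBaum", "ex:authorOf", "ex:WonderfulWizardOfOz"),
    ("ex:WizardOfOz1939", "ex:adaptedFrom", "ex:WonderfulWizardOfOz"),
    ("ex:WonderfulWizardOfOz", "ex:publishedIn", "1900"),
]


def build_index(triples):
    """Index every triple under both its subject and its object,
    so links can be followed forwards ("out") or backwards ("in")."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o, "out"))
        index[o].append((p, s, "in"))
    return index


def neighbors(index, node):
    """Return (predicate, other_node, direction) for every statement touching node."""
    return index.get(node, [])


def films_from_author(index, author):
    """Two hops: author -> works they wrote -> films adapted from those works."""
    films = []
    for pred, work, direction in neighbors(index, author):
        if pred == "ex:authorOf" and direction == "out":
            for pred2, film, dir2 in neighbors(index, work):
                if pred2 == "ex:adaptedFrom" and dir2 == "in":
                    films.append(film)
    return films


index = build_index(TRIPLES)
```

Calling `films_from_author(index, "ex:LFrankBaum")` walks from the author to the book and on to the film, even though no single statement connects the author and the film directly. That is the kind of navigation a linked-data site gets "for free" once the statements are in place.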

Their book data isn't as strong as it should be, given that they claim to have imported the Open Library file (I suspect it was only partially imported). When you look at the Freebase entry for Emily Dickinson you see only two works listed. Open Library has 137 Works for Dickinson, and WorldCat Identities lists 3,388. Their approach is also more "popular" than rigorous. However, there is no reason why this same technique could not be used with "pure" library data, and library catalogs could make use of any of the data in such a database because it is all available through linking and APIs. A database like Freebase essentially serves as a huge pot of available, re-usable information.

In its current form, Freebase would not be sufficient for library data sharing, although it could provide an interesting testing ground. What we need to work out for libraries is a way to version and source content so that you know who provided each statement and when, and to make it easy to contribute new information or improvements to existing information in a sensible and automated way. There is no reason why we could not create a "LibBase" that consists solely of what libraries would consider to be authoritative information: a kind of linked data WorldCat. That data would have to be able to interact with other data on the Web, and by doing so libraries would become discoverable on the Web. It would be logical for projects like Freebase to link to the library data. Library users would have a rich, navigable information base that could help them follow (or even make) connections between library resources, connections that are much less evident in today's catalogs. Some technical magic would need to occur to allow users to move seamlessly from the whole world to their local library, but I don't think that's going to take rocket science to solve.
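
The "version and source" requirement can be sketched by attaching a source and a date to every statement instead of overwriting old values. The Python below is a minimal sketch under assumed names: the source identifiers, the ex:Work42 record, its titles, and the idea of filtering on a set of "trusted" sources are all hypothetical. In RDF practice, one common way to carry this kind of provenance is to group statements into named graphs; the sketch simply stores the fields on each statement.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str        # who asserted it (hypothetical source IDs)
    asserted_on: date  # when it was asserted


# Hypothetical data, deliberately listed out of date order.
STATEMENTS = [
    Statement("ex:Work42", "ex:title", "Poems, First Series", "ex:LibraryB", date(2010, 1, 15)),
    Statement("ex:Work42", "ex:title", "Poems (draft)", "ex:AnonymousEditor", date(2010, 4, 2)),
    Statement("ex:Work42", "ex:title", "Poems", "ex:LibraryA", date(2009, 3, 1)),
]

# Hypothetical set of sources a "LibBase" might treat as authoritative.
TRUSTED = {"ex:LibraryA", "ex:LibraryB"}


def history(statements, subject, predicate):
    """Every asserted version of one property, oldest first, with provenance."""
    matches = [s for s in statements if s.subject == subject and s.predicate == predicate]
    return sorted(matches, key=lambda s: s.asserted_on)


def current_value(statements, subject, predicate, trusted_sources):
    """Most recent value asserted by any trusted source, or None if there is none."""
    trusted = [s for s in history(statements, subject, predicate)
               if s.source in trusted_sources]
    return trusted[-1].obj if trusted else None
```

With these data, `current_value(STATEMENTS, "ex:Work42", "ex:title", TRUSTED)` returns "Poems, First Series": the newer anonymous edit is kept in the history but ignored by the trusted view. Keeping every dated, attributed version is what lets each consumer decide whose statements to believe.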

There is a group of interested souls planning to get together on the Friday morning of ALA DC to begin some exploration of how we might make semantic web technology work for libraries. There will be announcements on various lists (I'm guessing NGC4LIB, CODE4LIB, LITA-L and RDA-L, at the very least). If you can get to ALA a little early, please mark that slot on your calendar. It'll be a free-floating, working, barcamp-style meeting, as I understand it.

3 comments:

Unknown said...: I'd love to come, but Friday morning puts your 'meeting' up against the RDA 101 preconference, which I am paying to attend. Drat.; 5/04/2010 10:41 AM
Karen Coyle said...: We're thinking of moving it, if we can find another space, but as you know, there really are no true "no conflict" times at ALA. Every slot is up against something important. *sigh*; 5/04/2010 2:33 PM
paul said...: hmm

I am guessing you mean thetvdb.com
not tvdb.com

http://thetvdb.com/

It does worry me that there is such an obsession in the "library world" with books

And to say "not be sufficient for library data sharing" - don't you mean to say: how can we enrich our data with this data, and how can we help their data get better?

Maybe take a look at thetvdb.com and reflect on what the book-obsessed library community can do to make this better before it becomes a monetarised beast like IMDB - which grew from free data on USENET - but now is owned by Amazon...

you mention two rather arrogant concepts - the notion of "authoritative information" and WorldCat. We (library) need to get past both these stumbling blocks

There is no center - the Open Library is more significant than OCLC but only lists books. Even OCLC has Long Players among its mush...; 7/01/2010 5:17 AM