Comments on Coyle's InFormation: FRBR and Sharability

Hi Karen - Thanks for the thoughtful reply and th...

2010-06-01T17:43:17.755-07:00

Hi Karen -

Thanks for the thoughtful reply and the info about cataloging serving as a surrogate. It is strange that FRBR uses the term realized by for what I would call published by, but I suppose they had their reasons.

-Erik

Erik, your statement about the transcribed element...

2010-06-01T08:30:40.234-07:00

Erik, your statement about the transcribed elements is true: there is a bunch of stuff in a library record that attempts to transcribe faithfully what is found on the item. That reflects the role of the bib record as a surrogate for the actual item. Those things won't/can't be controlled as vocabularies. (Although I think it may be time to rethink which fields this should apply to, and how useful it is.) The "publisher name", which is an element in RDA, was not an actual transcription in the previous cataloging rules, but was allowed to be abbreviated based on the cataloger's judgment. In RDA, I believe it is transcribed. So in this sense you are right -- we should separate the transcribed text from ACTUAL DATA, and we could then include both. (My contention is that there are only a handful of fields that need to be transcribed.)

As for your:

W dc:creator P
P rdf:type frbr:Person

It is all a matter of precision. dc:creator can take any values, so this statement could be used in a situation where some P are frbr:Person and some are not. In that case, each instance would define itself through its value, and they could intermingle in a data set. For folks who are wanting to create FRBR-defined data only, they would want to be more precise and state that their property (not its value, as in your case) is in FRBR:creator, which can only take FRBR:Person as a value. In fact, the library world, through FRBR and RDA, is going for maximum precision, something that I believe is going to inhibit data exchange and keep library data in its silo. But that's the data I'm trying to work with at the moment.
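The difference in precision can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats a property's range as a validation rule, which is a simplification (under RDFS semantics a range declaration lets a reasoner infer the type of the value rather than reject the data). The identifiers P1, P2, W and the type ex:Committee are made up; the frbr:creator range follows the description in the comment above.

```python
# Range declarations: None means "any value allowed".
# frbr:creator -> frbr:Person follows the comment above; the rest is hypothetical.
RANGE = {"dc:creator": None, "frbr:creator": "frbr:Person"}

# Declared rdf:type of each value (made-up identifiers).
TYPES = {"P1": "frbr:Person", "P2": "ex:Committee"}


def violates_range(triple, ranges=RANGE, types=TYPES):
    """True if the object's declared type falls outside the property's range."""
    _, prop, obj = triple
    required = ranges.get(prop)
    return required is not None and types.get(obj) != required
```

With dc:creator, both P1 and P2 pass and can intermingle in one data set; with frbr:creator, only P1 passes.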

Now that I think about it, your dc:creator example could help us out: there are non-frbr-ized properties defined in the RDA vocabularies at http://metadataregistry.org/rdabrowse.htm, and we could use those with frbr defined G2 and G3, which most folks don't have a big problem with (although we do need a definition of each of those outside of FRBR as well...)

ok, gotta think some more. Thanks.

Hello again Karen, sorry I've missed a few day...

2010-06-01T01:34:59.973-07:00

Hello again Karen, sorry I've missed a few days of this. When I read your reply to my initial comment I realized that I was indeed wrong to say that FRBR had to be creating a relational database; as Jonathan implied, I am used to relational databases and I do see things through that lens by default! I agree entirely, however, that the semantic web is the more appropriate model here.

So what happens in the semantic web model when you borrow a work record from someone without borrowing the author record, for example?

The answer - nothing. The work record on your catalogue will continue to point to the author record on theirs. And that is exactly how the semantic web is supposed to work. Indeed, the people you're borrowing from will probably have referenced an author record in a repository like LC Authorities. And in time, there may be such central repositories for work, manifestation, expression records as well - all your own catalogue will have to do is highlight the relevant bits in such publicly available resources, and add whatever you want to for the benefit of your particular user group.

Tom

Hi Karen - I think I'm missing something about...

2010-05-30T08:30:55.291-07:00

Hi Karen - I think I'm missing something about the creator. I see what you are saying about frbr:creator being a subclass of dc:creator. But I'm not sure what the problem is with modeling the relationship as:

W dc:creator P
P rdf:type frbr:Person

What I meant by bringing up the statement of responsibility is that the publisher attribute is to the realized by relationship as the statement of responsibility is to the created by relationship.

That is, FRBR provides for a controlled vocabulary/entity model for publishers in the realized by relationship, while also providing a free-form, uncontrolled string in the publisher attribute.

(I always imagined the statement of responsibility as a mostly faithful transcription of what a book says its author(s) is/are. Maybe this is wrong.)

So an expression might have the following triples:

E dc:publisher "A joint publication of the State Department of the United States and the British Foreign Office."
E frbr:realizer http://state.gov/
E frbr:realizer http://www.fco.gov.uk/

(As an aside, I think that the distinction between a controlled form and an uncontrolled one is a very useful one, and librarians ought to promote it to the RDF modelers more often. A fun excursion on this theme can be found in this blog post about Franco, master of soukous and rumba, a/k/a Franco et le O.K. Jazz, Franco et Le Tout Puissant OK Jazz, Franco et le TPOK Jazz, etc. A person should not have to choose between ensuring that an album is tied to the controlled form of its creator and transcribing faithfully the creator named on the work, which I think contains information that can be quite useful to the person looking for an item, such as that Franco has been promoted to Le Grand Maitre Franco.)

Erik, FRBR created by would be a narrowing of dc:c...

2010-05-30T06:46:21.044-07:00

Erik, FRBR created by would be a narrowing of dc:creator, because it is required to have as its object (or subject, I'm not sure which way it intends to point) a FRBR Group 2 entity. dc:creator can take anything, so it is broader in its definition.

Statement of responsibility is an invention of library cataloging and has nothing to do with publishers. When you have a title that looks like:

Alice in Wonderland : Alice’s adventures in Wonderland and Through the looking-glass / by Lewis Carroll ; black and white illustrations by John Tenniel.

everything after the "/" is the statement of responsibility. The problems with publisher have to do with how it has been treated in library cataloging in the past, not any theoretical consideration.

Hi Karen - FRBR has the "created by" re...

2010-05-29T20:09:06.223-07:00

Hi Karen -

FRBR doesn't define an attribute for creator, but it does define the "created by" relationship in §5.2.2, or in the FRBR RDF schema, frbr:creator. I think that it is functionally equivalent to dc:creator.

Re. publisher, after a quick look back at the FRBR document I would think that publisher (the attribute) is equivalent to the statement of responsibility, while the produced by relationship is like the created by relationship.

-Erik

Erik, That's the general idea, although creat...

2010-05-29T17:05:42.195-07:00

Erik,

That's the general idea, although creator is not an attribute of frbr:work. So instead of:

:w dct:creator :pratchett .
:pratchett a frbr:Person .

you need something more like:

:p pratchett
:r is creator of
:w

The creator is not contained at all within WEMI, but has a relationship to W. The scenarios on the DC-RDA page also put the creator within a W structure, but I think that is not what FRBR says. It may turn out that's the best way to do it, but I'm trying to puzzle through FRBR as it is before declaring it to be unworkable.

Also note that this is an area where FRBR is inconsistent, in my mind. The manifestation has "publisher" as an attribute, even though the diagrams show "publish" as a relationship between an agent entity and a manifestation (G2 and G1). But there isn't a creator attribute on work, which would also be a G2 and G1. I think this is because they were thinking in terms of traditional cataloging, and publishers are not treated as entities (e.g. they are not authority controlled) in traditional cataloging. In fact, publisher should be expressed as a relationship. Inconsistencies like these make it hard to model the data that catalogers will produce under FRBR.

Hi Karen - I am coming at this as somebody who do...

2010-05-29T15:50:43.021-07:00

Hi Karen -

I am coming at this as somebody who does not know RDA at all, & who has never cataloged a book (in a MARC system, at least). So take what I say with a grain of salt.

My vision of a FRBR/linked data (FRBR/LD) cataloging system is informed a bit by a project I did to build a digital collection web site (https://launchpad.net/ervin).

It seems to me that a FRBR/LD cataloger is weaving a web, pulling together existing pieces at whatever level is necessary to describe the Item in hand. A Work from here, a Topic from there, a Person from somewhere else. In this way they should be able to do the minimum level of work possible. If a suitable Expression has been cataloged by somebody else, this might mean no cataloging proper. Or it could involve creating a record for each W, E, M, and I, but pulling in subjects from id.loc.gov and authors from openlibrary. Or in the case of very strange works, creating every single piece.

Doing a good job of this will involve good UIs for searching the pool of triples out there, as you say. There needs to be a good way of finding a Person URL that describes who you need.

While I agree with you that a Manifestation record proper does not contain author info, it is connected to author information. So while the triple does exist, we can create a serialization of a graph that contains manifestation information and the author, while clearly differentiating the two. E.g., in turtle syntax, please see here.

Thanks for bringing this up. There is a lot to figure out here.

-Erik

Erik, My question has to do with how we will shar...

2010-05-29T13:22:55.419-07:00

Erik,

My question has to do with how we will share cataloging copy. Maybe I didn't make that clear enough. If we have a bunch of linked data *somewhere*, and someone dips into it while cataloging, we will need to present the cataloger (I think, I could be wrong) with a sensible unit out of the giant pool of triples (or whatever it is that we have). FRBR is often cited as aiding the sharing of cataloging data. There are some examples of cataloging scenarios linked from http://dublincore.org/dcmirdataskgroup/. But those scenarios assume that there are things called 'work, expression, manifestation records' -- but I don't think we'll have records for WEMI entities because that doesn't make sense to me. (This all started when someone took a typical bib record and labeled it a FRBR:Manifestation, which of course it isn't if it has authors and subjects.) So I'm thinking about what makes sense for sharing cataloging, and it looks like I'm not thinking very clearly because I have not conveyed my thoughts very well.

Karen - I responded because I read your blog. No c...

2010-05-29T12:55:04.554-07:00

Karen - I responded because I read your blog. No conspiracy of linked-data folks. :)

I guess I don't understand the question, "what minimum elements will be needed for sharing".

My answer to that question, as I thought I understood it, was: one URL. If I have an item, and you have cataloged its manifestation, then all I need is the URL you used, right? I simply create a new Item in my system, give it the relationship . Then my access system follows URLs until it has added to my access system all information necessary for a user to search for [Pratchett Mort] or [Fantasy fiction] or [Littérature fantastique] and locate that item.

There is a lot of hand waving, I admit; a lot to be answered in the "follow URLs until it has added to my access system all information necessary" part. I guess that is probably what you are asking.

Thanks for your time.

-Erik

response to egh - hmmm. I wonder why I keep getti...

2010-05-29T12:36:00.109-07:00

response to egh -

hmmm. I wonder why I keep getting comments about IDs and linked data on this post.

Yes, I am assuming linked data. I am assuming URIs. That doesn't change the question of what minimum elements will be needed for sharing. THAT is the topic of the post.

Anyway, as I continue to think about it, it seems to me that we'll need all of the linked elements between G1 and G2 and G3, but probably NOT the G1/G1, G2/G2 or G3/G3.
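One way to picture that selection rule is a filter over a pool of triples that keeps only the links crossing FRBR groups. This is a rough sketch, not a proposal for an actual format: every identifier and property name below is hypothetical, and the group numbers are assigned by hand.

```python
# Which FRBR group each entity belongs to; all identifiers are hypothetical.
GROUP = {
    "W:mort": 1, "E:mort-en": 1, "M:mort-2001": 1,
    "P:pratchett": 2, "P:some-publisher": 2,
    "T:fantasy-fiction": 3,
}

TRIPLES = [
    ("W:mort", "createdBy", "P:pratchett"),              # G1/G2
    ("W:mort", "hasSubject", "T:fantasy-fiction"),       # G1/G3
    ("E:mort-en", "realizationOf", "W:mort"),            # G1/G1
    ("M:mort-2001", "embodimentOf", "E:mort-en"),        # G1/G1
    ("M:mort-2001", "publishedBy", "P:some-publisher"),  # G1/G2
]


def sharable_unit(entity, triples=TRIPLES, group=GROUP):
    """Triples that mention `entity` and cross FRBR groups (G1/G2, G1/G3, ...)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if entity in (s, o) and group[s] != group[o]
    ]
```

Asking for the unit around W:mort returns the creator and subject links and leaves out the Expression-to-Work link, which stays behind in the pool.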

I suspect that the appropriate model for sharing i...

2010-05-29T11:23:12.285-07:00

I suspect that the appropriate model for sharing in FRBR is the linked data model.

Let me explain what I mean. Let us assume that I am cataloging the 2001 edition of your example. I know that you have done the work of creating a Work and Expression entry for it. Because we are in a linked data world, you have given them URLs, W:identifier and E:identifier.

Now I create a record:

manifestationOf: E:identifier
Place of publication: New York, NY
Publisher's name: New American Library
Extent of text: 181 pages
Dimensions: 21 cm
[...]

When this record is indexed for access, any needed information is pulled down from your URL, E:identifier, and added to our index. (Assume that this data can be cached to avoid a heavy hit on your servers.) Because you have exposed the data at a URL, a machine can access it. So my access system would pull down your Expression info and "follow its nose" to the Work, using any and all information it wanted to build an access system that is useful to my users. Because your links are other URLs, my indexer can fetch them if necessary. For example, your Work record would contain links to a Person record for Terry Pratchett, where I can pull down dates and pen-names. Additionally it could contain links to a Topic record for "Fantasy fiction" at http://id.loc.gov/authorities/sh85047114. We could pull in some broader terms, e.g. Fantasy literature, so that a user searching for fantasy literature would not be disappointed. We could also follow "Similar concepts from other vocabularies" to pull a similar French topic, Littérature fantastique.
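The "follow its nose" step might look something like the sketch below. It is only an illustration: a dictionary stands in for the web so that nothing is actually fetched, and the example.org URLs, the field names, and the max_hops cutoff are all assumptions rather than anything FRBR or linked data prescribes.

```python
from collections import deque

# A stand-in for the web: each URL maps to what dereferencing it would return.
# All URLs and field names here are hypothetical.
WEB = {
    "http://example.org/E/1": {
        "language": "English",
        "links": ["http://example.org/W/1"],
    },
    "http://example.org/W/1": {
        "title": "Mort",
        "links": [
            "http://example.org/person/pratchett",
            "http://example.org/topic/fantasy-fiction",
        ],
    },
    "http://example.org/person/pratchett": {"name": "Terry Pratchett", "links": []},
    "http://example.org/topic/fantasy-fiction": {
        "label": "Fantasy fiction",
        "links": ["http://example.org/topic/fantasy-literature"],
    },
    "http://example.org/topic/fantasy-literature": {
        "label": "Fantasy literature",
        "links": [],
    },
}


def follow_your_nose(start_url, max_hops=3, fetch=WEB.get):
    """Breadth-first walk from start_url, collecting literal values to index."""
    index_terms = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, hops = queue.popleft()
        record = fetch(url)
        if record is None:  # dead link: skip it rather than fail the whole run
            continue
        index_terms.extend(v for k, v in record.items() if k != "links")
        if hops < max_hops:  # stop expanding once the hop limit is reached
            for link in record["links"]:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, hops + 1))
    return index_terms
```

The hop limit is where the "as large or small as the access system needs" decision would live: a lower limit indexes only the Expression and Work, a higher one reaches the broader topic.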

I hope this argument makes sense. You can probably tell that I am a fan of FRBR and linked data. I think that the unit of sharing in a FRBR/linked data catalog is as large or small as the access system needs. If the access system is in Canada, it would probably want to pull in that French language topic heading for the book. If it is in the US, maybe not. The key is dereferenceable URLs and machine-readable data.

Jonathan, perhaps I wasn't clear. I'm not ...

2010-05-27T19:27:09.491-07:00

Jonathan, perhaps I wasn't clear. I'm not objecting to anything, I'm trying to clarify what are the most useful units for sharing. And it seems to me that many folks haven't understood that the sharable unit is not the FRBR Work entity, but that entity with the creator and subjects that link to it. This is purely practical in terms of what would be useful in a shared environment.

You say "And I still disagree that WEMI doesn't fit into an RDF/entity-value-attribute/semantic-web model. It certainly does/can. And even in that model, you still want Person and Work to be separate entities linked by a relationship" -- Person and Work are not WEMI; WEMI is Group1, and I think Group1 has problems because the entities do not stand alone. I don't know how you get from this to string literals, so I think we're talking past each other. In this particular post I'm not talking about IDs (notice I left them out of my description) vs. literals. I'm talking instead about what data groups make sense in a shared bibliographic environment that is FRBR-based rather than MARC based. In MARC we use the whole MARC record as our share point. With FRBR we can share at different levels, and I'm trying to think through what "sets" make sense for sharing. That's all.

And I still disagree that WEMI doesn't fit int...

2010-05-27T16:11:49.112-07:00

And I still disagree that WEMI doesn't fit into an RDF/entity-value-attribute/semantic-web model. It certainly does/can. And even in that model, you still want Person and Work to be separate entities linked by a relationship -- the way computers link things through relationships is by "identifiers", of which URIs, database foreign keys, and LCCN authority numbers are all examples.

Even in, especially in, RDF-style, you don't want simple repeated string literals for the same name all over the place, you want relationships between entities.

The author is indeed intrinsically attached to th...

2010-05-27T16:09:47.793-07:00

The author is indeed intrinsically attached to the work -- but it is quite PROPERLY modelled as a relationship to an author entity, not as embedded author literals in the work record. Isn't this the kind of thing you are always arguing FOR, Karen? Using relationships (whether expressed as URI or otherwise) instead of copy-and-pasted string literals?

This is true whether it's an "entity relational" model, a "relational database", "RDF-style", "entity-value-attribute", etc. It's just good modelling for a person to be a separate entity from a work, not just a bunch of literals attached to a work. This is NOT something unique to relational databases, although those used to relational databases can most easily understand it through that lens. It's just plain good modelling.

But maybe I'm misunderstanding your objection, because I'm surprised to see you disagreeing with this, I've seen you so often before arguing for it?

Whether a particular system includes the "literals" corresponding to a Person in a 'record export' of a Work that person is the creator of -- well, a system is certainly free to do that, but I'd hope it ALSO includes the "identifier" (whether URI, LCCN, or other) of the Person entity in relationship, so the consuming system can restore properly modelled data, not just string literals.
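A record export along those lines could carry both the display literal and the identifier, so the receiving system can rebuild the link instead of storing a bare string. The sketch below is one hypothetical shape for such an export; the dictionary layout, the function names, and the example.org URIs are all assumptions.

```python
# Source system: Work and Person are separate entities linked by identifier.
# The URIs and dictionary layout are hypothetical.
PERSONS = {"http://example.org/person/pratchett": {"name": "Terry Pratchett"}}
WORKS = {
    "http://example.org/W/mort": {
        "title": "Mort",
        "creator": "http://example.org/person/pratchett",
    }
}


def export_work(work_id, works=WORKS, persons=PERSONS):
    """Export a Work with its creator as identifier plus display literal."""
    work = works[work_id]
    creator_id = work["creator"]
    return {
        "id": work_id,
        "title": work["title"],
        "creator": {"id": creator_id, "label": persons[creator_id]["name"]},
    }


def import_work(record, works, persons):
    """Restore the relationship on the consuming side from the identifier."""
    creator = record["creator"]
    # Keep an existing Person entity if one is already known under this id.
    persons.setdefault(creator["id"], {"name": creator["label"]})
    works[record["id"]] = {"title": record["title"], "creator": creator["id"]}
```

The label lets a consumer display the name immediately, while the id is what keeps two exports about the same person pointing at one entity.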

Tom, I'm afraid I don't go for the "r...

2010-05-27T10:08:06.303-07:00

Tom, I'm afraid I don't go for the "relational database" model. I'd rather have the entities and links be of the semantic web type, and therefore to be able to be used independently from a particular file, record, or database design. Because WEMI is hierarchical, it doesn't fit into this kind of a model; the dependencies are the barrier to that. So I think we'll need a "son of FRBR" to move into the semantic web world.

Carol, I think the "superwork" is one that encompasses everything related to the Work, and would bring together books, movies, operas, etc., which libraries consider to be separate works. The Bib/Holdings/Item is more like WEMI. But I agree with your take on the data and parsing, once we get the model part worked out.

Perhaps I'm totally off-base (it happens) but ...

2010-05-27T05:26:32.043-07:00

Perhaps I'm totally off-base (it happens) but it seems to me this is what occurs in the "super" record that John Espy from VTLS was discussing at ALA 2008. At least, this was my understanding of the whole thing. From your scenario, it is similar to the current bib/holding/item record - the holding record has very little info without the bib record and the item record works only with the holding record then the bib record. If the information in your scenario is parsed out into easily manipulated fields (such as a numeric field for the pagination in manifestation, a numeric field for date, etc. etc.), this should link easily out and if the data in the work/expression/manifestation is linked (are linked?) then I see lots of options. Or am I again walking down the wrong garden path?

The dependencies you're talking about aren'...

2010-05-27T03:56:44.652-07:00

The dependencies you're talking about aren't really something to be worried about - they're part and parcel of how a relational database (which is what we're actually creating with FRBR) works.

Essentially, there are two more types of "hidden" attribute for these entities beyond the ones you've described. What you've talked about are the actual metadata elements, as it were - they're what users of the system will view and be interested in. But to provide the links or dependencies that you have described, the entities will also contain attributes called "primary keys" and "foreign keys" in database-speak.

Each entity should have one primary key, which is simply a unique identifier differentiating this entity from all others of the same type. So, for example, Terry Pratchett could have the primary key 123456 - or it could be a text string such as is currently used in authority records including his name in a decided format and possibly discretionary information to distinguish him from other Terry Pratchetts. Either approach will do, although the former is probably to be preferred.

Then entities - such as works - which "have" an author need a "foreign key" as one of their attributes, which is precisely another element's primary key. So your two entities from your first example become in fact:

Work:
Primary key: 987654
Title: Mort
Foreign key for author: 123456

Person:
Primary key: 123456
Name: Terry Pratchett

These are the "bits" that will be shared under FRBR - there need be no mystery surrounding the link between them because it is in fact explicit in their attributes. It's just that users probably aren't interested in primary and foreign keys, and they probably won't be displayed by default in (e.g.) OPACs.
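The two records above can be tried out as relational tables with Python's built-in sqlite3 module. This is a minimal sketch: the table and column names are made up for illustration, and only the key values and the Mort/Pratchett data come from the example.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE work ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES person(id))"
)

# Primary keys taken from the example above.
conn.execute("INSERT INTO person (id, name) VALUES (123456, 'Terry Pratchett')")
conn.execute("INSERT INTO work (id, title, author_id) VALUES (987654, 'Mort', 123456)")


def work_with_author(work_id):
    """Resolve the foreign key so a display layer never has to show it."""
    return conn.execute(
        "SELECT work.title, person.name FROM work"
        " JOIN person ON person.id = work.author_id"
        " WHERE work.id = ?",
        (work_id,),
    ).fetchone()
```

The join is what an OPAC would do behind the scenes: the user sees the title and the author's name, never 987654 or 123456.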

Your point remains ultimately valid though. As I'm sure you'll see immediately, everybody must be using the same primary keys for their entities, or the system will break as soon as records are shared. Centralized authority control is thus absolutely essential for the benefits of FRBR to be reaped.

Tom