<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3338174527262061848</id><updated>2012-01-18T19:21:09.954-08:00</updated><category term='linux'/><category term='reading'/><category term='names'/><category term='googlebooks'/><category term='OpenLibrary'/><category term='ebooks'/><category term='RDF'/><category term='library catalogs'/><category term='DCMI'/><category term='RDA'/><category term='Standards'/><category term='books'/><category term='women technology'/><category term='semantic web'/><category term='linked data'/><category term='Digital libraries'/><category term='oclc'/><category term='open data'/><category term='privacy'/><category term='authority control'/><category term='digitization'/><category term='FOAF'/><category term='MARC'/><category term='identifiers'/><category term='classification'/><category term='copyright'/><category term='cataloging'/><category term='application profiles'/><category term='lcsh intell'/><category term='wish list'/><category term='vocabularies'/><category term='RDA DCMI'/><category term='internet'/><category term='skyriver'/><category term='DRM'/><category term='search'/><category term='RFID'/><category term='open access'/><category term='kosovo'/><category term='intellectual freedom'/><category term='metadata'/><category term='classification LCSH'/><category term='FRBR'/><category term='ER models'/><title type='text'>Coyle's InFormation</title><subtitle type='html'>Comments on the digital age, which, as we all know, is 42.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/posts/default'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default?start-index=101&amp;max-results=100'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>238</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5459795530034561560</id><published>2012-01-17T09:59:00.000-08:00</published><updated>2012-01-17T09:59:40.908-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='privacy'/><title type='text'>Google dashboard</title><content type='html'>Google has an ad in today's New York Times. Over a half page (and with lots of white space), it is a cartoon of a guy up to his waist in water calling a plumber. The plumber who answers says: "I'm on my way. See you in 15 hours." The rest of the text goes:&lt;br /&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;"You live in Peoria. Do you really need a plumber from New York? We didn't think so.... That's why search engines, including Google, give you results based on your city or region. They can do this by using your computer's IP address. 
It's a number like 209.85.229.147, which acts like a zip code to tell them the rough area your computer is in.&lt;br /&gt;&lt;br /&gt;To find out more about how websites get to know you better go to google.com/goodtoknow"&lt;/blockquote&gt;The text vs. subtext in this ad is stunning. Although justifying a Google practice, it speaks of it in the third person: "they" use your IP address, it tells "them" the area your computer is in. The message is: everyone does it. It's not a Google thing, it's an Internet thing. Don't blame us.&lt;br /&gt;&lt;br /&gt;The site at "goodtoknow" uses the same cartoon figures and has very little text; most information is given via videos. The site is a fairly good round-up of information on topics from phishing to securing your home wifi network. (The irony is that Google was caught &lt;a href="http://googleblog.blogspot.com/2010/05/wifi-data-collection-update.html"&gt;picking up open wifi traffic&lt;/a&gt; in Germany.) I could imagine it as a "go-to" place for novices needing information on online privacy. Much of it isn't about Google at all: the video on "Stay safe online" gives five rules about passwords and avoiding phishing and never mentions Google. It also doesn't mention that when you log into a site with a secure password, everything you do is observable by the owner of the site. Believe it or not, many people do not understand that. They think that the password makes their activities private, even to the site owner. &lt;br /&gt;&lt;br /&gt;The page on "Manage your information" includes a link to &lt;a href="http://google.com/dashboard/"&gt;Google Dashboard&lt;/a&gt;, which was also mentioned in one of the videos, and which I had either never known about or had forgotten. Google Dashboard is a list of some of the things that Google knows about you, in particular which Google services you have accounts on. It shows your settings on these services. 
I found some services I had played with and forgotten about, which I can now delete.&lt;br /&gt;&lt;br /&gt;Of course, Dashboard is only the tip of the iceberg in terms of what Google knows about us. I turned off Web history in 2007 so I don't see my searches there. If you are at all concerned about privacy, visit Dashboard and make some adjustments. Google warns you that you will get results that are less customized for your interests. However, if you are reading this you probably are an information professional, and my guess is that you can find the ad for that printer just as well searching privately (if real privacy really exists) without also letting Google know your political, sexual and religious interests.&lt;br /&gt;&lt;br /&gt;You often hear that people don't really care about their privacy and they are quite happy to give Web sites their information in exchange for services. I also observe that behavior, but I'm not convinced that the majority of Web users are truly aware of how much information about them is being gathered. I also doubt that most users know how to take advantage of things like the private browsing options in browsers. (I'm not sure I trust that private browsing is truly private. I also don't know how to find out how private it really is.) I do find myself giving out information about myself to Web sites, but it's not because I don't care: it's because I get rushed and don't want to take the extra step, or I forget, or I'm not given a choice and I need to access that site right now. I don't believe in blaming users for the lack of privacy, because the privacy options are always opt-out, not opt-in, and are often hard to find.&lt;br /&gt;&lt;br /&gt;And, yes, I know I am writing this on a Google-owned blog site. I've had on my task list for a very, very long time to figure out a way to port this content over to my own web site. 
It's not so much for privacy purposes (it'll still be a public blog) but because I want the content to be mine even though I'm more likely to lose it than Google is.&amp;nbsp; The Web has become my workplace and the choice I make is not privacy vs. better ads but privacy vs. getting my work done.&amp;nbsp; Making it all about advertising trivializes the reality that our personal and professional lives are intertwined with systems we have no control over. This dependency is as frightening as the privacy issue.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5459795530034561560?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5459795530034561560/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5459795530034561560' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5459795530034561560'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5459795530034561560'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2012/01/google-dashboard.html' title='Google dashboard'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1264612318889493046</id><published>2012-01-11T08:38:00.000-08:00</published><updated>2012-01-11T08:38:38.156-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='FRBR'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><category scheme='http://www.blogger.com/atom/ns#' term='DCMI'/><category scheme='http://www.blogger.com/atom/ns#' term='RDA DCMI'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><category scheme='http://www.blogger.com/atom/ns#' term='cataloging'/><title type='text'>Bibliographic Framework: RDF and Linked Data</title><content type='html'>With the newly developed enthusiasm for RDF as the basis for library bibliographic data, we are seeing a number of efforts to transform library data into this modern, web-friendly format. This is a positive development in many ways, but we need to be careful to make this transition cleanly without bringing along baggage from our past.&lt;br /&gt;&lt;br /&gt;Recent efforts have focused on translating library record formats into RDF, with the result that we now have:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ISBD in RDF&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FRBR in RDF&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RDA in RDF&lt;br /&gt;&lt;br /&gt;and will soon have&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MODS in RDF&lt;br /&gt;&lt;br /&gt;In addition, there are various applications that convert MARC21 to RDF, although none is "official." That is, none has been endorsed by an appropriate standards body.&lt;br /&gt;&lt;br /&gt;Each of these efforts takes a single library standard and, using RDF as its underlying technology, creates a full metadata schema that defines each element of the standard in RDF. The result is that we now have a series of RDF silos, each defining data elements as if they belonged uniquely to that standard. We have, for example, at least four different declarations of "place of publication": in ISBD, RDA, FRBR and MODS, each with its own URI. There are some differences between them (e.g. 
RDA separates place of publication, manufacture, and production, while ISBD does not) but clearly they should descend from a common ancestor:&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;RDA: &lt;a href="http://rdvocab.info/Elements/placeOfPublicationManifestation"&gt;place of publication&lt;/a&gt; &lt;br /&gt;RDA: &lt;a href="http://rdvocab.info/Elements/placeOfDistributionManifestation"&gt;place of distribution&lt;/a&gt;&lt;br /&gt;RDA: &lt;a href="http://rdvocab.info/Elements/placeOfManufactureManifestation"&gt;place of manufacture&lt;/a&gt; &lt;br /&gt;FRBRer: &lt;a href="http://iflastandards.info/ns/fr/frbr/frbrer/P3057"&gt;has place of publication or distribution&lt;/a&gt;&lt;br /&gt;ISBD: &lt;a href="http://iflastandards.info/ns/isbd/elements/P1016"&gt;has place of publication, production, distribution&lt;/a&gt;&lt;/blockquote&gt;This would be annoying, but not unworkable, if these different instances of "place of publication" could be treated as having some meaning in common such that one could link a FRBRer element to an ISBD element, but they cannot. The reason they cannot is that each of these constrains the elements in a particular way that defines its relationship to a single data context (what we generally think of as a "record structure"). The elements are not independent of that context, and this means that each can only be used within that particular context. This is the antithesis of the linked data concept, where data sets from diverse sources share metadata elements. It is this re-use of elements that creates the "link" in linked data. To achieve this, metadata elements need to be unconstrained by a particular context. &lt;br /&gt;&lt;br /&gt;Linking can also be achieved through vertical relationships, similar to "broader" and "narrower" in thesauri. This is less direct, but makes it possible to mix data sets that have differing levels of granularity. 
In our case, the ISBD "place of publication, production, distribution" could be defined as broader than the three RDA elements that treat those separately. Unfortunately, that is not possible because of the way that ISBD and RDA have been defined in RDF. (I'll post more detail about this later for those who want more.)&lt;br /&gt;&lt;br /&gt;These RDF silos are expressions of our data in RDF that lack the linking capabilities of linked data because they are bound to specific data structures. Clearly we gain little in terms of linked data by creating mutually incompatible bibliographic views. Not only are these RDF schemes incompatible with each other, but none will be linkable to bibliographic data from communities outside of libraries that publish their data on the Web. That means no linking to Amazon, to Wikipedia, or to citations within documents. &lt;br /&gt;&lt;br /&gt;Given where we are in the development of linked data for libraries, we now have two options:&lt;br /&gt;&lt;br /&gt;1) Define 'super-elements' that float above the record formats and that are not bound by the constraints of the RDF-defined records. In this case there would be a general "place of publication" that is superordinate to all of the "place of publication" elements in the various records, and would be subordinate to a general concept of "place" that is widely used (possibly a property of &lt;a href="http://www.geonames.org/"&gt;GeoNames&lt;/a&gt;). To implement linking, each record element would be extrapolated to its super-elements.&lt;br /&gt;&lt;br /&gt;2) Define our data elements outside of any particular record format first, then use these in the record schemas. In this case there would be only one instance of "place of publication" and it would be used throughout the various bibliographic records whenever an element with that meaning is needed. 
Those records would be interchangeable as linked data using their component data elements, and would interact with other bibliographic data on the Web using the RDF-defined elements and their relationships.&lt;br /&gt;&lt;br /&gt;My message here is that we need to be creating data, not records, and that we need to create the data first, then build records with it for those applications where records are needed. Those records will operate within library systems, while the data has the potential to make connections in linked data space. I would also suggest that we cease creating siloed RDF record formats, as these will not move us forward. Instead, we should concentrate on discovering and defining the elements of our data, and begin looking outward at all of the data we want to link to in the vast information universe.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;_____&lt;br /&gt;* Note on RDA: RDA in RDF includes two "versions" of each data element: one bound to FRBR and one not. The latter has potential for re-use outside of a FRBR environment, and was designed for this purpose by the DCMI/RDA task force. 
Its relationship to "official" RDA is somewhat unclear at this time, but it will hopefully gain support as the linked data concept is absorbed into the bibliographic framework.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1264612318889493046?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1264612318889493046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1264612318889493046' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1264612318889493046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1264612318889493046'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2012/01/bibliographic-framework-rdf-and-linked.html' title='Bibliographic Framework: RDF and Linked Data'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-897720017343221499</id><published>2012-01-02T08:28:00.000-08:00</published><updated>2012-01-02T08:28:39.100-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Google Book Search Redux</title><content type='html'>The document I referred to in the &lt;a href="http://kcoyle.blogspot.com/2011/12/google-files-motion-to-dismiss.html"&gt;previous post&lt;/a&gt; would have been so much clearer 
if I had read the two preceding documents. Now that I have, the story is even more dramatic.&lt;br /&gt;&lt;br /&gt;On December 12, 2011, the Authors Guild filed a &lt;a href="http://thepublicindex.org/docs/complaint/fourth_amended.pdf"&gt;fourth amended complaint&lt;/a&gt; (PDF) against Google. This complaint is nearly identical to the first one, filed on &lt;a href="http://thepublicindex.org/docs/complaint/authors.pdf"&gt;September 20, 2005&lt;/a&gt; (PDF). The two complaints between these (&lt;a href="http://thepublicindex.org/docs/complaint/second_amended.pdf"&gt;October 28, 2008&lt;/a&gt; and &lt;a href="http://thepublicindex.org/docs/complaint/third_amended.pdf"&gt;November 16, 2009&lt;/a&gt;) included the Association of American Publishers, as did the two attempts at settling the case (&lt;a href="http://thepublicindex.org/documents/settlement"&gt;October 28, 2008&lt;/a&gt; and &lt;a href="http://thepublicindex.org/documents/amended_settlement"&gt;November 13, 2009&lt;/a&gt;). The publishers had had their &lt;a href="http://thepublicindex.org/docs/complaint/publishers.pdf"&gt;own complaint&lt;/a&gt; in 2005 before combining forces with the Authors Guild. Now the Authors Guild is again standing alone against Google's book digitizing efforts.&lt;br /&gt;&lt;br /&gt;This fourth amended complaint brings us pretty much back to square one, with the addition of more libraries and the creation of HathiTrust as a way for the libraries to store their (allegedly) ill-gotten copies. The library copies are a key element of the suit because they are proof that Google has not only digitized the library books but has made copies (the purview of copyright law) and distributed them to others. 
&lt;br /&gt;&lt;br /&gt;The most interesting document of this latest group, and the one with the greatest detail about Google's actions, is the &lt;a href="http://www.ifrro.org/sites/default/files/990-memorandum-in-support.pdf"&gt;Memorandum&lt;/a&gt; in support of the class certification. This document explains why the Authors Guild should be considered by the court to be a valid representative of all authors in a class action suit. The document has a number of quotable moments, of which my favorite is the "tell it like it is in plain language" opening:&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;This litigation arises from Google's business decision to gain a competitive edge over its rivals in the search engine market by making digital copies of millions of "offline" printed materials. ... Rather than obtaining licenses from copyright owners for the digital use of their printed works, Google instead entered into agreements with libraries to gain access to these works. A number of university libraries allowed Google to make digital copies of the books in the libraries' collections, including in-copyright books. In exchange, Google provided digital copies of the books to the libraries. Google refers to this massive copyright infringement as its "Library Project." (p.1)&lt;/blockquote&gt;Most folks commenting on this latest development in this now 6-year-old case assume that the settlement is dead. We are therefore back to the question of whether Google's book scanning is or is not fair use. This question, though, is now being pressed only on behalf of authors, not publishers, and if anyone has inside knowledge about what approach the publishers are taking, I would love to hear it. It is clear that the position of publishers in relation to Google has changed greatly in the 5-6 years since the suit was originally filed. There are now reportedly thousands of publishers who are using Google Books to promote and sell their works. 
It also makes sense that publishers, as corporations, are better able to negotiate with Google than are individual authors. A large publisher with numerous books in print and in its backlist has clout that a single person does not have. In addition, large publishers have lawyers, or access to legal counsel. At least some publishers have made their peace with Google and are seeing the relationship as advantageous.&lt;br /&gt;&lt;br /&gt;Looking at this from the library point of view I wonder what will happen to the millions of library books already scanned by Google. I also wonder what this awkward and failed attempt to create the overly broad settlement between Google and the AG/AAP will mean for future digitization projects. There are strong arguments for digitization for scholarly purposes, and the creation of a computational capability over millions of texts could be a positive step for research, especially in the social sciences and humanities. I hope that the botched attempt to commercialize the contents of libraries will not prejudice the future of digital research.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-897720017343221499?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/897720017343221499/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=897720017343221499' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/897720017343221499'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/897720017343221499'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2012/01/google-book-search-redux.html' title='Google 
Book Search Redux'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7608990055294487773</id><published>2011-12-26T14:16:00.000-08:00</published><updated>2011-12-26T14:16:40.551-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Google files motion to dismiss</title><content type='html'>&lt;blockquote class="tr_bq"&gt;"The claims of the associations should be dismissed without leave to amend because they lack standing as a matter of law, since they do not themselves own copyrights and do not meet the test for associational standing set forth in &lt;i&gt;Hunt&lt;/i&gt;." p. 19&lt;/blockquote&gt;With that conclusion, Google has &lt;a href="http://thepublicindex.org/docs/motions/993-memorandum-in-support.pdf"&gt;filed a motion&lt;/a&gt; asserting that the copyright infringement lawsuits filed by the Authors Guild and the American Society of Media Photographers, Inc. be dismissed. The arguments made in the document are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;"Individual copyright owners' participation is necessary to establish a claim for copyright infringement." (p.1)&lt;/li&gt;&lt;li&gt;"Plaintiff associations do not own copyrights alleged to have been infringed, and do not have standing to sue for copyright infringement." (p.4)&lt;/li&gt;&lt;li&gt;"Every copyright, and every alleged copyright infringement, is different." (p.7)&lt;/li&gt;&lt;li&gt;"... a central issue in these cases is whether the conduct alleged in the Complaints constitute fair use under 17 U.S.C. 107. 
Litigating that issue will require the participation of individual association members, because many of the relevant facts are specific to the particular work in question." (p.11)&lt;/li&gt;&lt;/ul&gt;All of this sounds plausible to this legal novice, but there are a couple of puzzling issues. First, why did Google not make these arguments in 2005 when the Authors Guild filed suit? Instead, they negotiated with the association for six years, presumably in good faith, and those negotiations hinged on the acceptance of the AG as a representative of authors and their rights in their works. If Google had thought that the AG did not have standing, none of that negotiation would have made much sense.&lt;br /&gt;&lt;br /&gt;Second, Google says in this document that fair use has to be determined on a case-by-case basis. They even quote from Campbell v. Acuff-Rose Music, Inc. that "Fair use must 'be judged case by case, in light of the ends of the copyright law....' It is 'not to be simplified with bright-line rules.'" (p.11) This seems to undermine Google's original defense that copying for the purposes of creating an index is itself fair use, not something that has to be determined on a case-by-case basis.&lt;br /&gt;&lt;br /&gt;It isn't surprising that Google wants to bring an end to this case. It is now entering its seventh year (the original suit was filed in September of 2005), and has undoubtedly been costly for all parties. Google had been moving ahead with putting in place the foundations for the settlement, including the creation of a large database of works and a means for owners to claim the copyrights. They had designated a director for the Book Rights Registry, which would administer the business agreed on in the settlement. The failure of the settlement and the amended settlement to get court approval meant that all of that effort was for naught. 
Yet it isn't clear to me (and I hope someone can speak to this) what practical outcome Google is seeking for its book digitization effort. A dismissal of this nature would put Google in the rather cynical position of continuing to scan books, knowing that few individual authors will have the means to take Google to court and that any payments to individual authors who do sue would probably be affordable for this multi-billion-dollar company. If the motion is denied, then at least that aspect of the suit will be clarified, but the next step will surely be for the suit to go forward as first filed.&lt;br /&gt; &lt;br /&gt;The one thing that is clear is that negotiations between Google and the AG are no longer on the horizon.&lt;br /&gt;&lt;br /&gt;Note, also, that the Authors Guild has filed suit against the HathiTrust for copyright infringement, and the decision here will no doubt have a bearing on that case as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7608990055294487773?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7608990055294487773/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7608990055294487773' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7608990055294487773'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7608990055294487773'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/12/google-files-motion-to-dismiss.html' title='Google files motion to dismiss'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8662405114193687510</id><published>2011-12-22T13:23:00.000-08:00</published><updated>2011-12-22T13:23:54.258-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='open data'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>National Library of Sweden and OCLC fail to agree</title><content type='html'>In a blog post entitled "&lt;a href="http://www.kb.se/english/about/news/No-deal-with-OCLC/"&gt;No deal with OCLC&lt;/a&gt;," the National Library of Sweden has announced that after five years it has ended negotiations with OCLC over becoming a participant in WorldCat. The point of contention was the &lt;a href="http://www.oclc.org/worldcat/recorduse/policy/default.htm"&gt;OCLC record use policy&lt;/a&gt;. Sweden has declared the bibliographic data in the Swedish National Catalog, &lt;a href="http://libris.kb.se/"&gt;Libris&lt;/a&gt;, to be open for use without constraints.&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;"A fundamental condition for the entire Libris collaboration is voluntary participation. Libraries that catalogue in Libris can take out all their bibliographic records and incorporate them instead into another system, or use them in anyway the library finds suitable." (from the blog post)&lt;/blockquote&gt;This is an example of the down-stream constraint issues that we discussed while working on the &lt;a href="http://openbiblio.net/principles/"&gt;Open Bibliography Principles&lt;/a&gt; for the &lt;a href="http://okfn.org/"&gt;Open Knowledge Foundation&lt;/a&gt;. While open data may appear to be primarily an ideological stance, it in fact has real practical implications. A bibliographic database is made up of records and data elements that can have uses in many contexts. 
In addition, the same bibliographic data may exist in numerous databases managed by members of entirely different communities. Someone may wish to create a new database or service using data coming from a variety of sources. At times someone will want to use only portions of records and may mix and match individual data elements from different sources. Any constraint on the use of the data, even something as seemingly innocuous as allowing all non-commercial use, requires the user of the data to keep track of the source of each record or data element. In practice, this means that an application using the mix of data is effectively constrained by the strictest contract in the mix.&lt;br /&gt;&lt;br /&gt;The Swedish library was concerned that their participating libraries would be hindered in their future systems and activities if any limitations were placed on data use. In addition, they would not be able to share their data with the &lt;a href="http://www.europeana.eu/portal/"&gt;Europeana&lt;/a&gt; project, as Europeana requires that the data contributed be open precisely because of the complications of managing hundreds or thousands of different sources with different obligations.&lt;br /&gt;&lt;br /&gt;As many of us pointed out during the discussions about the OCLC record use policy, the practical problems of controlling down-stream use of data are insurmountable. Some people argue that the record use policy hasn't affected libraries using WorldCat, but my experience is that the policy has a chilling effect on some libraries, and is making it more difficult for libraries to embrace the linked open data model. The Swedish National Library had to choose between WorldCat services and future capabilities. 
It was undoubtedly a hard decision, but it is admirable that the National Library did not give up what it saw as important rights for its users.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8662405114193687510?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8662405114193687510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8662405114193687510' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8662405114193687510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8662405114193687510'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/12/national-library-of-sweden-and-oclc.html' title='National Library of Sweden and OCLC fail to agree'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5625809650173321535</id><published>2011-12-12T08:54:00.000-08:00</published><updated>2011-12-12T08:54:36.541-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='internet'/><category scheme='http://www.blogger.com/atom/ns#' term='privacy'/><title type='text'>Learning not to share</title><content type='html'>"Learning to share" used to be one of the basic lessons of childhood, with parents beaming the first time their offspring spontaneously handed half of a cookie to a playmate. 
But sometime before that same child first puts fingers to keyboard, she will have to learn a new lesson: not to share online.&lt;br /&gt;&lt;br /&gt;The Facebook phenomenon has taken that simple concept of sharing with others to an industrial level. Any page you visit on the Web today connects to your online social life, so that while reading the news or watching a video, you are exhorted to share your activity with your online "friends." I say "friends" in quotes because the way that Facebook involvement grows means that many of the people seeing your posts or learning about your activities are like second and third cousins: related to your friends but at least a step removed from your inner circle. It is easy to forget that those more distant relations are there, but bit by bit the links pull in more invitations and, since we have been told that it is impolite not to share, we rarely slam the digital door on those seeking our friendship.&lt;br /&gt;&lt;br /&gt;To increase this digital sharing, the House has passed a revision to the Video Privacy Protection Act. You may not recall the "Bork law" of 1988. It was one of the most quickly passed privacy laws in U.S. history. Here's the description from the &lt;a href="http://www.nytimes.com/2011/12/11/business/bill-would-let-video-consumers-disclose-all-their-choices.html?_r=1&amp;amp;scp=1&amp;amp;sq=video%20privacy&amp;amp;st=Search"&gt;New York Times article&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;a href="http://www.theamericanporch.com/bork2.htm" title="Reprint of the original article."&gt;In 1987, the Washington City Paper&lt;/a&gt;, a weekly newspaper, published the video rental records of Judge &lt;a class="meta-per" href="http://topics.nytimes.com/top/reference/timestopics/people/b/robert_h_bork/index.html?inline=nyt-per" title="More articles about Robert H. Bork."&gt;Robert H. Bork&lt;/a&gt;, who at the time was a nominee to the Supreme Court. 
One of the paper’s reporters had obtained the records from Potomac Video, a local rental store. Judge Bork’s choice of movies — he rented a number of classic feature films starring Cary Grant — may have seemed innocuous.&lt;br /&gt;&lt;br /&gt;But the disclosure of Judge Bork’s cultural consumption so alarmed Congress that it quickly passed a law giving individuals the power to consent to have their records shared. The statute, nicknamed the “Bork law,” also made video services companies liable for damages if they divulged consumers’ records outside the course of ordinary business.&lt;/blockquote&gt;&amp;nbsp;At the time, the passage of the law had a comic aspect: you could imagine the thoughts going through the heads of members of Congress when they realized that any reporter could walk into their local video store and learn what they had rented. Zingo! New law!&lt;br /&gt;&lt;br /&gt;The revised bill, which the article describes as backed primarily by Netflix, would allow consumers (and that's all we are, right, consumers?) to sign a blanket waiver of their video privacy in order to facilitate sharing with friends.&lt;br /&gt;&lt;br /&gt;The Times article quotes online services and privacy advocates on both sides, all talking about how much you do or don't want your "friends" to know about you. What the article fails to state, however, is that whether you like it or not, every site where you share is a de facto friend as well. If your Facebook friends get your Netflix picks, both Facebook and Netflix (and their advertising partners) also get your video viewing information. The more you share with your friends, the more you are sharing with an invisible network of corporations, whom, by the way, you cannot "unfriend" even if you want to.&lt;br /&gt;&lt;br /&gt;This is why we need to learn not to share: it's a lie, a deceit. 
We aren't really sharing with our friends; our friends are being used to get us to divulge information to faceless corporations that have insinuated themselves into our lives for the sole purpose of profiting from our consumption. They have distorted the entire idea of "friend" and turned it into a buyers' club for their own benefit.&lt;br /&gt;&lt;br /&gt;Dear friends: I'm looking forward to seeing you ... offline.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5625809650173321535?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5625809650173321535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5625809650173321535' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5625809650173321535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5625809650173321535'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/12/learning-not-to-share.html' title='Learning not to share'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-342437821354115016</id><published>2011-11-01T12:03:00.000-07:00</published><updated>2011-11-01T12:03:51.286-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='library 
catalogs'/><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><category scheme='http://www.blogger.com/atom/ns#' term='cataloging'/><title type='text'>Future Format: Goals and Measures</title><content type='html'>The &lt;a href="http://www.loc.gov/marc/transition/pdf/bibframework-10312011.pdf"&gt;LC report on the future bibliographic format&lt;/a&gt; (aka the replacement for MARC) is out. The report is short and has few specifics, other than the selection of RDF as the underlying data format. A significant part of the report lists requirements; these, too, are general in nature and may not be comprehensive. &lt;br /&gt;&lt;br /&gt;Before we go much further, we need to state our specific goals and the criteria we will use to determine whether we have met them. Some goals we will discover in the course of developing the new environment, so this should be considered a growing list. I think it is important that every goal have measurements associated with it, to the extent possible. It makes no sense to make changes if we cannot know what those changes have achieved. 
Here are some examples of the kinds of goals I have in mind; these are not necessarily the actual goals of the project, just illustrations that I have invented.&lt;br /&gt;&lt;br /&gt;COSTS&lt;br /&gt;&amp;nbsp;- goal: it should be less expensive to create the bibliographic data during the cataloging process&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: using time studies, compare cataloging in MARC and in the new format&lt;br /&gt;&amp;nbsp;- goal: it should be less expensive to maintain the format&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: compare the total time required for a typical MARBI proposal to the time required for a comparable change in the new format&lt;br /&gt;&amp;nbsp;- goal: it should be less expensive for vendors to make required changes or additions&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: compare the number of programmer hours needed to make a change in the MARC environment and the new environment&lt;br /&gt;&lt;br /&gt;COLLABORATION&lt;br /&gt;&amp;nbsp;- goal: collaboration on data creation with a wider group of communities&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: count the number of non-library communities that we are sharing data with before and after&lt;br /&gt;&amp;nbsp;- goal: greater participation of small libraries in shared data&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: count the number of libraries sharing data before and after the change&lt;br /&gt;&amp;nbsp;- goal: make library data available for use by other information communities&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: count use of library data in non-library web environments before and after&lt;br /&gt;&lt;br /&gt;INNOVATION&lt;br /&gt;&amp;nbsp;- goal: library technology staff should be able to implement "apps" for their libraries more quickly and easily than they can today.&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: either the number of apps created, or the time needed to implement one (this one may be hard to compare)&lt;br /&gt;&amp;nbsp;- goal: library systems vendors 
can develop new services more quickly and more cheaply than before&lt;br /&gt;&amp;nbsp;&amp;nbsp; measurement: the number of changes made in the course of a year, or the number of staff dedicated to those changes. Another measurement would be what libraries are charged and how many libraries make the change within some stated time frame&lt;br /&gt;&lt;br /&gt;As you can tell from this list, most of the measurements require system implementation, not just the development of a new format. But the new format cannot be an end in itself; the goal has to be the implementation of systems and services using that format. The first MARC format was tested in the LC workflow to see whether it met the needs of the Library. This required the creation of a system (called the "&lt;a href="http://openlibrary.org/works/OL6774532W/The_MARC_pilot_project"&gt;MARC Pilot Project&lt;/a&gt;") and a test period of one year. The testing that took place for RDA is probably comparable and could serve as a model. Some of the measurements, such as the participation of more small libraries, will not be available before full implementation. 
Continued measurement will be needed.&lt;br /&gt;&lt;br /&gt;So, now, what are the goals that YOU especially care about?&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-342437821354115016?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/342437821354115016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=342437821354115016' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/342437821354115016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/342437821354115016'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/11/future-format-goals-and-measures.html' title='Future Format: Goals and Measures'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-50454301754819606</id><published>2011-10-17T21:01:00.000-07:00</published><updated>2011-10-17T21:01:52.902-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='library catalogs'/><category scheme='http://www.blogger.com/atom/ns#' term='classification LCSH'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='classification'/><title type='text'>Relativ index</title><content type='html'>Most of us, when we hear "Dewey Decimal Classification" (DDC), think about 
the numbers on the spines of books that tell us where the book can be found on the library's shelves. The subject classification and its decimal notation were only part of Dewey's invention, however. The other part was the "Relativ Index," the entry vocabulary for the classification scheme. Library users were meant to consult it to find topics in the library. &lt;br /&gt;&lt;blockquote&gt;"The Index givs similar or sinonimus words, and the same words in different connections, any any intelijent person wil surely get the ryt number. A reader wishing to know sumthing of the tarif looks under T, and, at a glance, finds 337 as its number. This gyds him to shelvs, to all books and pamflets, to shelf catalog, to clast subject catalog on cards, to clast record of loans, and, in short, in simple numeric order, thruout the whole library to anything bearing on his subject." (Dewey, Edition 11, p. 10) (&lt;i&gt;Yes, that is how he spelled things.&lt;/i&gt;)&lt;/blockquote&gt;The most recent edition of DDC that I own is Edition 11, from 1922, so this example is an entry from its Relativ Index under "Leaves":&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Leaves&amp;nbsp;&amp;nbsp; fertilizers&amp;nbsp; 631.872&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; shapes of&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; botany 581.4&lt;/blockquote&gt;&lt;br /&gt;In the schedules, these classes are listed as:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;631.872 : "Vegetable manures, Leaves" (coming right after "Vegetable manures, Muck").&lt;br /&gt;581.4 : "Morfology&amp;nbsp;&amp;nbsp; comparativ anatomy"&lt;/blockquote&gt;&lt;br /&gt;You can see that the index does not simply repeat the captions of the classification; it is a kind of subject thesaurus in its own right. 
It doesn't just point to the classification number; it gives some context ("fertilizers," "botany") to help the user decide which class number to select.&lt;br /&gt;&lt;br /&gt;What I find odd in libraries today (mainly public libraries) is that we do not have an entry vocabulary for the Dewey classification. Libraries in the U.S. use the Library of Congress Subject Headings even when their classification scheme is Dewey. While LC subject headings will lead you to a catalog entry that has a classification number, they aren't an index to that classification scheme. &lt;br /&gt;&lt;br /&gt;There are more oddities, actually. &lt;br /&gt;&lt;br /&gt;One oddity is that we never explain these classification numbers to users. Yes, I can go from the catalog to the shelf and find books near the one I am seeking, but in a small public library I can encounter a number of different topics on a single shelf, and in a large academic library I can wander whole aisles without seeing a change in the initial class number, with no idea whether I have exhausted my topic as the digits three places past the decimal point change. Yet there is nothing at the shelf or anywhere else in the library to tell me what those numbers mean, except, usually, at a very broad level. What I have before me are book spines and class numbers, and since I don't know what the class numbers mean, I have to rely on the titles on the spines. So if I browse a shelf and see:&lt;br /&gt;&lt;br /&gt;364.106 D26f&amp;nbsp;&amp;nbsp; The first family&lt;br /&gt;364.106 En36h&amp;nbsp; Havana nocturne&lt;br /&gt;364.106 En36i&amp;nbsp; The Westies&lt;br /&gt;364.106 En36p&amp;nbsp; Paddy whacked&lt;br /&gt;&lt;br /&gt;... it may not be clear to me what topic I am looking at. 
At the very least, I would like to be able to type "364.106" into an app on my phone and get a display something like this:&lt;br /&gt;&lt;br /&gt;300&amp;nbsp; Social sciences&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 360&amp;nbsp; Social problems &amp;amp; social services&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; 364&amp;nbsp; Criminology&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; 364.106.... &lt;br /&gt;&lt;br /&gt;(That example is truncated because the divisions to the right of the decimal point are not available to me. Presumably the display would take me down to .106, which would then have something to do with gangs and/or organized crime and/or the Mafia, but I'm just guessing at that.)&lt;br /&gt;&lt;br /&gt;Even better, I'd like to point my phone camera at a book spine and get a similar read-out. Yes, I know that's not going to be simple.&lt;br /&gt;&lt;br /&gt;Another oddity is that we put multiple subject headings on a bibliographic record but only one classification number, reducing the role of classification to simply ordering books on the shelves. 
This means that there are subject headings on the records that would logically lead to class numbers other than the one that has been assigned.&lt;br /&gt;&lt;br /&gt;For my crime books, the subject headings are clearly more diverse than the single classification code:&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Mafia -- United States -- History&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Mafia -- United States -- Biography&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Criminals -- United States -- Biography&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Organized crime -- United States -- Case studies&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Lansky, Meyer, 1902-&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Luciano, Lucky, 1897-1962&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Mafia -- Cuba -- Havana&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Cuba -- History -- 1933-1959&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Havana (Cuba) -- Social conditions -- 20th century&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Westies (Gang) -- History&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Gangs -- New York (State) -- New York -- History&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Irish American criminals -- New York (State) -- New York -- History&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hell's Kitchen (New York, N.Y.)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Organized crime -- United States -- History&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Irish American criminals -- United States -- History&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Gangsters -- United States -- History&lt;br /&gt;&lt;br /&gt;This won't be a surprise to my readers, but this dual system is full of "gotchas" for users. If I look up "Irish American criminals" in the subject headings, I retrieve some items in 364.106, some in the 920 area (biographies, but many users won't know that), and some in fiction (under the author's last name). 
It's not that there is no rhyme or reason, but nothing explains to the library user why exploring this topic means going to three entirely different places in the library. My guess is that, to the user, the system seems quite arbitrary.&lt;br /&gt;&lt;br /&gt;Things are a bit better in libraries that use Library of Congress Classification (LCC) along with LCSH, since the two seem to have been developed with some coordination. In his essay "&lt;a href="http://www.guild2910.org/Peloponnesian%20War%20June%2013%202007.pdf"&gt;The Peloponnesian War and the Future of Reference&lt;/a&gt;," Thomas Mann of the Library of Congress explains how LCSH and LCC work together:&lt;br /&gt;&lt;blockquote&gt;"In order to find which areas of the bookstacks to browse, however, researchers need the subject headings in the library catalog to serve as the index to the class scheme. But the linkage between a subject heading and a classification number is usually dependent on the precoordination of multiple facets within the same string. For example, notice the specific linkages of the following precoordinated strings:&lt;br /&gt;&lt;br /&gt;Greece–History–Peloponnesian War, 431-404 B.C.: DF229-DF230&lt;br /&gt;Greece–History–19th century: DF803&lt;br /&gt;Greece–History–Acarnanian Revolt, 1836: DF823.6&lt;br /&gt;Greece–History–Civil War, 1944-1949: DF849.5"&lt;/blockquote&gt;This is the correlation that appears in the &lt;a href="http://id.loc.gov/authorities/subjects/sh86002643"&gt;LCSH documentation&lt;/a&gt;, but it is not what the user sees in the catalog. 
A search in LC's catalog for Greece -- History -- 19th century brings up books with a variety of classification numbers, the first four being:&lt;br /&gt;&lt;blockquote&gt;DF803 .H45&lt;br /&gt;DF725 .A14&lt;br /&gt;DF951.T45 &lt;br /&gt;DK508.95.O33&lt;/blockquote&gt;Again, the user is directed to different shelf locations from what seems to be a single subject heading, with no explanation of what these different locations mean.* It must be terribly confusing.&lt;br /&gt;&lt;br /&gt;Compact notation is essential for ordering books on the shelf. But it seems truly odd that we order the books on the shelf and do not tell users what the order means. This can be seen as providing a delightful serendipity, but I suspect that we could provide serendipity with far less intellectual effort than has been dedicated to DDC and LCC, which are both enormously detailed and growing more so each year in an attempt to encompass the complexity of the published world. How much richer would the user's library experience be if she understood the relationships between the items on the shelf? Does it make sense to create detailed and complex relationships that are then neither understood nor used? What would a shelf system that was meaningful to library users look like? In a small library? In a large library? And, finally, can we use computing power to overcome the limitations that brought us to our current situation in terms of organized subject access?&lt;br /&gt;&lt;br /&gt;* Before someone explains to me that the first subject heading determines the class number... you know that, I know that, but millions of library users have no idea what the order of the subject headings means. Besides, library catalog users often don't see the full record with all of the subject headings. Even in the LC catalog, subject headings are not included in the default display. 
We can't blame the users if they don't know what we don't help them know.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-50454301754819606?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/50454301754819606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=50454301754819606' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/50454301754819606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/50454301754819606'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/10/relativ-index.html' title='Relativ index'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3600371457199834630</id><published>2011-10-03T10:10:00.000-07:00</published><updated>2011-10-04T09:59:39.178-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='classification'/><title type='text'>Organizing knowledge</title><content type='html'>At the LITA forum on Saturday I stated that classification and knowledge organization seem to have fallen off the library profession's radar. 
(&lt;a href="http://kcoyle.net/presentations/lita2011.html"&gt;LITA2011 keynote.&lt;/a&gt;) We have spent considerable time and money modifying our cataloging rules (four times in about fifty years), but discussion of how we organize information for our users has waned. I can illustrate my impression of this with some searches of Google Books using its &lt;a href="http://ngrams.googlelabs.com/"&gt;nGram&lt;/a&gt; service.&lt;br /&gt;&lt;br /&gt;"Library classification" peaks around 1960 and drops off rapidly. (The chart ends at 2000.)&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="151" src="http://3.bp.blogspot.com/-2KGtqLZzRCA/Toni2aFX0xI/AAAAAAAAA84/RXRIHxexVbk/s400/nGramLibClassn.jpg" style="margin-left: auto; margin-right: auto;" width="400" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Library classification&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" height="162" src="http://1.bp.blogspot.com/-u4ZTvzQdVrw/Toni3jT4cMI/AAAAAAAAA9E/jPI7g2G2qHY/s400/nGramFacetedClassn.jpg" style="margin-left: auto; margin-right: auto;" width="400" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Faceted classification&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a 
href="http://1.bp.blogspot.com/-u4ZTvzQdVrw/Toni3jT4cMI/AAAAAAAAA9E/jPI7g2G2qHY/s1600/nGramFacetedClassn.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&amp;nbsp;&lt;/a&gt; &lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Faceted classification rises meteorically around the 1960s but falls abruptly from 1970 to 1980. The rise may correspond to the activities of the UK-based Classification Research Group, whose main interest was faceted classification.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-u5NKpUGGRIc/Toni367SuOI/AAAAAAAAA9I/zNvXpl0csyE/s1600/nGramDecClassn.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="156" src="http://3.bp.blogspot.com/-u5NKpUGGRIc/Toni367SuOI/AAAAAAAAA9I/zNvXpl0csyE/s400/nGramDecClassn.jpg" width="400" /&gt;&amp;nbsp;&lt;/a&gt;&lt;/td&gt;&lt;td style="text-align: center;"&gt;&lt;/td&gt;&lt;td style="text-align: center;"&gt;&lt;/td&gt;&lt;td style="text-align: center;"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Decimal Classification&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&amp;nbsp;The decimal classifications, most likely both Dewey and Universal, rise steadily until the mid-1960s, then begin a steep decline.&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-OfGXuk86UQI/Toni27OB6pI/AAAAAAAAA88/dBcdSlKLXr4/s1600/nGramKWSearch.jpg" 
imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="158" src="http://3.bp.blogspot.com/-OfGXuk86UQI/Toni27OB6pI/AAAAAAAAA88/dBcdSlKLXr4/s400/nGramKWSearch.jpg" width="400" /&gt;&amp;nbsp;&lt;/a&gt;&lt;/td&gt;&lt;td style="text-align: center;"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Keyword searching&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&amp;nbsp;Keyword searching emerges slowly in the 1960s and '70s, then takes off from 1980 to 2000. Today, as we know, it's basically the only kind of information retrieval being discussed.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-33isv57tX_w/Toni3W2NfnI/AAAAAAAAA9A/wxMAjRWQYuI/s1600/nGramKO.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="160" src="http://1.bp.blogspot.com/-33isv57tX_w/Toni3W2NfnI/AAAAAAAAA9A/wxMAjRWQYuI/s400/nGramKO.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&amp;nbsp;Knowledge organization also rises steadily through the 1970s and '80s and seems to reach a plateau that continues up to recent times. &lt;br /&gt;&lt;br /&gt;This is hardly a scientific study, but it illustrates what my gut was telling me: keyword searching has essentially replaced any kind of classed access. That does make me wonder what is being discussed under the rubric of "knowledge organization." Keyword indexing, per se, does not organize knowledge; there are no classes or categories, no broader or narrower concepts, no direction toward similar topics. It also has no facets, at least none based on the topic of the resource, only on its descriptive properties (date of publication, format, domain).&lt;br /&gt;&lt;br /&gt;Keyword searching is not organized knowledge. 
Any topical organization takes place after retrieval by the searcher, who must look through the retrievals and select those that are relevant. This in part explains why Wikipedia is the perfect complement to keyword searches: Wikipedia &lt;i&gt;is&lt;/i&gt; organized knowledge. A keyword search can pull up a Wikipedia page that will provide context, disambiguation, and pointers to related topics. I find increasingly that I begin my searches in Wikipedia when my searches are topical, leaving Google to function as my "internet phone book" when I need to find a specific person, company, product or document.&lt;br /&gt;&lt;br /&gt;It makes sense for us to ask now: is there any reason (other than shelf placement) to continue library classification practices? Keep your eyes on this space for more about that.&lt;br /&gt;&lt;br /&gt;Added note: Richard Urban offers this nGram view comparing all of the library classification phrases with the term "Ontologies":&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-or1jCvmb_EM/Tos4w1grHjI/AAAAAAAAA9M/nbVqd28RsFk/s1600/nGramOnto.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="323" src="http://4.bp.blogspot.com/-or1jCvmb_EM/Tos4w1grHjI/AAAAAAAAA9M/nbVqd28RsFk/s640/nGramOnto.jpg" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;As @repoRat tweeted: &lt;i&gt;Karen Coyle makes air whoosh out of my lungs. bit.ly/nArBBh Perhaps classification to be replaced by relationship metadata?&amp;nbsp; &lt;/i&gt;That's a distinct possibility, and we'd better get cracking on that! Many "ontologies" out there today are simple term lists, and few of them seem to have relationships that you can follow productively. 
What really excites me is the possibility of relationships that we haven't explored in the past, both between concepts and between resources; all of the "based on" "responds to" "often appear together" -- and lots more that my brain isn't sharp enough to even imagine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3600371457199834630?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3600371457199834630/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3600371457199834630' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3600371457199834630'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3600371457199834630'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/10/organizing-knowledge.html' title='Organizing knowledge'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-2KGtqLZzRCA/Toni2aFX0xI/AAAAAAAAA84/RXRIHxexVbk/s72-c/nGramLibClassn.jpg' height='72' width='72'/><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-9203897625633925901</id><published>2011-09-28T19:42:00.000-07:00</published><updated>2011-09-28T19:42:51.459-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='open access'/><category scheme='http://www.blogger.com/atom/ns#' 
term='metadata'/><title type='text'>Europe's national libraries support Open Data licensing</title><content type='html'>&amp;nbsp;"Meeting at the Royal Library of Denmark, the Conference of European National Librarians (CENL), has voted overwhelmingly to support the open licensing of their data. CENL represents Europe’s 46 national libraries, and are responsible for the massive collection of publications that represent the accumulated knowledge of Europe.&lt;br /&gt;&lt;br /&gt;What does that mean in practice?&lt;br /&gt;It means that the datasets describing all the millions of books and texts ever published in Europe – the title, author, date, imprint, place of publication and so on, which exists in the vast library catalogues of Europe – will become increasingly accessible for anybody to re-use for whatever purpose they want."&lt;br /&gt;&lt;br /&gt;From an &lt;a href="https://app.e2ma.net/app/view:CampaignPublic/id:1403149.7214447972/rid:48e64615892ac6adde9a4066e88c736c"&gt;announcement&lt;/a&gt; by the Conference of European National Librarians. 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-9203897625633925901?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/9203897625633925901/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=9203897625633925901' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9203897625633925901'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9203897625633925901'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/europes-national-libraries-support-open.html' title='Europe&apos;s national libraries support Open Data licensing'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-2371478601557627640</id><published>2011-09-18T14:19:00.000-07:00</published><updated>2011-09-18T14:27:07.766-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><title type='text'>Meaning in MARC: Indicators</title><content type='html'>I have been doing a study of the semantics of MARC data on the &lt;a href="http://futurelib.pbworks.com/w/page/29114548/MARC%20elements"&gt;futurelib wiki&lt;/a&gt;. 
An &lt;a href="http://journal.code4lib.org/articles/5468"&gt;article&lt;/a&gt; on what I learned about the fixed fields (00X) and the number and code fields (0XX) appeared in the &lt;i&gt;code4lib journal&lt;/i&gt;, issue 14, earlier this year. My next task was to tackle the variable fields in the MARC range 1XX-8XX.&lt;br /&gt;&lt;br /&gt;This is a huge task, so I started by taking a look at the MARC indicators in this tag range, and have expanded this to a short study of the &lt;a href="http://futurelib.pbworks.com/w/page/44421482/indicators"&gt;role that indicators play in MARC&lt;/a&gt;. I have to say that it is amazing how much one can stretch the MARC format with one or two single-character data elements.&lt;br /&gt;&lt;br /&gt;Indicators have a large number of effects on the content of the MARC fields they modify. Here is the categorization that I have come up with, although I'm sure that other breakdowns are equally plausible.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;I. Indicators that do not change the meaning of the field&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are indicators that have a function, but it does not change the meaning of the data in the field or subfields.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Display constants: some, but not all, display constants merely echo the meaning of the tag, e.g. 
775 &lt;b style="font-weight: normal;"&gt;Other Edition Entry, Second Indicator&lt;/b&gt;&lt;br /&gt;&lt;i&gt;Display constant controller&lt;/i&gt;&lt;br /&gt; # - Other edition available&lt;br /&gt; 8 - No display constant generated&lt;/li&gt;&lt;li&gt;Trace/Do not trace: I consider these indicators to be carry-overs from card production.&lt;/li&gt;&lt;li&gt;Non-filing indicators: similar to indicators that control displays, these indicators make it possible to sort (was &lt;i&gt;filing&lt;/i&gt;) titles properly, ignoring the initial articles ("The ", "A ", etc.).&amp;nbsp;&lt;/li&gt;&lt;li&gt;Existence in X collection: there are indicators in the 0XX range that let you know if the item exists in the collection of a national library.&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;II. Indicators that do change the meaning of the field&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Many indicators serve as a way to expand the meaning of a field without requiring the definition of a new tag.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Identification of the source or agency: a single field like a 650 topical subject field can have content from an unlimited list of controlled vocabularies because the indicator (or the indicator plus the $2 subfield) provides the identity of the controlled vocabulary.&lt;/li&gt;&lt;li&gt;Multiple types in a field: some fields can have data of different types, controlled by the indicator. For example, the 246 (Varying form of title) has nine different possible values, like Cover title or Spine title, controlled by a single indicator value.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Pseudo-display controllers: the same indicator type that carries display constants that merely echo the meaning of the field also has a number of instances where the display constant actually indicates a different meaning for the field. One example is the 520 (Summary, etc.) 
field with display constants for "subject," "review," "abstract," and others.&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Some Considerations&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Given the complexity of the indicators, there isn't a single answer to how this information should be interpreted in a semantic analysis of MARC. I am inclined to treat the display constants and tracing indicators in section I as not having meaning that needs to be addressed. These are parts of the MARC record that served the production of card sets but that should today be functions of system customization. I would argue that some of these have local value but are possibly not appropriate for record sharing.&lt;br /&gt;&lt;br /&gt;The non-filing indicators are a solution to a problem that is evident in so many bibliographic applications. When I sort by title in &lt;a href="http://www.zotero.org/"&gt;Zotero&lt;/a&gt; or &lt;a href="http://www.mendeley.com/"&gt;Mendeley&lt;/a&gt;, a large portion of the entries are sorted under "The." The world needs a solution here, but I'm not sure what it is going to be. One possibility is to create two versions of a title: one for display, with the initial article, and one for sorting, without. Systems could do the first pass at this, as they often do today when taking author names and inverting them into "familyname, forenames" order. Of course, humans would need the ability to make corrections where the system got it wrong.&lt;br /&gt;&lt;br /&gt;The indicators that identify the source of a controlled vocabulary could logically be transformed into a separate data element for each vocabulary (e.g. "LCSH," "MeSH"). 
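The following is a minimal Python sketch of how a conversion program might find the vocabulary behind a 650 field. It reads the second indicator and falls back to subfield $2 when the indicator says the source is recorded there. The second-indicator values follow the MARC 21 Bibliographic definitions for 650. The function name vocabulary_source and the representation of subfields as (code, value) pairs are hypothetical choices for this illustration. They are not part of MARC or of the futurelib analysis.

```python
# Hypothetical sketch: identify the vocabulary of a MARC 650 field from its
# second indicator, using subfield $2 when the indicator is "7".

# 650 second-indicator values as defined in MARC 21 Bibliographic.
SECOND_INDICATOR_SOURCES = {
    "0": "Library of Congress Subject Headings",
    "1": "LC subject headings for children's literature",
    "2": "Medical Subject Headings",
    "3": "National Agricultural Library subject authority file",
    "4": None,  # source not specified
    "5": "Canadian Subject Headings",
    "6": "Répertoire de vedettes-matière",
}


def vocabulary_source(ind2, subfields):
    """Return a label for the vocabulary of a 650 field, or None if unknown.

    ind2      -- the second indicator character
    subfields -- list of (code, value) pairs, e.g. [("a", "Cataloging")]
    """
    if ind2 == "7":
        # The indicator only says "see $2"; the source code itself lives there.
        for code, value in subfields:
            if code == "2":
                return value
        return None  # malformed: indicator 7 but no $2 present
    return SECOND_INDICATOR_SOURCES.get(ind2)


print(vocabulary_source("2", [("a", "Neoplasms")]))
print(vocabulary_source("7", [("a", "Cataloging"), ("2", "fast")]))
```

The $2 branch passes any source code straight through. That is why a single source-plus-term element can cover an ever-growing list of vocabularies.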
However, the number of different vocabularies is, while not infinite, very large and growing (as evidenced by the MARC practice of delegating the source to a separate subfield that carries codes from a controlled list of sources), so producing a separate data element for each vocabulary is unwieldy, to say the least. At some future date, when controlled vocabularies "self-identify" using URIs, this may be less of a problem. For now, however, it seems that we will need multi-part data elements for controlled vocabularies that include the source with the vocabulary terms.&lt;br /&gt;&lt;br /&gt;The indicators that further sub-type a field, like the 520 Summary field, can fairly easily be given their own data elements, since each value carries its own meaning. Ideally, there would be a "type/sub-type" relationship where appropriate.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;And Some Problems&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are a number of areas that are problematic when it comes to the indicator values. In many cases, the MARC standard does not make clear whether the indicator modifies all subfields in the field, or only a select few. In some instances we can reason this out: the non-filing indicators only refer to the left-most characters of the field, so they can only refer to the $a (which is mandatory in each of those fields). On the other hand, for the values in the subject area (6XX) of the record, the source indicator relates to all of the subject subfields in the field. I assume, however, that in all cases the control subfields $3-$8 perform functions that are unrelated to the indicator values. I do not know at this point if there are fields in which the indicators function on some other subset of the subfields between $a and $z. 
That's something I still need to study.&lt;br /&gt;&lt;br /&gt;I also see a practical problem in making use of the indicator values in any kind of mapping from MARC to just about anything else. In 60% of MARC tags, one or both indicator positions are undefined. Undefined indicators are represented in the MARC record with blanks. Unfortunately, there are also defined indicators that have a meaning assigned to the character "blank." There is nothing in the record itself to differentiate blank indicator values from undefined indicators. Any transformation from MARC to another format must know the definitions of every tag and its indicators in order to do anything with these elements. This is another example of the complexity of MARC for data processing, and yet another reason why a new format could make our lives easier.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;More on the Wiki&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For anyone else who obsesses over these kinds of things, there is more detail on all of this on the &lt;a href="http://futurelib.pbworks.com/w/page/44421482/indicators"&gt;futurelib wiki&lt;/a&gt;. I welcome comments here, and on the wiki. If you wish to comment on the wiki, however, I need to add your login to the site (as an anti-spam measure). I will undoubtedly continue my own obsessive behavior related to this task, but I really would welcome collaboration if anyone is so inclined. 
I don't think that there is a single "right answer" to the questions I am asking, but am working on the principle that some practical decisions in this area can help us as we work on a future bibliographic carrier.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-2371478601557627640?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/2371478601557627640/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=2371478601557627640' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2371478601557627640'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2371478601557627640'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/meaning-in-marc-indicators.html' title='Meaning in MARC: Indicators'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8023720819061908963</id><published>2011-09-16T15:17:00.000-07:00</published><updated>2011-09-16T15:17:45.974-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FRBR'/><category scheme='http://www.blogger.com/atom/ns#' term='RDA'/><category scheme='http://www.blogger.com/atom/ns#' term='cataloging'/><title type='text'>European Thoughts on RDA</title><content type='html'>Some European libraries are asking the question: "Should we adopt RDA as our 
cataloging code?" The discussion is happening through the &lt;a href="http://www.slainte.org.uk/eurig/"&gt;European RDA Interest Group (EURIG)&lt;/a&gt;. Members of EURIG are preparing &lt;a href="http://www.slainte.org.uk/eurig/documents.htm"&gt;reports&lt;/a&gt; on the possibility that RDA could become a truly international cataloging code. With the increased sharing of just about everything between Europe's countries -- currency, labor force, media, etc. -- the vision of Europe's libraries as a cooperating unit seems to be a no-brainer.&lt;br /&gt;&lt;br /&gt;There are interesting comments in the presentations available from the EURIG meeting. For example:&lt;br /&gt;&lt;br /&gt;Spain has done comparisons with current cataloging and some testing using MARC21. They conclude: "Our decision will probably depend on the flexibility to get the different lists, vocabularies, specific rules... that we need." In other words, it all depends on being able to customize RDA to local practice.&lt;br /&gt;&lt;br /&gt;Germany sees RDA as having the potential to be an international set of rules for data sharing (much like ISBD today), with national profiles for internal use. Germany has started translating the RDA vocabulary terms in the &lt;a href="http://metadataregistry.org/rdabrowse.htm/"&gt;Open Metadata Registry&lt;/a&gt;, but notes that translation of the text must be negotiated with the co-publishers of RDA, that is, the American Library Association, the Canadian Library Association, and CILIP.&lt;br /&gt;&lt;br /&gt;The most detailed analysis, though, comes from a report by the French libraries. (The French are totally &lt;a href="http://kcoyle.blogspot.com/2010/01/pardon-my-french.html"&gt;winning my heart&lt;/a&gt; as a smart and outspoken people. Their response to the Google Books Settlement was wonderful.) 
This &lt;a href="http://www.slainte.org.uk/eurig/docs/BnF-ADM-2011-066286-01_%28p2%29.pdf"&gt;report&lt;/a&gt; brings up some key issues about RDA from outside the JSC. &lt;br /&gt;&lt;br /&gt;First, it is said in this report, and also in some of the EURIG presentations from their meeting, that it is RDA's implementation of FRBR that makes it a candidate for an international cataloging code. FRBR is seen as the model that will allow library metadata to have a presence on the Web, and many in the library profession see getting a library presence on the Web as an essential element of being part of the modern information environment. One irony of this, though, is that Italy already has a cataloging code based on FRBR, &lt;a href="http://www.iccu.sbn.it/opencms/opencms/it/main/attivita/gruppilav_commissioni/pagina_94.html"&gt;REICAT&lt;/a&gt;, but that has gotten little attention. (A large segment of it is &lt;a href="http://www.iccu.sbn.it/opencms/export/sites/iccu/documenti/ReicatEN.pdf"&gt;available in English&lt;/a&gt; if you are curious about their approach. ) &lt;br /&gt;&lt;br /&gt;The French interest in FRBR is specifically about &lt;a href="http://www.rda-jsc.org/docs/5editor2rev.pdf"&gt;Scenario 1 &lt;/a&gt;as defined in RDA; a model with defined entities and links between them. An implementation of Scenario 2, which links authority records to bibliographic records, would be a mere replication of what already exists in France's catalogs. In other words, they have already progressed to level 2 while U.S. libraries are still stuck in level 3, the flat data model. &lt;br /&gt;&lt;br /&gt;Although the French libraries see an advantage to using RDA, they also have some fairly severe criticisms. 
Key ones are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;it ignores ISO standards, and does not follow IFLA standards, such as Names of person, or Anonymous classics*&lt;/li&gt;&lt;li&gt;it is a follow-on to, and makes concessions to, AACR(1 and 2), which is not used by the French libraries&lt;/li&gt;&lt;li&gt;it proposes one particular interpretation of FRBR, not allowing for others, and defines each element as being exclusively for use with a single FRBR entity&lt;/li&gt;&lt;/ul&gt;They recommend considering the possibility of creating a European profile of RDA scenario 1. This would give the European libraries a cataloging code based on RDA but under their control. They do ask, however, what the impact on global sharing will be if different library communities use different interpretations of FRBR. (My answer: define your data elements and exchange data elements; implement FRBR inside systems, but make it possible to share data apart from any particular FRBR structuring.)&lt;br /&gt;&lt;br /&gt;* There is a strong adherence to ISO and IFLA standards outside of the U.S. I don't know why we in the U.S. 
feel less need to pay attention to those international standards bodies, but it does separate us from the greater library community.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(Thanks to John Hostage of Harvard for pointing out the recent EURIG activity on the RDA-L list.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8023720819061908963?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8023720819061908963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8023720819061908963' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8023720819061908963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8023720819061908963'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/european-thoughts-on-rda.html' title='European Thoughts on RDA'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-4127167792040916661</id><published>2011-09-16T14:29:00.000-07:00</published><updated>2011-09-16T14:29:36.935-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><category scheme='http://www.blogger.com/atom/ns#' term='copyright'/><title type='text'>Due diligence do-over</title><content type='html'>In what I see as both a brave and an appropriate move, the University of Michigan admitted &lt;a 
href="http://www.lib.umich.edu/news/u-m-library-statement-orphan-works-project"&gt;publicly&lt;/a&gt; that the Authors Guild had found some serious flaws in its process for identifying orphan works. The statement reaffirms the need to identify orphan works, and promises to revise its procedures.&lt;br /&gt;&lt;blockquote&gt;"Having learned from our mistakes—we are, after all, an educational institution—we have already begun an examination of our procedures to identify the gaps that allowed volumes that are evidently not orphan works to be added to the list. Once we create a more robust, transparent, and fully documented process, we will proceed with the work, because we remain as certain as ever that our proposed uses of orphan works are lawful and important to the future of scholarship and the libraries that support it."&lt;/blockquote&gt;Among other things, what I find interesting in all this is that no one seems to be wondering why our copyright registration process is so broken that sometimes even the rights holders themselves don't know that they are the rights holders. It really shouldn't be this hard to find out if a work is covered by copyright. Larry Lessig covered this in his book "Free Culture," which is available &lt;a href="http://www.authorama.com/free-culture-24.html"&gt;online&lt;/a&gt;. The basic process of identifying copyrights is broken, and the burden is being placed on those who wish to make use of works. 
This is a clear anti-progress, pro-market bias in our copyright system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-4127167792040916661?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/4127167792040916661/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=4127167792040916661' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4127167792040916661'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4127167792040916661'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/due-diligence-do-over.html' title='Due diligence do-over'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1957596732762364914</id><published>2011-09-15T10:04:00.000-07:00</published><updated>2011-09-18T14:27:28.466-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><category scheme='http://www.blogger.com/atom/ns#' term='copyright'/><title type='text'>Diligence due</title><content type='html'>Oooof! 
Talk about making a BIG, public mistake.&lt;br /&gt;&lt;br /&gt;HathiTrust's new Orphan Works Project proposed to do due diligence on works, then post them on the HT site for 90 days, after which those works would be assumed to be orphans and would then be made available (in full text) to members of the HT cooperating institutions. Sounds good, right? (Well, maybe other than the fact of posting the works on a site that few people even know about...)&lt;br /&gt;&lt;br /&gt;The Authors Guild blog posted &lt;a href="http://blog.authorsguild.org/2011/09/14/found-one-we-re-unite-an-author-with-an-%e2%80%9corphaned-work-%e2%80%9d/"&gt;yesterday&lt;/a&gt; that it had found the rights holder of one of the books on HT's orphan works list in a little over 2 minutes using Google. (It's hard to believe that they didn't know this when the suit was filed on September 12 -- this is brilliant PR, if I ever saw it.) They then &lt;a href="http://blog.authorsguild.org/2011/09/14/two-more-another-professor-emeritus-stanford-and-a-pulitzer-winner-who-left-rights-to-harvard/"&gt;reported&lt;/a&gt; finding two others.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://james.grimmelmann.net/"&gt;James Grimmelmann&lt;/a&gt;, Associate Professor at New York Law School and someone considered an expert on the Google Books case, has titled his &lt;a href="http://laboratorium.net/archive/2011/09/15/hathitrust_single-handedly_sinks_orphan_works_refo"&gt;blog post&lt;/a&gt; on this: "HathiTrust Single-Handedly Sinks Orphan Works Reform," stating that this incident will be brought up whenever anyone claims to have done due diligence on orphan works. 
I'm not quite as pessimistic as James, but I do believe this will be brought up in court and will work against HT.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1957596732762364914?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1957596732762364914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1957596732762364914' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1957596732762364914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1957596732762364914'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/diligence-due.html' title='Diligence due'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3503146549922591500</id><published>2011-09-14T08:21:00.000-07:00</published><updated>2011-09-15T20:42:20.045-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><category scheme='http://www.blogger.com/atom/ns#' term='copyright'/><title type='text'>Authors Guild in Perspective</title><content type='html'>In its suit against HathiTrust the three authors guilds claim that there are digitized copies of millions of copyrighted books in HathiTrust, and that these should be removed from the database and stored in escrow off-line.&lt;br /&gt;&lt;br 
/&gt;A relevant question is: who do the authors guilds represent, and how many of those books belong to the represented authors?&lt;br /&gt;&lt;br /&gt;The combined membership of the three authors guilds is about 13,000. That seems like a significant number, but the Library of Congress &lt;a href="http://id.loc.gov/authorities/names.html"&gt;name authority file&lt;/a&gt; has about 8 million names. That file also contains name/title combinations, and I don't have any statistics that tell me how many of those there are. (If anyone out there has a copy of the file and can run some stats on it, I'd greatly appreciate it.) Some of the names are for writers whose works are all in the public domain. Yet no matter how we slice it, the authors guilds in the lawsuit represent a small percentage of the authors whose in-copyright works are in the HathiTrust database.&lt;br /&gt;&lt;br /&gt;The legal question then is: does this lawsuit pertain to all in-copyright works in HathiTrust, or only those by the represented authors? Could I, for example, sue HathiTrust for violating Fay Weldon's copyright?&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;Reply to this from James Grimmelmann on his &lt;a href="http://laboratorium.net/"&gt;blog&lt;/a&gt;:&lt;br /&gt;&lt;i&gt;Good question, Karen, and one I plan to address in more detail in a civil procedure post in the next few days.  In brief, you couldn’t sue to enforce Fay Weldon’s copyright, as you aren’t an “owner” of any of the rights in it.  The Authors Guild and other organizations can sue on behalf of their members, but the details of associational standing are complicated.  There is also the question of the scope of a possible injunction (e.g. could Fay Weldon win as to one of her works and obtain an injunction covering others, or works by others), where there are also significant limits on how far the court can go.  Again, more soon.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;As I suspected, the legal issues are complex. 
Keep an eye on James' blog for more on this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3503146549922591500?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3503146549922591500/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3503146549922591500' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3503146549922591500'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3503146549922591500'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/authors-guild-in-perspective.html' title='Authors Guild in Perspective'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1902058001670343773</id><published>2011-09-12T22:15:00.000-07:00</published><updated>2011-09-14T08:20:08.849-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><category scheme='http://www.blogger.com/atom/ns#' term='copyright'/><category scheme='http://www.blogger.com/atom/ns#' term='Digital libraries'/><title type='text'>Authors Guild Sues HathiTrust</title><content type='html'>There has been a period of limbo since Judge Chin rejected the proposed settlement between the Authors Guild/Association of American Publishers and Google. 
In fact, a supposedly final meeting between the parties is scheduled for this Thursday, 9/15, in the judge's court. &lt;br /&gt;&lt;br /&gt;On Monday, 9/12, the Authors Guild (and partners) &lt;a href="http://www.authorsguild.org/advocacy/articles/authors-3.html"&gt;filed suit&lt;/a&gt; against HathiTrust (and partners) for some of the same "crimes" of which it had accused Google: essentially making unauthorized copies of in-copyright texts. In addition, the recent announcement that the libraries would allow their users to access items that had been deemed to be orphan works figures in the suit. That this suit comes some six years after the original suit against Google is itself interesting. Nearly all of the actions of HathiTrust and its member libraries fall within what would have been allowed if the agreement that came out of that suit had been approved by the court. Although we do not know the final outcome of that suit (and anxiously await Thursday's meeting to see if it is revelatory), this suit against the libraries is surely a sign that AG/AAP and Google have not come to a reconciliation.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Suit &lt;/b&gt;&lt;br /&gt;&lt;br /&gt;First, the suit establishes that the libraries received copies of Google-digitized items from Google, and have sent copies of these items to HathiTrust, which in turn makes some number of copies as part of its archival function. This is followed by a somewhat short exposition of the areas of copyright law that are pertinent, with an emphasis on section 108, which allows libraries to make limited copies to replace deteriorating works. The suit states that the copying being done is not in accord with section 108. 
Then it refers to the &lt;a href="http://www.lib.umich.edu/orphan-works/copyright-holders"&gt;Orphan Works Project&lt;/a&gt; in which several libraries are partners, and to the libraries' plan to make the full text of orphan works available to institutional users.&lt;br /&gt;&lt;br /&gt;Since most of these institutions (if not all of them) are state institutions that have protections against paying out large sums in a lawsuit of this nature, the suit's goal is to regain control of the works by forcing HathiTrust (and the named libraries) to transfer their digital copies of in-copyright works to a "commercial grade" escrow agency with the files held off network "pending an appropriate act of Congress."&lt;br /&gt;&lt;br /&gt;As James Grimmelmann comments in his &lt;a href="http://laboratorium.net/archive/2011/09/12/the_orphan_wars"&gt;blog post&lt;/a&gt; on the suit, the suit mixes up orphan works and owned works quite a bit. He points out that a group of organizations representing authors could hardly make a case for orphan works since, by definition, orphan works have no identifiable owner and so can't be represented by a guild of people defending their own works.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Problems&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There are numerous problems that I see in the text of this suit. (IANAL, just a Librarian.) &lt;br /&gt;&lt;ul&gt;&lt;li&gt;The suit mentions large numbers of books that have been copied without permission, but makes no attempt to state how many of those books belong to the members of the plaintiff organizations. &lt;/li&gt;&lt;li&gt;The suit throws around large numbers without clearly stating whether those numbers include Public Domain works. It isn't clear, therefore, what the numbers represent: the entire holdings of HathiTrust, or just the in-copyright holdings. 
Also, regarding the latter: many post-1923 works are also in the Public Domain, but identifying them takes a considerable amount of work. Cutting off at that year does not account for works whose copyright was not renewed, or that were never copyrighted. I also doubt that anyone has a clear idea how many of the works in question are Public Domain because they are US Federal documents. This imprecision about the copyright status of works is very frustrating, but HathiTrust is not to blame for this state of affairs.&lt;/li&gt;&lt;li&gt;Some of their claims do not seem to me to be within legal bounds. For example, in one section they claim that although HathiTrust is not giving users access to in-copyright works, they potentially could. Where does that fit in? &lt;/li&gt;&lt;li&gt;They also claim that there is a risk of unauthorized access. However, the security at HathiTrust meets the security standards that the Authors Guild agreed to in the (unapproved) settlement with Google. If it was good enough then, why is it now too risky?&lt;/li&gt;&lt;li&gt;They claim that the libraries themselves have been digitizing in-copyright books. I wasn't aware of this, and would like to know whether this is the case.&lt;/li&gt;&lt;li&gt;They state that the libraries said that before Google it was costing them $100 a book for digitization. Then the plaintiffs say that this means that the value of the digital files is in the hundreds of millions of dollars. First, I have heard figures that are more like $30 a book. Second, I don't see how the cost to digitize can translate into a value that is relevant to the complaint. &lt;/li&gt;&lt;li&gt;Although the legislature has failed to pass an orphan works law that would allow the use of these materials and still benefit owners if they do come forward, it seems like a poor strategy to complain about a well-designed program of due diligence and notification, which is what the libraries have created. 
Orphan works are the least available works: if you can find the owner you can ask permission; if the owner cannot be found you cannot ask permission, and therefore there is no way to use the work if your use falls outside fair use. It's hard to argue for taking these works entirely out of the cultural realm simply because we have a poorly managed copyright ownership record.&lt;/li&gt;&lt;li&gt;There are a few odd sections where they refer to bibliographic data as though it were part of the "unauthorized digitization" rather than data that was created by and belongs to the libraries. There's an odd attempt to make bibliographic data searching seem nefarious.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Parties&lt;/b&gt;&lt;br /&gt;Plaintiffs: The Authors Guild, Inc.; The Australian Society of Authors Limited; Union des écrivaines et des écrivains québécois; Pat Cummings; &lt;a href="http://catalog.hathitrust.org/Record/000883457"&gt;Angelo Loukakis&lt;/a&gt;; &lt;a href="http://catalog.hathitrust.org/Record/002483502"&gt;Roxana Robinson&lt;/a&gt;; Andre Roy; James Shapiro; Daniele Simpson; T.J. Stiles; and &lt;a href="http://catalog.hathitrust.org/Record/000914916"&gt;Fay Weldon&lt;/a&gt;. 
(Links are to some sample HathiTrust records.)&lt;br /&gt;&lt;br /&gt;Defendants: HathiTrust; The Regents of the University of California; The Board of Regents of the University of Wisconsin System; The Trustees of Indiana University; and Cornell University.&lt;br /&gt;&lt;br /&gt;Links&lt;br /&gt;&lt;a href="http://boingboing.net/2011/09/13/authors-guild-declares-war-on-university-effort-to-rescue-orphaned-books.html"&gt;Boing Boing&lt;/a&gt;: Authors Guild declares war on university effort to rescue orphaned books&lt;br /&gt;&lt;a href="http://www.libraryjournal.com/lj/home/892021-264/copyright_clash_authors_guild_and.html.csp"&gt;Library Journal&lt;/a&gt;: Copyright Clash &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1902058001670343773?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1902058001670343773/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1902058001670343773' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1902058001670343773'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1902058001670343773'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/authors-guild-sues-hathitrust.html' title='Authors Guild Sues HathiTrust'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7683383278040490084</id><published>2011-09-09T09:31:00.000-07:00</published><updated>2011-09-09T09:31:39.472-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='RDA'/><title type='text'>MARC vs RDA</title><content type='html'>As LC ponders the task of moving to a &lt;a href="http://www.loc.gov/marc/transition/"&gt;bibliographic framework&lt;/a&gt;, I can't help but worry about how much the past is going to impinge on our future. It seems to me that we have two potentially incompatible needs at the moment: the first is to fix MARC, and the second is to create a carrier for RDA. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Fixing MARC&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For well over a decade some of us have been suggesting that we need a new carrier for the data that is currently stored in the MARC format. The record we work with today is full of kludges brought on by limitations in the data format itself. To give a few examples:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;041 Language Codes - We have a single language code in the 008 and a number of other language codes (e.g. for original language of an abstract) in 041. The language code in the 008 is not "typed," so it must be repeated in the 041, which has separate subfields for different language codes. However, 041 is only included when more than one language code is needed. This means that there are always two places one must look to find language codes.&lt;/li&gt;&lt;li&gt;006 Fixed-Length Data Elements, Additional Material Characteristics - The sole reason for the existence of the 006 is that the 008 is not repeatable. 
The fixed-length data elements in the 006 are repeats of format-specific elements in the 008 so that information about multi-format items can be encoded.&lt;/li&gt;&lt;li&gt;773 Host Item Entry - All of the fields for related resources (76X-78X) have the impossible task of encoding an entire bibliographic description in a single field. Because there are only 26 possible subfields (a-z) available for the bibliographic data, data elements in these fields are not coded the same as they are in other parts of the record. For example, in the 773 the entire main entry is entered in a single subfield ("$aDesio, Ardito, 1897-") as opposed to the way it is coded in any X00 field ("$aDesio, Ardito,$d1897-").&lt;/li&gt;&lt;/ul&gt;Had we "fixed" MARC ten years ago, there might be less urgency today to move to a new carrier. As it is, data elements that were added so that the RDA testing could take place have made the format look more and more like a Rube Goldberg contraption. The MARC record is on life support, kept alive only through the efforts of the poor folks who have to code into this illogical format. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;A Carrier for RDA&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The precipitating reason for LC's &lt;a href="http://www.loc.gov/marc/transition/"&gt;bibliographic framework project&lt;/a&gt; is RDA. One of the clearest results of the RDA tests that were conducted in 2010 was that MARC is not a suitable carrier for RDA. If we are to catalog using the new code, we must have a new carrier. 
I see two main areas where RDA differs "record-wise" from the cataloging codes that informed the MARC record: &lt;br /&gt;&lt;ul&gt;&lt;li&gt;RDA implements the FRBR entities&lt;/li&gt;&lt;li&gt;RDA allows the use of identifiers for entities and terms&lt;/li&gt;&lt;/ul&gt;Although many are not aware of it, there is already a solid foundation for an RDA carrier in the registered elements and vocabularies in the &lt;a href="http://metadataregistry.org/rdabrowse.htm"&gt;Open Metadata Registry&lt;/a&gt;. Not long ago I was able to &lt;a href="http://kcoyle.blogspot.com/2011/07/rda-in-xml-why-not-give-it-shot.html"&gt;show&lt;/a&gt; that one could use those elements and vocabularies to create an RDA record. A full implementation of RDA will probably require some expansion of the data elements of RDA because the current list that one finds in the RDA &lt;a href="http://access.rdatoolkit.org/"&gt;Toolkit&lt;/a&gt; was not intended to be fully detailed. &lt;br /&gt;&lt;br /&gt;To my mind, the main complications of a carrier for RDA have to do with FRBR and how we can most efficiently create relationships between the FRBR entities and manage them within systems. I suspect that we will need to accommodate multiple FRBR scenarios, some appropriate to data storage and others more appropriate to data transmission. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Can We Do Both?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is my concern: creating a carrier for RDA will not solve the MARC record problem; solving the MARC record problem will not provide a carrier for RDA. There may be a way to combine these two needs, but I fear that a combined solution would end up creating a data format that doesn't really solve either problem because of the significant difference between the AACR conceptual model and that of RDA/FRBR. &lt;br /&gt;&lt;br /&gt;It seems that if we want to move forward, we may have to make a break with the past. 
We may need to freeze MARC for those users continuing to create pre-RDA bibliographic data, and create an RDA carrier that is true to the needs of RDA and the systems that will be built around RDA data, with any future enhancements made only to the new carrier. This will require a strategy for converting data in MARC to the RDA carrier as libraries move to systems based on RDA. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Next: It's All About the Systems&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In fact, the big issue is not data conversion but what the future systems will require in order to take advantage of RDA/FRBR. This is a huge question, and I will take it up in a new post, but let me just say here that it would be folly to devise a data format that is not based on an understanding of the system requirements needed to support the desired functionality and uses. &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7683383278040490084?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7683383278040490084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7683383278040490084' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7683383278040490084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7683383278040490084'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/marc-vs-rda.html' title='MARC vs RDA'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-4720057814065742150</id><published>2011-09-07T11:17:00.000-07:00</published><updated>2011-09-07T11:17:56.007-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><title type='text'>XML and Library Data Future</title><content type='html'>It is sometimes assumed that the future data carrier for library data will be XML. I think this assumption may be misleading, and I'm going to attempt to clarify how XML may fit into the library data future. Some of this explanation is necessarily over-simplified because a full exposition of the merits and demerits of XML would be a tome, not a blog post. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;What is XML?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The eXtensible Markup Language (&lt;a href="http://en.wikipedia.org/wiki/Xml"&gt;XML&lt;/a&gt;) is a highly versatile markup language. A &lt;a href="http://en.wikipedia.org/wiki/Markup_language"&gt;markup language&lt;/a&gt; is primarily a way to encode text or other expressions so that some machine processing can be performed. That processing can manage display (e.g. presenting text in bold or italics) or it can resemble metadata, encoding the meaning of a group of characters ("dateAndTime"). It makes the expression more machine-usable. 
XML is not a data model in itself, but it can be used to mark up data based on a wide variety of models.*&lt;br /&gt;&lt;br /&gt;XML is the flagship standard in a large family of &lt;a href="http://en.wikipedia.org/wiki/List_of_XML_markup_languages"&gt;markup languages&lt;/a&gt;, although not the first: it is an evolution of SGML, which had (perhaps necessary) complexities that rendered it very difficult for most mortals to use. SGML is also the conceptual granddaddy of HTML, a much-simplified markup language that many of us take for granted.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Defining Metadata in XML&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is a difference between using XML as markup for documents or data and using XML to define your data. XML has some inherent structural qualities that may not be compatible with what you want your data to be. There is a reason why XML "records" are generally referred to as "documents": they tend to be quite linear in nature, with a beginning, a middle, and an end, just like a good story. &lt;br /&gt;&lt;br /&gt;XML's main structural functionality is that of nesting, or the creation of containers that hold separate bits of data together.&lt;br /&gt;&lt;br /&gt;&amp;lt;paragraph&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;sentence&amp;gt;&amp;lt;/sentence&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;sentence&amp;gt;&amp;lt;/sentence&amp;gt;&amp;nbsp;...&lt;br /&gt;&amp;lt;/paragraph&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;name&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;familyname&amp;gt;&amp;lt;/familyname&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;forenames&amp;gt;&amp;lt;/forenames&amp;gt;&lt;br /&gt;&amp;lt;/name&amp;gt;&lt;br /&gt;&lt;br /&gt;This is useful for document markup and also handy when marking up data. It is not unusual for XML documents to have nesting of elements many layers deep. This nesting, however, can be deceptive. 
Just because you have things inside other things does not mean that the relationship is anything more than a convenience for the application for which it was designed. &lt;br /&gt;&lt;br /&gt;&amp;lt;customer&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;customerNumber&amp;gt;&amp;lt;/customerNumber&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;phoneNumber&amp;gt;&amp;lt;/phoneNumber&amp;gt;&lt;br /&gt;&amp;lt;/customer&amp;gt;&lt;br /&gt;&lt;br /&gt;Nested elements are most frequently in a whole/part relationship, with the container representing the whole and holding the elements (parts) together as a unit (in particular a unit that can be repeated). &lt;br /&gt;&lt;br /&gt;&amp;lt;address&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;street1&amp;gt;&amp;lt;/street1&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;street2&amp;gt;&amp;lt;/street2&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;city&amp;gt;&amp;lt;/city&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;state&amp;gt;&amp;lt;/state&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;zip&amp;gt;&amp;lt;/zip&amp;gt;&lt;br /&gt;&amp;lt;/address&amp;gt;&lt;br /&gt;&lt;br /&gt;While usually not hierarchical in the sense of genus/species or broader/narrower, this nesting has some of the same data processing issues that we find in other hierarchical arrangements:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The difficulty of placing elements in a single hierarchy when many elements could be logically located in more than one place. That problem has to be weighed against the inconvenience and danger of carrying the same data more than once in a record or system and the chances that these redundant elements will not get updated together.&lt;/li&gt;&lt;li&gt;The need to traverse the whole hierarchy to get to "buried" elements. This was the pain in the neck that caused most data processing shops to drop hierarchical database management systems for relational ones. 
XML tools make this somewhat less painful, but not painless.&lt;/li&gt;&lt;li&gt;Poor interoperability. The same data element can be in different containers in different XML documents, but the data elements may not be usable outside the context of the containing element (e.g. "street2"). &lt;/li&gt;&lt;/ul&gt;Nesting, like hierarchy, is necessarily a separation of elements from each other, and XML does not provide a way to bring these together for a different view. Contrast the container structure of XML with a graph structure. &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-SJqxHGa6Pj0/TmesHjsphYI/AAAAAAAAA80/U0L4LxZVBuo/s1600/Slide2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://4.bp.blogspot.com/-SJqxHGa6Pj0/TmesHjsphYI/AAAAAAAAA80/U0L4LxZVBuo/s320/Slide2.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-Ml2-WlTZ_LI/TmesEjIu8vI/AAAAAAAAA8w/az6-1rFmKlo/s1600/Slide1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://4.bp.blogspot.com/-Ml2-WlTZ_LI/TmesEjIu8vI/AAAAAAAAA8w/az6-1rFmKlo/s320/Slide1.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;In the nested XML structure, some of the same data is carried in separate containers and there isn't any inherent relationship between them. Were this data entered into a relational database, it might be possible to create those relationships, somewhat like the graph view. But as a record, the XML document has separate data elements for the same data because the element is not separate from the container. 
In other words, the XML document has two different data elements for the zip code:&lt;br /&gt;&lt;br /&gt;&amp;nbsp; address:zip&lt;br /&gt;&amp;nbsp; censusDistrict:zip&lt;br /&gt;&lt;br /&gt;To use a library concept as an analogy, the nesting in XML is like pre-coordination in library subject headings. It binds elements together in such a way that they cannot readily be used in any other context. Some coordination is definitely useful at the application level, but if all of your data is pre-coordinated, it becomes difficult to create new uses for new contexts.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Avoid XML Pitfalls &lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;XML does not make your data any better than it was, and it can be used to mark up data that is illogically organized and poorly defined. A misstep that I often see is data designers beginning to use XML before their data is fully described, and therefore letting the structure and limitations of XML influence what their data can express. Be very wary of any project that decides that the data format will be XML before the data itself has been fully defined. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;XML and Library Data&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If XML had been available in 1965 when Henriette Avram was developing the MARC format, it would have been a logical choice for that data. The task that Avram faced was to create a machine-readable version of the data on the catalog card that would allow cards to be printed that looked exactly like the cards that were created prior to MARC. It was a classic document markup situation. Had that been the case, our records could very well have evolved differently from what we have today, because XML would not have required separating fixed field data from variable field data, and expansion of some data areas might have been easier. 
But saying that XML would have been a good format in 1965 does not mean that it would be a good format in 2011. &lt;br /&gt;&lt;br /&gt;As for the future library data format, I can imagine that it will, at times, be conveyed over the internet in XML. If it can &lt;b&gt;ONLY&lt;/b&gt; be conveyed in XML, we will have created a problem for ourselves. Our data should be independent of any particular serialization and be designed so that it is not necessary to have any particular combination or nesting of elements in order to make use of the data. Applications that use the data can, of course, combine and structure the elements however they wish, but for our data to be usable in a variety of applications we need to keep the "pre-coordination" of elements to a minimum.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;* For example, there is an XML serialization (essentially a record format) of RDF that is frequently used to exchange linked data, although other serializations are also often available. It is used primarily because there is a wide range of software tools available for making use of XML data in applications, and there are far fewer tools available for the more "native" RDF expressions such as triples or &lt;a href="http://www.w3.org/TeamSubmission/turtle/"&gt;turtle&lt;/a&gt;. 
It encapsulates RDF data in a record format, and I suspect that using XML for this data will turn out to be a transitional phase as we move from record-based data structures to graph-based ones.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-4720057814065742150?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/4720057814065742150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=4720057814065742150' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4720057814065742150'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4720057814065742150'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/09/xml-and-library-data-future.html' title='XML and Library Data Future'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-SJqxHGa6Pj0/TmesHjsphYI/AAAAAAAAA80/U0L4LxZVBuo/s72-c/Slide2.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-793269418943356173</id><published>2011-08-26T18:05:00.000-07:00</published><updated>2011-08-26T18:05:01.316-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><title type='text'>New bibliographic framework: there is 
a way</title><content type='html'>&lt;br /&gt;Since my last post undoubtedly left readers with the idea that I have my head in the clouds about the future of bibliographic metadata, I wish to present here some of the reasons why I think this can work. Many of you were probably left thinking: Yeah, right. Get together a committee of a gazillion different folks and decide on a new record format that works for everyone. That, of course, would not be possible. But that's not the task at hand. The task at hand is actually close to the opposite of that. Here are a few parameters.&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;#1 What we need to develop is NOT a record format&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The task ahead of us is to define an open set of data elements. Open, in this case, means usable and reusable in a variety of metadata contexts. What wrapper (read: record format) you put around them does not change their meaning. Your chicken soup can be in a can, in a box, or in a bowl, but it is still chicken soup. That's the model we need for metadata. Content, not carrier. Meaning, not record format. Usable in many different situations.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;#2 Everyone doesn't have to agree to use the exact same data elements&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We only need to know the meaning of the data elements and what relationships exist between different data elements. For example, we need to know that my author and your composer are both persons and are both creators of the resource being described. That's enough for either of us to use the other's data under some circumstances. It isn't hard to find overlapping bits of meaning between different types of bibliographic metadata.&lt;br /&gt;&lt;br /&gt;Not all metadata elements will overlap between communities. 
The cartographic community will have some elements that the music library community will never use, and vice versa. That's fine. That's even good. Each specialist community can expand its metadata to the level of detail that it needs in its area. If the music library needs to catalog a map, it can "borrow" what it needs from the cartographic folks.&lt;br /&gt;&lt;br /&gt;Where data elements are equivalent or functionally similar, data definitions should include this information. Although the following data elements are defined differently, you can see the similarities among them.&lt;br /&gt;&lt;blockquote&gt;pbcoreTitle = a name given to the media item you are cataloging&lt;br /&gt;RDA:titleProper = A word, character, or group of words and/or characters that names a resource or a work contained in it.&lt;br /&gt;MARC:245 $a = title of a work&lt;br /&gt;dublincore:title = A name given to the resource&lt;/blockquote&gt;All of these are types of titles, and they have a similar role in the descriptive cataloging of their respective communities: each names the target resource. These elements can therefore be considered members of a set that could be defined as: data elements that name the target resource. Having this relationship defined makes it possible to use this data in different contexts and even to bring these titles together into a unified display. This is no different from the way we create web pages with content from different sources like Flickr, YouTube, and a favorite music artist's web site, as in the image here. 
&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/--M3BRDCI6hI/Tlg6KFUqsII/AAAAAAAAA8k/nWX9Ma6cEfA/s1600/myFavorites.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://2.bp.blogspot.com/--M3BRDCI6hI/Tlg6KFUqsII/AAAAAAAAA8k/nWX9Ma6cEfA/s320/myFavorites.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;In this "My Favorites" case, the titles come from the Internet Movie Database, a library catalog display, the Billboard music site, and Facebook. It doesn't matter where they came from or what the data element was called at each site; what matters is that we know which part is the "name-of-the-thing" that we want to display here.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;#3 You don't have to create all-new data elements for your resources if appropriate ones already exist&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When data elements are defined within the confines of a record, each community has to create an entire data element schema of its own, even if it would be coding some elements that are also used by other communities. Yet there is no reason for each community to define its own data element for something like the ISBN; one will do. When data elements themselves are fully defined apart from any particular record format, you can mix and match, borrowing from others as needed. This not only saves time in creating metadata schemas but also means that those data elements are 100% compatible across the metadata instances that use them. 
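To make the ideas in #2 and #3 concrete, here is a minimal Python sketch built on clearly hypothetical assumptions. Each data element is described once, apart from any record, by a name, an identifier, a definition and its relationships. A display then picks out the "name-of-the-thing" values from statements gathered from different sources. The example.org identifiers, the memberOf relationship type and the sample values are all invented for illustration. They are not the actual PBCore, RDA, MARC or Dublin Core vocabularies.

```python
from dataclasses import dataclass, field

# Illustrative identifiers only: these example.org URIs are hypothetical
# stand-ins, not the official namespaces of PBCore, RDA, MARC or Dublin Core.
NAMES_RESOURCE = "http://example.org/sets/namesTargetResource"


@dataclass
class DataElement:
    """A data element defined apart from any record format."""
    name: str        # name for human readers
    identifier: str  # identifier for machines
    definition: str  # human-readable definition
    # relationship type mapped to a list of related identifiers
    relationships: dict = field(default_factory=dict)


def member(name, uri, definition):
    """Build an element declared to belong to the 'names the resource' set."""
    return DataElement(name, uri, definition, {"memberOf": [NAMES_RESOURCE]})


ELEMENTS = [
    member("pbcoreTitle", "http://example.org/pbcore/title",
           "a name given to the media item you are cataloging"),
    member("RDA:titleProper", "http://example.org/rda/titleProper",
           "A word, character, or group of words and/or characters that "
           "names a resource or a work contained in it."),
    member("MARC:245 $a", "http://example.org/marc/245a",
           "title of a work"),
    member("dublincore:title", "http://example.org/dc/title",
           "A name given to the resource"),
]


def elements_in_set(elements, set_id):
    """Identifiers of every element declared a member of set_id."""
    return {e.identifier for e in elements
            if set_id in e.relationships.get("memberOf", [])}


def names_of_things(statements, elements, set_id=NAMES_RESOURCE):
    """Keep the values whose element belongs to set_id, in input order,
    no matter which community's schema the element came from."""
    wanted = elements_in_set(elements, set_id)
    return [value for element_id, value in statements if element_id in wanted]


if __name__ == "__main__":
    # Made-up (element identifier, value) pairs gathered from different sources.
    harvested = [
        ("http://example.org/marc/245a", "Moby Dick"),
        ("http://example.org/shared/isbn", "0000000000"),
        ("http://example.org/pbcore/title", "Evening News"),
        ("http://example.org/dc/title", "Holiday Photos"),
    ]
    print(names_of_things(harvested, ELEMENTS))
    # prints ['Moby Dick', 'Evening News', 'Holiday Photos']
```

The display code only asks whether an element is declared a member of the set that names the resource. Adding another community's title element therefore means adding one more definition, without touching any record format. The shared ISBN element is skipped simply because nobody declared it a member of that set.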
&lt;br /&gt;&lt;br /&gt;In addition, if there are elements that you need only rarely for less common materials in your environment, it may be more economical to borrow data elements created by specialist communities when they are needed, saving your community the effort of defining additional elements under your own metadata namespace. &lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;To do all of this, we need to agree on a few basic rules.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;1) We need to define our data elements in a machine-readable and machine-actionable way, preferably using a widely accepted standard.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This requires a data format for data elements that contains the minimum needed to make use of a defined data element. Generally, this minimum information is: &lt;br /&gt;&lt;ul&gt;&lt;li&gt;a name (for human readers)&lt;/li&gt;&lt;li&gt;an identifier (for machines)&lt;/li&gt;&lt;li&gt;a human-readable definition&lt;/li&gt;&lt;li&gt;both human-readable and machine-readable definitions of relationships to other elements (e.g. "equivalent to", "narrower than", "opposite of")&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;2) We must have the willingness and the right to make our decisions open and available online so others can re-use our metadata elements and/or create relationships to them.&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;3) We also must be willing to hold discussions about areas of mutual interest with other metadata creators and with metadata users.&lt;/b&gt; That includes the people we think of today as our "users": writers, scholars, researchers, and social network participants. Open communication is the key. Each of us can teach, and each of us can learn from others. We can cooperate on the building of metadata without getting in each other's way. 
I'm optimistic about this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-793269418943356173?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/793269418943356173/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=793269418943356173' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/793269418943356173'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/793269418943356173'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/new-bibliographic-framework-there-is.html' title='New bibliographic framework: there is a way'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/--M3BRDCI6hI/Tlg6KFUqsII/AAAAAAAAA8k/nWX9Ma6cEfA/s72-c/myFavorites.jpg' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-6681688509243938425</id><published>2011-08-25T20:53:00.000-07:00</published><updated>2011-08-26T08:10:55.295-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><title type='text'>Bibliographic Framework Transition Initiative</title><content type='html'>The Internet began as a U.S.-sponsored technology initiative that went global while still under U.S. government control. 
The transition of the Internet to a world-wide communication facility is essentially complete, and few would argue that U.S. control of key aspects of the network is appropriate today. It is, however, hard for those once in control to give it up, and we see that in &lt;a href="http://en.wikipedia.org/wiki/ICANN"&gt;ICANN&lt;/a&gt;, the body charged with making decisions about the name and numbering system that is key to Internet functioning. &lt;a href="http://www.icann.org/"&gt;ICANN&lt;/a&gt; is under criticism from a number of quarters for continuing to be U.S.-centric in its decision-making. Letting go is hard, and being truly international is a huge challenge.&lt;br /&gt;&lt;br /&gt;I see a parallel here with Library of Congress and MARC. While there is no question that MARC was originally developed by the Library of Congress, and has been maintained by that body for over 40 years, it is equally true that the format is now used throughout the world and in ways never anticipated by its original developers. Yet LC retains a certain ownership of the format, in spite of its now global nature, and it is surely time for that control to pass to a more representative body.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Some Background&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;MARC began in the mid-1960's as an LC project at a time when the flow of bibliographic data was from LC to U.S. libraries in the form of card sets. MARC happened at a key point in time when some U.S. libraries were themselves thinking of making use of bibliographic data in a machine-readable form. It was the right idea at the right time.&lt;br /&gt;&lt;br /&gt;In the following years numerous libraries throughout the world adopted MARC or adapted MARC to their own needs. 
By 1977 there had been so much diverse development in this area that libraries used the organizing capabilities of &lt;a href="http://www.ifla.org/"&gt;IFLA&lt;/a&gt; to create a unified standard called &lt;a href="http://www.unimarc.net/brief-overview.html"&gt;UNIMARC&lt;/a&gt;. Other versions of the machine-readable format continued to be created, however.&lt;br /&gt;&lt;br /&gt;The tower of Babel that MARC originally spawned has now begun to consolidate around the latest version of the MARC format, MARC21. There are several reasons for this. First, there are economic reasons: library systems vendors have had to support this cacophony of data formats for decades, which increases their costs and decreases their efficiency. Having more libraries on a single standard means that a vendor has fewer code bases to develop and maintain. The second reason is the increased sharing of metadata between libraries. It is much easier to exchange bibliographic data between institutions that use the same data format.&lt;br /&gt;&lt;br /&gt;Today, MARC records, or at least MARC-like records, abound in the library sphere, and pass from one library system to another like packets over the Internet. OCLC has a database of about 200 million records in MARC format, with data received from some 70,000 libraries, admittedly not all of which use MARC in their own systems. The Library of Congress has contributed approximately &lt;a href="http://www.loc.gov/cds/products/product.php?productID=19"&gt;12 million&lt;/a&gt; of those. Within the U.S., the various &lt;a href="http://www.loc.gov/catdir/pcc/"&gt;cooperative cataloging programs&lt;/a&gt; have distributed the effort of original cataloging among hundreds of institutions. Many national libraries freely exchange their data with their counterparts in other countries as a way to reduce cataloging costs for everyone. 
The flow of bibliographic data is no longer from LC to other libraries; it is a many-to-many web of data creation and exchange.&lt;br /&gt;&lt;br /&gt;Yet, much like ICANN and the Internet, LC remains the controlling agency for the MARC standard. The &lt;a href="http://www.loc.gov/marc/overview.html"&gt;MARC Advisory Committee&lt;/a&gt;, which oversees changes to the format, has grown and has added members from Library and Archives Canada, the British Library, and the Deutsche Nationalbibliothek. However, the standard is still primarily maintained and issued by LC.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Bibliographic Framework Transition Initiative&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;LC recently announced the &lt;a href="http://www.loc.gov/marc/transition/"&gt;Bibliographic Framework Transition initiative&lt;/a&gt; to "determine a transition path for the MARC21 exchange format." &lt;br /&gt;&lt;blockquote&gt;"This work will be carried out in consultation with the format's formal partners -- Library and Archives Canada and the British Library -- and informal partners -- the Deutsche Nationalbibliothek and other national libraries, the agencies that provide library services and products, the many MARC user institutions, and the MARC advisory committees such as the MARBI committee of ALA, the Canadian Committee on MARC, and the BIC Bibliographic Standards Group in the UK."&lt;/blockquote&gt;LC's 18-month plan is expected to be issued in September. &lt;br /&gt;&lt;br /&gt;Not included in LC's plan as announced are the publishers, whose data should feed into library systems and does feed into bibliographic systems like online bookstores. Archives and museums create metadata that could and should interact well with library data, and they should be included in this effort. 
Also not included are the academic users of bibliographic data, users who are so frustrated with library data that they have developed numerous standards of their own, such as &lt;a href="http://bibliontology.com/specification"&gt;BIBO&lt;/a&gt;, the Bibliographic Ontology, &lt;a href="http://www.bibkn.org/bibjson/index.html"&gt;BIBJson&lt;/a&gt;, a JSON format for bibliographic data, and &lt;a href="http://purl.org/spar/fabio"&gt;Fabio&lt;/a&gt;, the FRBR-Aligned Bibliographic Ontology. Nor are there representatives of online sites like Wikipedia and Google Books, which have an interest in using bibliographic data as well as a willingness to link back to libraries where that is possible. Media organizations, like the &lt;a href="http://www.bbc.co.uk/ontologies/"&gt;BBC&lt;/a&gt; and the U.S. &lt;a href="http://pbcore.org/"&gt;public broadcasting community&lt;/a&gt;, have developed metadata for their video and sound resources, many of which find their way into library collections. And I almost forgot: library systems vendors. Although vendors have some representation on the MARC Advisory Committee, they need a strong voice, given their experience with library data and their knowledge of the costs and affordances.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Issues and Concerns&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;There is one group in particular that is missing from the LC project as announced: information technology (IT) professionals. In normal IT development, the users do not design their own system. A small group of technical experts designs the system structure, including the metadata schema, based on requirements derived from a study of the users' needs. This is exactly how the original MARC format was developed: LC hired a &lt;a href="http://en.wikipedia.org/wiki/Henriette_Avram"&gt;computer scientist&lt;/a&gt; to study the library's needs and develop a data format for its cataloging. 
We were all extremely fortunate that LC hired someone who was attentive and brilliant. The format was developed in a short period of time, &lt;a href="http://openlibrary.org/works/OL6774532W/The_MARC_pilot_project"&gt;underwent testing and cost analysis&lt;/a&gt;, and was integrated with work flows. &lt;br /&gt;&lt;br /&gt;It is obvious to me that standards for bibliographic data exchange should not be designed by a single constituency, and should definitely not be led by a small number of institutions that have their own interests to defend. The consultation with other similar institutions is not enough to make this a truly open effort. While there may be some element of not wanting to give up control of this key standard, it also is not obvious to whom LC could turn to take on this task. LC is to be commended for committing to this effort, which will be huge and undoubtedly costly. But this solution is imperfect, at best, and at worst could result in a data standard that does not benefit the many users of bibliographic information.&lt;br /&gt;&lt;br /&gt;The next data carrier for libraries needs to be developed as a truly open effort. It should be led by a neutral organization (possibly ad hoc) that can bring together&amp;nbsp; the wide range of interested parties and make sure that all voices are heard. Technical development should be done by computer professionals with expertise in metadata design. The resulting system should be rigorous yet flexible enough to allow growth and specialization. Libraries would determine the content of their metadata, but ongoing technical oversight would prevent the introduction of implementation errors such as those that have plagued the MARC format as it has evolved. And all users of bibliographic data would have the capability of metadata exchange with libraries. 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-6681688509243938425?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/6681688509243938425/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=6681688509243938425' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6681688509243938425'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6681688509243938425'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/bibliographic-framework-transition.html' title='Bibliographic Framework Transition Initiative'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5087016776608947325</id><published>2011-08-19T08:51:00.001-07:00</published><updated>2011-08-19T08:59:40.899-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><title type='text'>Metadata Seminar at ASIST</title><content type='html'>I'm going to be giving a half-day &lt;a href="http://www.asis.org/asist2011/From_Metadata_to_a_Web_of_Data.html"&gt;seminar&lt;/a&gt; on October 12 in New Orleans in association with ASIST. This is something I have been wanting to do for a while. 
I feel like I've spent the past two years presenting Semantic Web 101 in 45-minute segments, and I really want to start moving on to 102, 103, etc. I'm hoping this seminar will fill that gap.&lt;br /&gt;&lt;br /&gt;The topics I will cover at that seminar are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Understanding data, data types, and data uses&lt;/li&gt;&lt;li&gt;Identifiers, URIs and http URIs&lt;/li&gt;&lt;li&gt;Statements and triples and their role in the 'web of data'&lt;/li&gt;&lt;li&gt;Defining properties and vocabularies that can be used effectively on the web&lt;/li&gt;&lt;li&gt;Brief introduction to semantic web standards&lt;/li&gt;&lt;/ul&gt;There will be hands-on exercises throughout the morning that give attendees a chance to learn by doing. I'm hoping that the exercises will also be fun. If you're going to ASIST and have any questions about the seminar, please contact me.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5087016776608947325?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5087016776608947325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5087016776608947325' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5087016776608947325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5087016776608947325'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/metadata-seminar-at-asist.html' title='Metadata Seminar at ASIST'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-2814510131963149330</id><published>2011-08-14T12:10:00.000-07:00</published><updated>2011-08-16T21:09:03.615-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='women technology'/><title type='text'>Men, Women: Different</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-72Jor90Bj1k/TkmmDQzXEpI/AAAAAAAAA8U/TFbv2K5Dr0M/s1600/cutTheCord.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 312px; height: 400px;" src="http://4.bp.blogspot.com/-72Jor90Bj1k/TkmmDQzXEpI/AAAAAAAAA8U/TFbv2K5Dr0M/s400/cutTheCord.jpg" alt="" id="BLOGGER_PHOTO_ID_5641222583238464146" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The title of this post was a teaser headline on the cover of USA Today. No, I don't remember when, but the statement definitely struck me. Yes, we &lt;span style="font-style: italic;"&gt;are&lt;/span&gt; different. Our differences in point of view run so deep that it's often hard to explain &lt;span style="font-style: italic;"&gt;why&lt;/span&gt; something matters.&lt;br /&gt;&lt;br /&gt;This ad for the cordless mouse clearly made it all the way through the company's management structure without raising an eyebrow, but many women I have shown it to have had a visceral reaction, since "cut the cord" brings up thoughts of childbirth, which makes this photo of butchers pretty gruesome.&lt;br /&gt;&lt;br /&gt;Around this same time (and I'm again talking about the mid-1990s), women began complaining about the title of the back page of PC Magazine: &lt;a href="http://www.pcmag.com/article2/0,2817,2164238,00.asp"&gt;Abort, Retry, Fail&lt;/a&gt;. It was a page of bloopers and idiotic error messages. 
You have to be of the older generation to remember what ARF was about, since it was a DOS error message. There are some examples &lt;a href="http://www.techrepublic.com/photos/weird-error-messages/333302?tag=siu-container;photopaging#photopaging"&gt;here&lt;/a&gt; in case you are either a) too young to remember or b) in the mood for a nostalgia trip.&lt;br /&gt;&lt;br /&gt;In any case, some women objected to the use of Abort, Retry, Fail for a humor page because the term abort wasn't at all humorous to them. Men didn't understand this at all. They also seemed to think that the meaning "to end a process" was the main usage of the term, and that its association with a failed pregnancy was just a minor nit, hardly worth noticing. It obviously all depends on which meaning of &lt;span style="font-style: italic;"&gt;abort&lt;/span&gt; has had the greatest effect on your life. PC Magazine did change the name to Backspace, and then changed it back to Abort, Retry, Fail in &lt;a href="http://www.pcmag.com/article2/0,2817,1953607,00.asp"&gt;2006&lt;/a&gt;. The print magazine didn't survive much longer, though for reasons having nothing to do with its back page, I'm sure.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-nkmZ-Au70zo/TkmsoszeDSI/AAAAAAAAA8c/fJFTWiaUsZw/s1600/ezriderChameleon.jpg"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 400px; height: 336px;" src="http://4.bp.blogspot.com/-nkmZ-Au70zo/TkmsoszeDSI/AAAAAAAAA8c/fJFTWiaUsZw/s400/ezriderChameleon.jpg" alt="" id="BLOGGER_PHOTO_ID_5641229823480040738" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;This last image (unless I get ambitious later and scan some more) could almost have been used as a test for "Are you a man or a woman?" Chameleon was software for managing your Internet connection, and darned good software at that. We used it in my place of work for years. If you see a man on a motorcycle and nothing else, you just might be a man. 
If you see some high heels flying in the air and get an image of a woman who has just been dumped on the road, you're either a woman or would make a great boyfriend. In my talks and writing, I called this image: &lt;span style="font-style:italic;"&gt;Woman as roadkill on the information highway&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;It always amazes me how separate the realities can be for men and women, although there's a chance they are no more distant than those of rich and poor, abled and disabled, or any other human dichotomy you can come up with. Having experienced the world of computing as a woman for nearly forty years, I can say that these differences in perception have a real effect on getting along and getting things done. One of my favorite statements is from Professor &lt;a href="http://people.mills.edu/spertus/"&gt;Ellen Spertus&lt;/a&gt;, who teaches and encourages women in computer science, and who says: "You can be both rigorous and nurturing." My translation of this is: women's views count, too.
&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-2814510131963149330?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/2814510131963149330/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=2814510131963149330' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2814510131963149330'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2814510131963149330'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/men-women-different.html' title='Men, Women: Different'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-72Jor90Bj1k/TkmmDQzXEpI/AAAAAAAAA8U/TFbv2K5Dr0M/s72-c/cutTheCord.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5328169546197030008</id><published>2011-08-14T11:16:00.000-07:00</published><updated>2011-08-16T10:22:20.840-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='women technology'/><title type='text'>Throbnet, 1995</title><content type='html'>&lt;a href="http://2.bp.blogspot.com/-tNXdys8Q7zY/TkgScpxJa1I/AAAAAAAAA8M/537ptWVQQfY/s1600/PCMagadsHustler.jpg" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="float:left; margin:0 10px 10px 
0;cursor:pointer; cursor:hand;width: 307px; height: 400px;" src="http://2.bp.blogspot.com/-tNXdys8Q7zY/TkgScpxJa1I/AAAAAAAAA8M/537ptWVQQfY/s400/PCMagadsHustler.jpg" alt="" id="BLOGGER_PHOTO_ID_5640778816739175250" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;One of the characteristics of PC Magazine in the mid-1990s was its adult classified section. This went on for many pages; many, many offensive pages. Remember, this was a magazine that many of us read in our professional capacity, since it was the main way to get information about new products and trends. PC Magazine was the primary source of hardware and software reviews; its special printer issue was &lt;span style="font-style: italic;"&gt;the&lt;/span&gt; place to go before buying a printer. But unfortunately, it also came with these pages.&lt;br /&gt;&lt;br /&gt;This is a fairly mild example. I don't remember my rationale, but I probably didn't feel comfortable showing the raunchier ads to my audiences, so I used this one. There were more explicit examples, like the ads for Throbnet (a name that is still used in online porn).&lt;br /&gt;&lt;br /&gt;But the real clincher for me was when I went to my first computer show. My memory has it that it was a MacWorld, but I can't be sure of that. It was in San Francisco, around 1995. Included among the exhibitors were some of the porn vendors who advertised in these pages. Their draw was that they had the actresses there in the booth. I distinctly remember the line of guys waiting to have their copies of "Anal ROM" signed. It was a very uncomfortable place for a woman working in the computer field.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5328169546197030008?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5328169546197030008/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5328169546197030008' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5328169546197030008'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5328169546197030008'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/throbnet-1995.html' title='Throbnet, 1995'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-tNXdys8Q7zY/TkgScpxJa1I/AAAAAAAAA8M/537ptWVQQfY/s72-c/PCMagadsHustler.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1289678254300698832</id><published>2011-08-14T10:41:00.000-07:00</published><updated>2011-08-15T16:50:25.765-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='women technology'/><title type='text'>No hairstyling tips, 1995</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-6jkHzcXSPek/TkgL1Rij0VI/AAAAAAAAA70/S3MYjJd9MJs/s1600/noHairstyling.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; 
text-align:center;cursor:pointer; cursor:hand;width: 380px; height: 400px;" src="http://3.bp.blogspot.com/-6jkHzcXSPek/TkgL1Rij0VI/AAAAAAAAA70/S3MYjJd9MJs/s400/noHairstyling.jpg" alt="" id="BLOGGER_PHOTO_ID_5640771543150874962" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;(The next few posts will be feminist in nature. If that type of thing annoys you, I suggest you skip them; I'll be back to librarianship in a trice.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is a new generation of women dealing with the nature of computing culture. Fortunately, they have social media to help them cope. (&lt;a href="http://lwn.net/Articles/417952/"&gt;Example1&lt;/a&gt;, &lt;a href="http://blog.valerieaurora.org/2010/11/08/its-not-just-noirin/"&gt;Example2&lt;/a&gt;) Reading their posts reminded me that in the mid-1990s I gave talks about the portrayal of women in computer magazines, and that I might have some illustrations that were still usable. I have only a few, since most of my examples ended up as black-and-white transparencies that can't be scanned. But in the next few posts I'll offer what I do have, all from about 1995.&lt;br /&gt;&lt;br /&gt;The above image is of a postcard that I received in the mid-nineties from a bulletin board system (BBS) called "&lt;a href="http://en.wikipedia.org/wiki/Byte_Information_Exchange"&gt;BIX&lt;/a&gt;". BBSs were the only way to get online in those days, although by the mid-nineties most gave you a pass-through to the Internet. A BBS was a kind of mini-AOL: an online gathering and posting place that was a walled community. The first BBS I joined was &lt;a href="http://en.wikipedia.org/wiki/CompuServe"&gt;CompuServe&lt;/a&gt;, since that was the main place for technical information about PC hardware and software. (Note: there was little or no product information on the Internet, which in the 1980s and early 1990s was strictly limited to academic activities and research.)
&lt;br /&gt;&lt;br /&gt;I must have gotten the BIX card because I subscribed to PC Magazine. The message, however, was far from inviting. Here's the back of the card:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-sGjln200M4s/TkgOCANcaNI/AAAAAAAAA78/NQPC_F04joM/s1600/bixBack.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 347px; height: 400px;" src="http://3.bp.blogspot.com/-sGjln200M4s/TkgOCANcaNI/AAAAAAAAA78/NQPC_F04joM/s400/bixBack.jpg" alt="" id="BLOGGER_PHOTO_ID_5640773960860461266" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The main text says:&lt;br /&gt;&lt;blockquote&gt;No garbage.&lt;br /&gt;No noise.&lt;br /&gt;No irrelevant clutter.&lt;/blockquote&gt;&lt;br /&gt;Which, as the card illustrates, obviously meant: no girls.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-9KeLRDDwqSU/TkgPzyKBgyI/AAAAAAAAA8E/JuZMJEDAjxA/s1600/wiredwomen.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 90px; height: 150px;" src="http://3.bp.blogspot.com/-9KeLRDDwqSU/TkgPzyKBgyI/AAAAAAAAA8E/JuZMJEDAjxA/s400/wiredwomen.jpg" alt="" id="BLOGGER_PHOTO_ID_5640775915593106210" border="0" /&gt;&lt;/a&gt;I have more examples of this "boy's club" atmosphere in 1996 in my article in &lt;a href="http://openlibrary.org/works/OL16049250W/Wired_women"&gt;Wired Women&lt;/a&gt;: &lt;a href="http://kcoyle.net/howhard.html"&gt;How hard can it be?&lt;/a&gt; (Available on my site.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1289678254300698832?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1289678254300698832/comments/default' 
title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1289678254300698832' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1289678254300698832'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1289678254300698832'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/no-hairstyling-tips-1995.html' title='No hairstyling tips, 1995'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-6jkHzcXSPek/TkgL1Rij0VI/AAAAAAAAA70/S3MYjJd9MJs/s72-c/noHairstyling.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7272810916070678733</id><published>2011-08-12T17:13:00.000-07:00</published><updated>2011-08-13T11:27:24.975-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FRBR'/><title type='text'>Models of bibliographic data</title><content type='html'>There are two main models of bibliographic data that most of us are familiar with today. One is ISBD, which models bibliographic description. 
ISBD is a flat list of data areas:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Title and statement of responsibility&lt;/li&gt;&lt;li&gt;Edition&lt;/li&gt;&lt;li&gt;Material type&lt;/li&gt;&lt;li&gt;Publication, distribution, etc.&lt;/li&gt;&lt;li&gt;Physical description&lt;/li&gt;&lt;li&gt;Series&lt;/li&gt;&lt;li&gt;Notes&lt;/li&gt;&lt;li&gt;Identifier&lt;/li&gt;&lt;/ol&gt;The MARC21 record implements ISBD description in part because AACR2, on which it is based, is compatible with ISBD while also including additional data such as headings (also known as "access points"). While I haven't seen a diagrammatic visualization of MARC, I believe it would be flat, much like ISBD.&lt;br /&gt;&lt;br /&gt;The other primary model is &lt;a href="http://www.ifla.org/VII/s13/frbr/frbr.htm"&gt;FRBR&lt;/a&gt;. There aren't yet many examples of FRBR-based data, although there are partial examples such as the Work views in &lt;a href="http://worldcat.org/"&gt;WorldCat&lt;/a&gt; and the Work and Personal author views in &lt;a href="http://openlibrary.org/"&gt;Open Library&lt;/a&gt;. The most fully FRBR-ized data appears to be in the VTLS Virtua database and their &lt;a href="http://vtls.com/services/rdasandbox"&gt;RDA sandbox&lt;/a&gt;, but I admit I haven't spent much time looking at this because it is a "pay fer" offering.&lt;br /&gt;&lt;br /&gt;The FRBR model isn't flat, but can be drawn as three groups of inter-related entities. 
The actual FRBR diagrams are too complex to fit in this blog post, but here's a simplified one that I have used in slide sets:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-ITUsNZHki2k/TkXOsqYmKfI/AAAAAAAAA7s/AoeKDQq8vAo/s1600/FRs.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/-ITUsNZHki2k/TkXOsqYmKfI/AAAAAAAAA7s/AoeKDQq8vAo/s400/FRs.jpg" alt="" id="BLOGGER_PHOTO_ID_5640141375038433778" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;There is a certain amount of &lt;span style="font-style: italic;"&gt;movement&lt;/span&gt; in FRBR compared to the flat models of ISBD and MARC. In particular, FRBR offers the possibility of creating paths through data by following the relationships of a single entity through the descriptions of different resources. It also allows something like a Person entity to be treated as a resource on its own and therefore to be the focus of attention for some data view.&lt;br /&gt;&lt;br /&gt;The British Library recently announced free and open versions of their &lt;a href="http://www.bl.uk/bibliographic/datafree.html"&gt;British National Bibliography&lt;/a&gt;, with records available in a linked data format. Their analysis of the BL data, done in collaboration with &lt;a href="http://www.talis.com/"&gt;Talis&lt;/a&gt;, a UK library systems company that is very active in the linked data space, resulted in a &lt;a href="http://www.bl.uk/bibliographic/pdfs/datamodelv1_01.pdf"&gt;data model&lt;/a&gt; (PDF) that is unlike any we have seen before. The diagram below isn't legible in its details, but I wanted to highlight the key sections or groupings that are revealed in the analysis.
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-kJlFZLlsoM4/TkXB5Frp7iI/AAAAAAAAA7k/2-_inTk07AA/s1600/blDiagram.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/-kJlFZLlsoM4/TkXB5Frp7iI/AAAAAAAAA7k/2-_inTk07AA/s400/blDiagram.jpg" alt="" id="BLOGGER_PHOTO_ID_5640127294873398818" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;There are a number of interesting aspects to this. To begin with, just by virtue of the diagramming of entities (which each get represented by an oval) you can see how much of the record is represented by named and identified entities rather than plain text. The plain text fields are on the bottom right of the diagram in the lavender boxes. Presented this way, they seem to have less importance than they do in traditional views. In sheer diagram real estate, subjects come out as the largest group, and authors appear to be more substantial than they seem in MARC models where they are reduced to short strings.&lt;br /&gt;&lt;br /&gt;I also find it very interesting that publication is represented as an event. This makes sense to me. In FRBR, publication isn't an action but a static description of when and where and who, and the various publications are treated as separate events unrelated to a history of how the Work resurfaces over time for new generations. I like the view that a work comes to us through a series of events, not separate and unrelated manifestations.&lt;br /&gt;&lt;br /&gt;I would like to suggest that we explore a variety of models for our data. I don't think we have to adopt one single model, but we should design our data such that it can be used in different views depending on the service being provided. I also think that we should explore these models before we put all of our eggs in the FRBR basket. 
We might learn something vital that should be taken into consideration for our future bibliographic data.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7272810916070678733?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7272810916070678733/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7272810916070678733' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7272810916070678733'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7272810916070678733'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/models-of-bibliographic-data.html' title='Models of bibliographic data'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-ITUsNZHki2k/TkXOsqYmKfI/AAAAAAAAA7s/AoeKDQq8vAo/s72-c/FRs.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-9179791204530519976</id><published>2011-08-01T10:47:00.000-07:00</published><updated>2011-08-01T17:27:47.861-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Digital libraries'/><title type='text'>Suggestions for HathiTrust UI</title><content type='html'>Here are my concrete suggestions for improvements to the HathiTrust user interface. 
This is based on my own experience and should not be considered complete or universal. These are simply the things that would have made my experience better:&lt;br /&gt;&lt;br /&gt;On the home page, there should be two links:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;member login&lt;/li&gt;&lt;li&gt;guest login&lt;/li&gt;&lt;/ul&gt;Next to each there should be a link to help (one of those question mark circles, for example).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Member help&lt;/span&gt; will explain that you must be someone associated with one of these institutions (link) with an institutional id. Members can: [whatever they can do - view everything, download all PD materials, create bibliographies...]&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Guest help&lt;/span&gt; will explain that HT is a member-sponsored db. Guests can search and can view the full text of some materials. A guest account allows you to create a persistent bibliography.&lt;br /&gt;&lt;br /&gt;On the page for a work, do NOT say: Public domain, Google-digitized. Instead, say what the user needs to know:&lt;br /&gt;Public domain; member-only download.&lt;br /&gt;Public domain; anyone can download.&lt;br /&gt;&lt;br /&gt;If you ask for a login at the time of download, ONLY ask for a member login, since a guest login does not provide access at this point. The message ("member-only download") may be enough, but the login request could read: requires member login.&lt;br /&gt;&lt;br /&gt;This was as far as I got in HT, and I'm not going to be spending much more time there, since as a non-member I am actually served better on other sites. 
It's a superficial look from a first-time, non-member user.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-9179791204530519976?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/9179791204530519976/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=9179791204530519976' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9179791204530519976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9179791204530519976'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/08/suggestions-for-hathitrust-ui.html' title='Suggestions for HathiTrust UI'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1287799325964465614</id><published>2011-07-31T08:14:00.000-07:00</published><updated>2011-07-31T11:50:10.530-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Digital libraries'/><title type='text'>User-friendliness, a lesson</title><content type='html'>I was looking for Melvil Dewey's first published version of his classification system. My first instinct was to head to Google Book Search but I decided instead to use HathiTrust as a kind of gesture to non-commercial access. I did find what I was looking for, his 1876 pamphlet, opened it up in their reader and looked through it. 
I knew I'd want a copy, so I found the "download as PDF" link. That popped up a box telling me to "Login to determine whether you can download this book." The copyright is listed as "public domain in the United States." I didn't see why I needed to log in, so I downloaded it from GBS instead, without logging in but adding to the slime trail of my life that Google owns. The added step of logging in (which begins with creating yet another login on a system I will use only occasionally) is not user-friendly, even though it may be no more invasive of my privacy than using Google, or even less so. It also didn't make sense to me at the time, and I was given nothing to convince me that logging in was beneficial ... to me.&lt;br /&gt;&lt;br /&gt;Yes, it's all about ME, me the user, me the person at the other end of the connection. I'm also not just any user: I am an advocate of libraries, a librarian, and I made the effort to go to HathiTrust -- a site that has not shown up for me in search engines.&lt;br /&gt;&lt;br /&gt;This seems to be such a basic lesson that I do not understand why libraries can't learn it. User-friendliness.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;Ooof! It just gets worse. I decided to see what login is about. To get to login, you have to search, select a book, and click on login. On the book page, you may see that a book is "Public Domain" or it may say "Public Domain, Google-digitized". When you log in, you log in either as someone from a member institution or as a guest. The guest login form states:&lt;br /&gt;&lt;blockquote&gt;Does NOT provide access to full PDF downloads of public domain &amp;amp; open access items where not publicly available&lt;/blockquote&gt;However, it turns out that it DOES provide access to PD books (see comment by anonymous) if the book is not digitized by Google -- but that isn't what you've been told. "... not publicly available" isn't what you see on the book page; you see "Google-digitized." 
The page on &lt;a href="http://www.hathitrust.org/access_use"&gt;policies&lt;/a&gt; has two different categories, "Open Access" and "Open Access, Google-digitized." Nothing in the definitions of those categories mentions member and guest downloading.&lt;br /&gt;&lt;br /&gt;Basically, HathiTrust turns out to be a tiered system with member and non-member access. You don't encounter this until you try to download something that is PD but not "publicly available." Nothing on the home page mentions that this is a member-based service, so as a non-member you don't know that you will encounter walls.&lt;br /&gt;&lt;br /&gt;OK, I have resolved that from now on I will always go first to the &lt;a href="http://openlibrary.org/"&gt;Open Library&lt;/a&gt;, a site where Open means what I think it should.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1287799325964465614?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1287799325964465614/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1287799325964465614' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1287799325964465614'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1287799325964465614'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/07/user-friendliness-lesson.html' title='User-friendliness, a lesson'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5169314010962055119</id><published>2011-07-25T15:11:00.000-07:00</published><updated>2011-08-01T09:10:13.440-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='RDA'/><title type='text'>RDA in XML - why not give it a shot?</title><content type='html'>&lt;blockquote&gt;&lt;a href="http://kcoyle.net/rda/RDAinXML.html"&gt;Example of RDA in XML&lt;/a&gt; / &lt;a href="http://kcoyle.net/rda/RDAinXML2.html"&gt;Example2 of RDA in XML&lt;/a&gt;&lt;/blockquote&gt;&lt;p&gt;There's a lot of talk about what we will do with RDA as data - what format we will use, how it will look to users, etc., etc., etc. In fact, the options are legion. The key point is that we don't have to decide on just ONE WAY to carry and store RDA data elements, as long as we follow a few rules.&lt;/p&gt; &lt;p&gt;As an experiment, I have coded a very simple bibliographic record using two different possible ways to encode RDA in XML. For the XML data elements I use the RDA elements from &lt;a href="http://metadataregistry.org/rdabrowse.htm"&gt;the Open Metadata Registry&lt;/a&gt;. These elements are defined in OWL, and therefore are compatible with semantic web applications. Their use in XML (and by that I mean non-RDF XML) may be a bit questionable, yet at the same time XML may be a good transition format from our current data to a full RDF-based implementation. I created two XML files: one in which I used &lt;a href="http://kcoyle.net/rda/simpleRDA.xml"&gt;text values&lt;/a&gt;, much as one would in MARC, and one in which I used &lt;a href="http://kcoyle.net/rda/simpleRDAwText.xml"&gt;URIs for values&lt;/a&gt; wherever those values have been encoded as vocabularies. Neither has a schema, because creating a schema for RDA is a huge undertaking. 
If there is interest in this method, however, it might be worth... undertaking.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;The resulting files don't fit well in a blog post, so I created a &lt;a href="http://kcoyle.net/rda/RDAinXML.html"&gt;page&lt;/a&gt; with a side-by-side comparison. Please have a look. Feel free to comment or send me suggestions, corrections, or other ideas on how to do this better.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5169314010962055119?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5169314010962055119/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5169314010962055119' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5169314010962055119'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5169314010962055119'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/07/rda-in-xml-why-not-give-it-shot.html' title='RDA in XML - why not give it a shot?'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8472530968380588276</id><published>2011-07-20T10:06:00.000-07:00</published><updated>2011-07-20T10:30:35.576-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='intellectual freedom'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Digital libraries'/><title type='text'>Unequal Access</title><content type='html'>With the &lt;a href="http://www.nytimes.com/2011/07/20/us/20compute.html"&gt;recent indictment&lt;/a&gt; of an advocate for open information access who had set up a way to download about 4 million JSTOR articles, presumably with the intent to liberate them from their native closed access, we need to step back and look at how unequal information access is in this world. In major universities in the US, academics and students log on to their computers in their offices or at home and a whole world opens up to them. That's not some kind of accident. The prime goal of university libraries is to make good on "seek and ye shall find." The proof of the success of these libraries is that researchers are oblivious to the complexity of the system that serves them.  I would guess that many members of the US university community have no idea how their access to journals is managed and controlled. They don't see the contract negotiations with information providers, the continual development of software that makes single-point searching possible, the multi-faceted delivery systems that blend (or attempt to) digital and paper resources into a single stream. And they don't think about how different it would be if they weren't members of that privileged community.&lt;br /&gt;&lt;br /&gt;Contrast that to the access available to a member of the US public who is not part of this academic sector. Like myself. Like the majority of people in this country. There is no access to JSTOR. No openURL server gives me multiple access options. The local public library does have some electronic materials, but these are much less extensive (and less expensive) than the ones in academic libraries. I may have to wait weeks to get a book that isn't in my local library's collection, if I can get it at all. 
I am often in the embarrassing position of not being able to access articles that I would like to read or quote from, including ones that I myself have authored.&lt;br /&gt;&lt;br /&gt;In spite of this, I know that my information access, as a mere member of the US public, is far superior to that found in many other countries, where serious researchers struggle to participate in research because they do not have the access that many academics here take for granted. Two anecdotes:&lt;br /&gt;&lt;br /&gt;-- When I lived in Italy in the 1970s my friends were mainly college students or recent graduates. University education was free, but it was generally accepted that the only way to complete one's final thesis was to be able to afford to go abroad for two or three months. The purpose of this trip was to spend time in a country with a good library system, since libraries in Italy were limited. This was true not just for students studying foreign literatures, but also for those studying sciences, history, and art. These kids were essentially "library tourists." I don't know if this continues today.&lt;br /&gt;&lt;br /&gt;-- During the time I worked at UC I was in a conversation with someone involved in the licensing of databases. For some reason we got talking about enforcement of contractual clauses having to do with excessive downloading and/or piracy. This person told me that all access to one of the UC campuses had recently been cut off for a few days because it was discovered that someone was systematically downloading entire journal runs. When they tracked down the culprit, it turned out to be a foreign graduate student who would soon be returning home. 
Knowing that leaving the UC system would mean losing access to the journals he would need to continue his research, he was making himself a copy to take home.&lt;br /&gt;&lt;br /&gt;It occurs to me as I write this that the "Digital Public Library of America" could create an information revolution in this country by upgrading the access of the general public to that of an academic or student in a large college or university, without ever digitizing a single page.  What makes Stanford "Stanford" or Harvard "Harvard" is not just its famed faculty but the full range of information that is shared by that community. Everything they do, every bit of research, every new idea, is facilitated by the library and its services.&lt;br /&gt;&lt;br /&gt;The information access gap between a university researcher and the average person on the street is immense. We have an information elite that, like most elites, considers its position to be earned, just, and reasonable. Few in academia worry that the access they have isn't widely shared. 
If they did, they would hopefully decide that something should be done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8472530968380588276?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8472530968380588276/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8472530968380588276' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8472530968380588276'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8472530968380588276'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/07/unequal-access.html' title='Unequal Access'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-6631433362962062435</id><published>2011-06-18T08:21:00.000-07:00</published><updated>2011-06-18T09:16:32.828-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><title type='text'>Opportunity knocks</title><content type='html'>There will soon be a call for reviews of the &lt;a href="http://www.w3.org/2005/Incubator/lld/wiki/DraftReportWithTransclusion"&gt;draft report&lt;/a&gt; by the W3C Incubator Group on Library Linked Data. As a member of that group I have had a hand in writing that draft, and I can tell you that it has been a struggle. 
Now we seriously need to hear from you, not least because the group is not fully representative of the library world; in fact, it leans heavily toward techy-ness and large libraries and services. We need to hear from a wide range of libraries and librarians: public, small, medium, special, management, people who worry about budgets, people who have face time with users. We also need to hear from the library vendor community, since little can happen with library data that will not involve that community. (Note: a site is being set up to take comments, and I am hoping it will be possible to post anonymously or at least pseudonymously, for those who cannot appear to be speaking for their employer.)&lt;br /&gt;&lt;br /&gt;In thinking about the possibility of moving to a new approach to bibliographic data in libraries, I created this diagram, which to me represents a kind of needs assessment. (It will not be in the report; it just reflects my own thinking.) This pyramid relates not just to linked data but to any data format that we might adopt to take the place of the card catalog mark-up that we use today.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-jh2a7tuCJE8/TfzIqeoU2GI/AAAAAAAAA40/B_C6Sk8gAWQ/s1600/issuePyramid.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://4.bp.blogspot.com/-jh2a7tuCJE8/TfzIqeoU2GI/AAAAAAAAA40/B_C6Sk8gAWQ/s400/issuePyramid.jpg" alt="" id="BLOGGER_PHOTO_ID_5619587067153799266" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We could use this to address the recent LC announcement on replacing MARC. 
Here's how I see that analysis, starting with the bottom of the pyramid:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Motivation:&lt;/span&gt; Our current data model lacks the flexibility that we need, and is keeping us from taking advantage of some modern technologies that could help us provide better user service. Libraries are becoming less and less visible as information providers, in part because our data does not play well on the web, and it is difficult for us to make use of web content.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Leadership:&lt;/span&gt; Creating a new model is going to take some serious coordination among all of the parties. Who should/could provide that leadership, and how can we fund this effort? Although LC has announced its intention to collaborate, for various reasons a more neutral organization might be desired, one that is truly global in scope. Yet who can both lead the conversion effort &lt;span style="font-style: italic;"&gt;and &lt;/span&gt;remain available to provide stability for the long-term maintenance that a library data carrier will require? And how can we be collaborative without being glacially slow?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Skills:&lt;/span&gt; Many of us went through library school before the term "metadata" was in common use. We learned to follow the cataloging rules, but not to understand the basic principles of data modeling and creation. This is one of the reasons why it is hard for us to change: we are one-trick ponies in the metadata world. The profession needs new skills, and it's not enough for only a few to acquire them: we all need to understand the world we are moving into.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Means:&lt;/span&gt; This is the really hard one: how do we get the time and funding to make this much-needed change? 
Both will need to be justified with some clear examples of what we gain by this effort. I favor some demonstration projects, if we can find a way to create them.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Opportunity:&lt;/span&gt; The opportunity is here now. We could have made this change any time over the past decade or two while cataloging with AACR2, but RDA definitely gives us that golden moment when &lt;span style="font-style: italic;"&gt;not changing&lt;/span&gt; no longer makes sense.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-6631433362962062435?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/6631433362962062435/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=6631433362962062435' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6631433362962062435'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6631433362962062435'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/06/opportunity-knocks.html' title='Opportunity knocks'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-jh2a7tuCJE8/TfzIqeoU2GI/AAAAAAAAA40/B_C6Sk8gAWQ/s72-c/issuePyramid.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1583696357768402647</id><published>2011-05-31T07:30:00.000-07:00</published><updated>2011-06-01T07:24:00.863-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Digital libraries'/><title type='text'>All the ____ in the world</title><content type='html'>&lt;div style="text-align: center;"&gt;"All the ___ in the world"&lt;br /&gt;"Every ____ ever created"&lt;br /&gt;"World's largest ____ "&lt;br /&gt;"Repository of all knowledge in ____"&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;There's something compelling about completeness, about the idea that you could gather ALL of something, anything, together into a single system or database or even, as in the ancient library of Alexandria, physical space. Perhaps it's because we want the satisfaction of being finished. Perhaps it's something primitive in our brain stems that has the evolutionary advantage of keeping us from declaring victory with a job half done. (Well, at least some of us.) To be sure, setting your goal to gather all of something means you don't have to make awkward choices about what to gather/keep and what to discard. The indiscriminate &lt;span style="font-style: italic;"&gt;everything&lt;/span&gt; may be the easier target.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;WorldCat has 229,322,364 bibliographic records.&lt;br /&gt;OpenLibrary has over 20 million records and 1.7 million full-text books.&lt;br /&gt;LibraryThing has records for 6,102,788 unique works.&lt;/blockquote&gt;&lt;blockquote&gt;If you read one book a week for 60 years, you will have read 3,120 books. If you read one book a day for that same length of time, you will have read 21,900 (not counting leap years). &lt;/blockquote&gt;The trick, obviously, is to discover the set of books, articles, etc., that will enhance your brief time on this planet. To do this, we search in these large databases. 
By having such large databases to search, we increase our odds of finding everything in the world about our topic. Of course, we probably do not want everything in the world about our topic; we want the right books (articles, etc.) for us.&lt;br /&gt;&lt;br /&gt;There are some downsides to this &lt;span style="font-style: italic;"&gt;everything&lt;/span&gt; approach, not surprisingly. The first is that any search in a large database retrieves a set of stuff so large that it is unwieldy, if not unusable. For this reason, many user interfaces give us ways to reduce the set using additional searches, often in the form of facets. Yet even then one is likely to be overwhelmed.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Everything&lt;/span&gt; includes key works and the odd bits and pieces of dubious repute and utility. Retrieving everything places a great burden on the user to sort out the wheat from the chaff. This is especially difficult when you are investigating an area where you are not an expert. Ranking may highlight the most popular items, but those may not be what you are seeking. In fact, they may be items that you have retrieved before, even multiple times, because every search begins with a tabula rasa.&lt;br /&gt;&lt;br /&gt;Another downside is that although computers are more powerful than ever and storage space is inexpensive, these large databases tend to collapse under the demands of just a few complex queries. Because of this, what users can and cannot do is controlled by the user interface, which serves to protect the system by steering users to safe functions. 
Users often can create their own lists, add tags, and make changes to the underlying data, but they cannot reorder the retrieved set by an arbitrary data element, compare their retrieved set against items they have already saved or seen previously, or run analyses like topic maps on their retrieved set to better understand what is there.&lt;br /&gt;&lt;br /&gt;I conclude, therefore, that it would be useful to treat these large databases as warehouses or raw materials, and to provide software that allows users to select from them to create a personal database. This personal database software would resemble, ta da!, &lt;a href="http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/3881/4/"&gt;Vannevar Bush's Memex&lt;/a&gt;, a combination database and information use system. I can see it having components that are analogous to some systems we already have:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;automated download of data from the big warehouses, like &lt;a href="http://librarything.com/"&gt;LibraryThing&lt;/a&gt;&lt;/li&gt;&lt;li&gt;an easy visual way to do interesting queries, like &lt;a href="http://pipes.yahoo.com/"&gt;Yahoo! Pipes&lt;/a&gt;&lt;/li&gt;&lt;li&gt;the ability to ask questions, like &lt;a href="http://www.wolframalpha.com/"&gt;Wolfram Alpha&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;The personal database would be able to interact with the world of raw material and with other databases. I can imagine functions like: "get me all of the books and articles from this item's bibliography." Or: "compare my library to The Definitive Bibliography of [some topic]." Or: "Check my library and tell me if there are new editions of any of my books." 
In other words, it's not enough to search and get; in fact, searching and getting should be the least of what we are able to do.&lt;br /&gt;&lt;br /&gt;There are a whole lot of resource management functions that a student or researcher could find useful, because even within a selected set there is still much to discover. These smaller, personal databases should also be able to interact with each other, doing comparisons and cross-database queries. We should be able to make notes and create relationships and share them (a Memex feature). The personal database should be associated with a person, not with a particular library or institution, and must work across institutions and services. I can't imagine what it must be like today to graduate and to lose not only the privileged access that members of institutions enjoy but also the entire personal space that one has created while attached to that institution.&lt;br /&gt;&lt;br /&gt;In short, it's not about the STUFF, it's about the services. It doesn't matter how much STUFF you have; it's what people can DO with it. Verb, not noun. 
Quality not quantity.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1583696357768402647?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1583696357768402647/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1583696357768402647' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1583696357768402647'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1583696357768402647'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/05/all-in-world.html' title='All the ____ in the world'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-2977464497698070994</id><published>2011-05-24T20:47:00.000-07:00</published><updated>2011-05-24T20:56:57.039-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><title type='text'>From MARC to Principled Metadata</title><content type='html'>&lt;div&gt;Library of Congress has &lt;a href="http://www.loc.gov/marc/transition/"&gt;announced&lt;/a&gt; its intention to "review the bibliographic framework to better accommodate future needs." The translation of this into plain English is that they are (finally!) thinking about replacing the MARC format with something more modern. 
This is obviously something that desperately needs to be done. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I want to encourage LC and the entire library community to build their future bibliographic data on solid principles. Among these principles would be:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Use data, not text.&lt;/b&gt; Wherever possible, the stuff of bibliographic description should be computable data, not human-interpretable text. Any part of your metadata that cannot be used in machine algorithms is of limited utility in user services. &lt;/li&gt;&lt;li&gt;&lt;b&gt;Give your things identifiers, not language tags.&lt;/b&gt; Identification allows you to share meaning without language barriers. Anything that has been identified can be displayed in language terms to users in any language of your (or the user's) choice. &lt;/li&gt;&lt;li&gt;&lt;b&gt;Adopt mainstream metadata standards&lt;/b&gt;. This applies not only to the data formats but also to the data itself. If other metadata creators are using a particular standard list of languages or of geographic names, use those same terms. If there are metadata elements for common things like colors or sizes or places or [whatever], use those. Work with international communities to extend metadata if necessary, but do not create library-specific versions.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There is much more to be said, and fortunately a great deal of it is being included in the report of the &lt;a href="http://www.w3.org/2005/Incubator/lld/wiki/Main_Page"&gt;W3C Incubator Group on Library Linked Data.&lt;/a&gt; Although the report is still in draft form, you can see the current state of that group's &lt;a href="http://www.w3.org/2005/Incubator/lld/wiki/Draft_recommendations_page"&gt;recommendations&lt;/a&gt;, many of which address the transition that LC appears to be about to embark on. 
A version of the report for comments will be available later this summer. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The existence of this W3C group, however, is the proof of something very important that the Library of Congress must embrace: that bibliographic data is not solely of interest to libraries, and the future of library data should not be created as a library standard but as an information standard. This means that its development must include collaboration with the broader information community, and that collaboration will only be successful if libraries are willing to compromise in order to be part of the greater info-sphere. That's the biggest challenge we face. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-2977464497698070994?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/2977464497698070994/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=2977464497698070994' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2977464497698070994'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2977464497698070994'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/05/from-marc-to-principled-metadata.html' title='From MARC to Principled Metadata'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-2478277975995726479</id><published>2011-05-13T12:53:00.000-07:00</published><updated>2011-05-13T13:22:21.474-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Digital libraries'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><title type='text'>Dystopias</title><content type='html'>In the 1990s I wrote often about information dystopias. In &lt;a href="http://kcoyle.net/njw.html"&gt;1994&lt;/a&gt; I said:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;It's clear to me that the information highway isn't much about information. It's about trying to find a new basis for our economy. I'm pretty sure I'm not going to like the way information is treated in that economy. We know what kind of information sells, and what doesn't.&lt;/blockquote&gt;&lt;br /&gt;In &lt;a href="http://kcoyle.net/ethics.html"&gt;1995 &lt;/a&gt;I painted a surprisingly accurate picture of 2015 that included:&lt;br /&gt;&lt;blockquote&gt;Big boys, like Disney and Time/Warner/Turner put out snippets of their films and have enticed viewers to upgrade their connection to digital movie quality. News programs have truly found their place on the Net, offering up-to-the second views of events happening all over the world, perfectly selected for your interests....Online shopping allows 3-D views of products and virtual walk-throughs of vacation paradises.&lt;/blockquote&gt;&lt;br /&gt;If there were a stock market for cynical investments, I'd be sitting pretty right now. But wait... there's more! Because there's always a future, and therefore more dystopia to predict.&lt;br /&gt;&lt;br /&gt;My latest concern is about searching and finding. And of course that means that I am concerned about Google, but this is in a new context. 
I have spent the last five years trying to convince libraries that we need to be &lt;span style="font-style: italic;"&gt;of the web&lt;/span&gt; -- not only on the web but truly web resources. I strongly believe this is the only possible way to keep libraries relevant to new generations of information seekers. This has been interpreted by many as a digitization project that will result in getting the stuff of libraries (books mainly) onto the web, and getting the metadata about that stuff out of library catalogs and onto the web. &lt;a href="http://www.hathitrust.org"&gt;HathiTrust&lt;/a&gt;, for example, is a massive undertaking that will store and preserve huge numbers of digitized books. The &lt;a href="http://cyber.law.harvard.edu/dpla/Main_Page"&gt;Digital Public Library of America&lt;/a&gt; (DPLA), just in its early planning stages today, wants to make all books available to everyone for "free."&lt;br /&gt;&lt;br /&gt;All of these are highly commendable projects, but there is a reality that we don't seem to have embraced, and that is that searching and finding are as important to the information-seeking process as the actual underlying materials. As we can easily see with Google, the search engine is the gatekeeper to content. If content cannot be found, then it does not exist. And determining what content will be accessed is real power in the information cloud. [cf. &lt;a href="http://openlibrary.org/authors/OL1601697A/Siva_Vaidhyanathan"&gt;Siva Vaidhyanathan&lt;/a&gt;, &lt;a href="http://openlibrary.org/books/OL24647202M/Googlization_of_everything"&gt;Googlization of Everything&lt;/a&gt;.]&lt;br /&gt;&lt;br /&gt;There is a danger that, when this mass of library materials becomes &lt;span style="font-style: italic;"&gt;of the web&lt;/span&gt;, we could entirely lose control of its discovery. 
But this isn't just a question of library materials; it is true for the entire linked data cloud: who will create the search engine that makes all of that data findable? Its purchase of &lt;a href="http://freebase.com"&gt;freebase.com&lt;/a&gt; makes it clear that Google has at least an eye on the LD space. And of course Google has the money, the servers, and the technology to do this. We know, however, from our experience with the current Google search engine that the application of Google's values to search produces a particular result. We also know that Google's main business model is based on making a connection between searchers and advertisers. [cf. &lt;a href="http://openlibrary.org/authors/OL239570A/Ken_Auletta"&gt;Ken Auletta&lt;/a&gt;, &lt;a href="http://openlibrary.org/works/OL1990196W/Googled"&gt;Googled&lt;/a&gt;].&lt;br /&gt;&lt;br /&gt;It's not enough for libraries to gather, store and preserve huge masses of information resources. We have to be actively engaged with users and potential users, and that engagement includes providing ways for them to find and to use the resources libraries have. We must provide the entry point that brings users to information materials without that access being mediated through a commercial revenue model. So for every HathiTrust or DPLA that focuses on the resources, we need a related project -- &lt;span style="font-style: italic;"&gt;equally well-funded&lt;/span&gt; -- that focuses on users and access. This means not just creating a traditional library-type catalog, but providing a whole host of services that will help users find and explore the digital library. This interface needs to be part search engine, part individual work space, and part social networking. Users should be able to do their research, store their personal library (getting into Memex territory here), share their work with others, engage in conversations, and perhaps even manage complex research projects. 
It could be like a combination of &lt;a href="http://zotero.org"&gt;Zotero&lt;/a&gt;, &lt;a href="http://vivo.cornell.edu/"&gt;VIVO&lt;/a&gt;, &lt;a href="http://www.zoho.com"&gt;Zoho&lt;/a&gt;, &lt;a href="http://pipes.yahoo.com/"&gt;Yahoo pipes&lt;/a&gt;, &lt;a href="http://dabble.com"&gt;Dabble&lt;/a&gt;, and MIT's &lt;a href="http://ocw.mit.edu/"&gt;OpenCourseWare&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Really, if we don't do this, the future of libraries and research will be decided by Google. There, I said it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-2478277975995726479?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/2478277975995726479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=2478277975995726479' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2478277975995726479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2478277975995726479'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/05/dystopias.html' title='Dystopias'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8177681568696139567</id><published>2011-04-24T07:48:00.000-07:00</published><updated>2011-04-30T09:59:36.296-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category 
scheme='http://www.blogger.com/atom/ns#' term='RDA'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>Visualizing linked data</title><content type='html'>Chris Oliver, Diane Hillmann and I will be reprising (and updating) our three-part webinar on &lt;a href="http://www.alastore.ala.org/detail.aspx?ID=3125"&gt;RDA and the future of library metadata&lt;/a&gt; starting on May 11. As before, Chris will cover the principles behind RDA and why RDA is different from other cataloging codes; I will talk about the Semantic Web and why it is important for libraries to be part of the web of data (May 18); Diane will show how the &lt;a href="http://metadataregistry.org/rdabrowse.htm"&gt;Open Metadata Registry&lt;/a&gt; makes possible a Semantic Web-compatible version of RDA (May 25).&lt;br /&gt; One of the questions I always get when talking about the Semantic Web is "What does it look like?" This is kind of like asking what electricity looks like: it doesn't so much look like anything as make certain things possible. But I fully understand that people need to see something for this all to make sense, so when the webinar technology allows it, I have started showing some web pages. When it doesn't, I send people to &lt;a href="http://kcoyle.net/presentations/links.html"&gt;links&lt;/a&gt; they can explore on their own. 
Since some of you may have this same question, here are a few illustrations using two sites that can present authors in a Semantic Web form.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-Mhzvm5cgpWE/TbSnJ0Y9GDI/AAAAAAAAA4M/M_3v4ycnjgc/s1600/b_cartland_ol.jpg"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 400px; height: 316px;" src="http://3.bp.blogspot.com/-Mhzvm5cgpWE/TbSnJ0Y9GDI/AAAAAAAAA4M/M_3v4ycnjgc/s400/b_cartland_ol.jpg" alt="" id="BLOGGER_PHOTO_ID_5599284023852341298" border="0"&gt;&lt;/a&gt; When you search for an author in the Open Library, you retrieve a page for that author. This is the page for the author &lt;a href="http://openlibrary.org/authors/OL22022A/Barbara_Cartland"&gt;Barbara Cartland&lt;/a&gt;. The page has not been hand-coded by a human but is derived "on the fly" from the information in the Open Library database.&lt;br /&gt;&lt;br /&gt;That same information is available in a &lt;a href="http://openlibrary.org/authors/OL22022A.rdf"&gt;semantic web format&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Resource_Description_Framework"&gt;RDF&lt;/a&gt; in XML. (Note: it is common to code Semantic Web data in XML, but XML is not the only possible data format. There is nothing inherent in the Semantic Web that would make it XML-like; it's just a convenience.) This is not intended to be human-friendly -- it is code to be used by programs. 
You should notice that it makes use of identifiers that look like URLs:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&amp;lt;foaf:Person rdf:about="http://openlibrary.org/authors/OL22022A"&amp;gt; ... &amp;lt;/foaf:Person&amp;gt;&lt;/blockquote&gt;The above establishes the primary identifier for all of the information that follows in the XML.&lt;br /&gt;&lt;br /&gt;You will also see that, like other applications using XML, it allows you to mix data elements from different "namespaces." The Open Library RDF uses a mix of elements from &lt;a href="http://dublincore.org/documents/dcmi-terms/"&gt;Dublin Core&lt;/a&gt;, &lt;a href="http://xmlns.com/foaf/spec/"&gt;Friend-of-a-Friend&lt;/a&gt; (FOAF), the &lt;a href="http://bibotools.googlecode.com/svn/bibo-ontology/trunk/doc/index.html"&gt;Bibliographic Ontology&lt;/a&gt;, and &lt;a href="http://metadataregistry.org/rdabrowse.htm"&gt;RDA Vocabularies&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Another database that provides its data in RDF is the &lt;a href="http://viaf.org/"&gt;Virtual International Authority File&lt;/a&gt;, VIAF. VIAF combines the name authority data from about twenty national authority files, making it possible to translate between different name display forms when exchanging data. 
Here is part of the VIAF display for &lt;a href="http://viaf.org/viaf/64003092/#Cartland,_Barbara,_1902-2000"&gt;Barbara Cartland&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-MSyAkMyKpVo/TbS3dGAxoMI/AAAAAAAAA4U/56NLBs2Dad0/s1600/viafCartland.jpg"&gt;&lt;img style="float: right; margin: 0pt 0pt 10px 10px; cursor: pointer; width: 400px; height: 319px;" src="http://4.bp.blogspot.com/-MSyAkMyKpVo/TbS3dGAxoMI/AAAAAAAAA4U/56NLBs2Dad0/s400/viafCartland.jpg" alt="" id="BLOGGER_PHOTO_ID_5599301947186323650" border="0"&gt;&lt;/a&gt;&lt;br /&gt;You can retrieve or export the metadata for this author in various formats, including &lt;a href="http://viaf.org/viaf/64003092/marc21.xml"&gt;MARC&lt;/a&gt; and &lt;a href="http://viaf.org/viaf/64003092/rdf.xml"&gt;RDF/XML&lt;/a&gt;. Once again you will see that the RDF form of the data makes use of FOAF, a standard called &lt;a href="http://en.wikipedia.org/wiki/SKOS"&gt;"Simple Knowledge Organization System"&lt;/a&gt; or SKOS, and also &lt;a href="http://metadataregistry.org/rdabrowse.htm"&gt;RDA vocabularies&lt;/a&gt; for the FRBR Group 2 entities from the Open Metadata Registry.&lt;br /&gt;&lt;br /&gt;You can look at more examples on my &lt;a href="http://kcoyle.net/presentations/links.html"&gt;links&lt;/a&gt; page, but I hope that this takes some of the mystery out of Semantic Web data, or at least makes the mystery a &lt;span style="font-style:italic;"&gt;known&lt;/span&gt; rather than &lt;span style="font-style:italic;"&gt;unknown&lt;/span&gt; puzzler.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8177681568696139567?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8177681568696139567/comments/default' title='Post Comments'/><link rel='replies' 
type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8177681568696139567' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8177681568696139567'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8177681568696139567'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/04/visualizing-linked-data.html' title='Visualizing linked data'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-Mhzvm5cgpWE/TbSnJ0Y9GDI/AAAAAAAAA4M/M_3v4ycnjgc/s72-c/b_cartland_ol.jpg' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8688446116907327218</id><published>2011-04-18T07:32:00.000-07:00</published><updated>2011-04-23T15:33:58.759-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FRBR'/><title type='text'>FRBR as cake</title><content type='html'>I keep trying to explain what bothers me about FRBR, and in particular about WEMI (the Work, Expression, Manifestation, and Item entities). I've recently thought about it with this image of a cake. I know this is a flawed analogy, but it works for me on some level. 
It goes like this:&lt;br /&gt;&lt;br /&gt;When you make a cake, you have a number of ingredients:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-_yZfl_zM3F8/TaxMSGi_SLI/AAAAAAAAA30/-QCuzG-hJKU/s1600/frbrAsCake.001.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: left; cursor: pointer; width: 346px; height: 258px;" src="http://2.bp.blogspot.com/-_yZfl_zM3F8/TaxMSGi_SLI/AAAAAAAAA30/-QCuzG-hJKU/s400/frbrAsCake.001.jpg" alt="" id="BLOGGER_PHOTO_ID_5596932310793406642" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;When you mix them together to make a cake you don't get this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-h9OMKSaiaXU/TaxMo0gK4PI/AAAAAAAAA38/mHb-0TLBJHQ/s1600/frbrAsCake.002.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://4.bp.blogspot.com/-h9OMKSaiaXU/TaxMo0gK4PI/AAAAAAAAA38/mHb-0TLBJHQ/s400/frbrAsCake.002.jpg" alt="" id="BLOGGER_PHOTO_ID_5596932701086736626" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;You get this:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-NFWAUdXhSmU/TaxM1phuAtI/AAAAAAAAA4E/Z-Na9NF3yfo/s1600/frbrAsCake.003.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/-NFWAUdXhSmU/TaxM1phuAtI/AAAAAAAAA4E/Z-Na9NF3yfo/s400/frbrAsCake.003.jpg" alt="" id="BLOGGER_PHOTO_ID_5596932921478742738" border="0" /&gt;&lt;/a&gt;My point here, in case it isn't clear, is that the purpose of creating a bibliographic description using a number of different entities is to... well, to create a bibliographic description; something that as a whole has meaning. 
You can create it from individual "ingredients," like information about a Work and an Expression, but those do not need to remain separate entities in your final product; instead, that information can become part of your whole.&lt;br /&gt;&lt;br /&gt;I know that people like the idea of a distributed bibliographic description with a single Work entity that links to many Expressions that then link to many Manifestations, etc., and that could be the underlying structure of one's data store. But just because there are Work entities (eggs) doesn't mean that our metadata keeps the Work entity "intact." In fact, our systems may use only a portion of the Work entity, and may use bits of it at different times in different contexts.&lt;br /&gt;&lt;br /&gt;Leaving poorly drawn analogies aside, creating our data as sets (or "graphs") of triples should give us maximum flexibility. One thing this means is that even a partial description is valid. Thus a full library catalog record and an abbreviated citation are both valid representations of a resource. They should connect to the larger linked data information space through any of the statements they contain, regardless of the structure of their graphs. And it is my guess that many bibliographic descriptions will be simple graphs with a single RDF subject (that is, a single bibliographic resource). 
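&lt;br /&gt;&lt;br /&gt;To make the "partial description is still valid" point a little more concrete, here is a minimal sketch in plain Python (deliberately not the API of any RDF library) that treats each description as a set of (subject, predicate, object) triples. The example.org URIs and the literal values are hypothetical placeholders; the predicates are Dublin Core terms. Both the full record and the citation have a single subject, and because they share statements about that subject, they merge without any extra structure:&lt;br /&gt;&lt;pre&gt;
```python
# A minimal sketch, not a real RDF API: a "graph" here is just a Python set of
# (subject, predicate, object) tuples. All example.org URIs and literal values
# are hypothetical placeholders; the predicates are Dublin Core terms.
DCT = "http://purl.org/dc/terms/"
BOOK = "http://example.org/book/1"  # hypothetical identifier for one resource

# A fuller "catalog record": several statements about one subject.
full_record = {
    (BOOK, DCT + "title", "An Example Title"),
    (BOOK, DCT + "creator", "http://example.org/person/42"),
    (BOOK, DCT + "date", "1999"),
    (BOOK, DCT + "publisher", "Example Press"),
}

# An abbreviated "citation": a partial description, but still a valid graph.
citation = {
    (BOOK, DCT + "title", "An Example Title"),
    (BOOK, DCT + "date", "1999"),
}


def subjects(graph):
    """Return the set of distinct subjects (described resources) in a graph."""
    return {s for (s, _p, _o) in graph}


def shared_statements(g1, g2):
    """Return the statements two graphs have in common."""
    return g1.intersection(g2)


# Merging graphs is plain set union; identical triples collapse into one.
merged = full_record.union(citation)

print(len(subjects(full_record)), len(subjects(citation)))  # 1 1
print(len(shared_statements(full_record, citation)))  # 2
print(len(merged))  # 4
```
&lt;/pre&gt;Nothing in this sketch requires separate Work or Expression nodes: the citation stands on its own, and it links to the fuller record through any statement the two have in common.&lt;br /&gt;&lt;br /&gt;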
The highly structured bibliographic universe of FRBR will be a minority case, and the FRBR entities, like our eggs and sugar and flour, will be useful ingredients that disappear into actual creations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8688446116907327218?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8688446116907327218/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8688446116907327218' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8688446116907327218'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8688446116907327218'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/04/frbr-as-cake.html' title='FRBR as cake'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-_yZfl_zM3F8/TaxMSGi_SLI/AAAAAAAAA30/-QCuzG-hJKU/s72-c/frbrAsCake.001.jpg' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-9121708334311944533</id><published>2011-03-24T06:50:00.000-07:00</published><updated>2011-04-16T08:14:27.098-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='open access'/><title type='text'>Open Data II</title><content type='html'>In this post I want to talk about some of the Open Government Data (OGD) 
projects taking place around the world.&lt;br /&gt;&lt;br /&gt;Open government data is assumed to be a given by many in the US because our copyright law states that federal government data is not covered by copyright. (The situation in US states can vary, but the federal government's declaration sets the tone.) In other countries the situation is less clear, and governments do not have a mandate to make data open. However, the open government data movement has spurred on a number of fast-moving activities, many of them sponsored by governments themselves, that encourage citizens to download and use government data.&lt;br /&gt;&lt;br /&gt;The UK government has a site, &lt;a href="http://data.gov.uk/"&gt;Opening up government&lt;/a&gt;, where it not only shares data but encourages people to develop apps that use the data. Apps here can alert you to new building and planning projects in your area, and give you real-time public transportation information.&lt;br /&gt;&lt;br /&gt;The EU has its own &lt;a href="http://www.govdata.eu/"&gt;Open Government Data Initiative&lt;/a&gt;. It provides the data under these terms of use:&lt;br /&gt;&lt;span&gt;&lt;blockquote&gt;All Data on &lt;a href="http://dev.govdata.eu/"&gt;dev.govdata.eu&lt;/a&gt; is available under a worldwide, royalty-free, non-exclusive license to use, modify, and distribute the datasets in all current and future media and formats for any lawful purpose and that this license does not give you a copyright or other proprietary interest in the datasets. &lt;/blockquote&gt;There is a European site for public sector information, the &lt;a href="http://www.epsiplatform.eu/"&gt;European Public Sector Information Platform&lt;/a&gt;: Europe's One-Stop Shop on Public Sector Information Re-use. You can search by country and see news and developments relating to public data, much of which is available for re-use. 
Because many countries do not have an explicit statement in their copyright laws covering government data, one of the important early steps for these jurisdictions is to develop blanket licenses that they can apply to the data. For example, when you visit the site you will see recent news that Norway has developed a &lt;a href="http://www.epsiplatform.eu/news/news/norwegian_data_license_feedback_request"&gt;license&lt;/a&gt; for its government data and is asking for feedback (if you read Norwegian).&lt;br /&gt;&lt;br /&gt;As a measure of the force of this movement, Albania and Bulgaria are said to be on the verge of opening some government data.&lt;br /&gt;&lt;br /&gt;The Obama administration announced its &lt;a href="http://www.whitehouse.gov/open/"&gt;Open Government&lt;/a&gt; effort on its first day in office.&lt;br /&gt;&lt;/span&gt;&lt;blockquote&gt;To the extent practicable and subject to valid restrictions, agencies should publish information online in an open format that can be retrieved, downloaded, indexed, and searched by commonly used web search applications. An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.&lt;br /&gt;&lt;/blockquote&gt;Wired has a US-oriented &lt;a href="http://howto.wired.com/wiki/Open_Up_Government_Data"&gt;"how-to" wiki&lt;/a&gt; on OGD. (Being Wired, they of course include &lt;a href="http://marijuanalobby.org/"&gt;MarijuanaLobby.org&lt;/a&gt; among their "how-to" examples, but it is a good illustration of the range of uses for OGD.)&lt;br /&gt;&lt;br /&gt;Not all data is at the country level, of course, and the movement is reaching into lower levels of government. 
&lt;a href="http://opendata.paris.fr/"&gt;Paris&lt;/a&gt; has an open data portal, while &lt;a href="http://www.slideshare.net/TonZijlstra/enschede-netherlands-open-data-motion"&gt;Enschede Netherlands&lt;/a&gt; has an open data declaration for its information. In Italy, the government of the &lt;a href="http://www.dati.piemonte.it/dati.html"&gt;Piemonte Region&lt;/a&gt; has a website for its open data.&lt;br /&gt;&lt;br /&gt;The government open data movement is heavily influenced by grassroots efforts to convince governments that open data is a good thing -- not just for government watchdogs and opposition movements, but for heathy government and strong business. In the UK there is a &lt;a href="http://wiki.okfn.org/wg/government"&gt;Working Group on Open Government Data &lt;/a&gt;of the &lt;a href="http://okfn.org/"&gt;Open Knowledge Foundation&lt;/a&gt;,  an independent not-for-profit that is promoting, as its name says, open  knowledge. In Italy there is the wonderfully named "&lt;a href="http://www.spaghettiopendata.org/"&gt;Spaghetti Open Data&lt;/a&gt;." Spain has a broad coalition of non-profits that form the "&lt;a href="http://www.proacceso.org/"&gt;Coalición Pro Acceso&lt;/a&gt;." The CKAN web site, which is a general archive of available datasets of all kinds, has OGD under a number of tags, such as "&lt;a href="http://ckan.net/tag/gov"&gt;gov&lt;/a&gt;". [Just out: &lt;a href="http://opengovernmentdata.org/film/"&gt;Open Government Data video&lt;/a&gt;.]&lt;br /&gt;&lt;br /&gt;We hear a lot about problems with copyright, with DRM, with information providers who want to lock down their products. Government data covers a huge variety of information types and is often the key information needed for a lot of civic and scientific decision-making. 
OGD can generate a mountain of new knowledge, and then tell you how high the mountain is.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-9121708334311944533?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/9121708334311944533/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=9121708334311944533' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9121708334311944533'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9121708334311944533'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/03/open-data-ii.html' title='Open Data II'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3033726324185649684</id><published>2011-03-22T13:06:00.001-07:00</published><updated>2011-03-23T07:13:45.723-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Judge Chin rejects AAP/Google settlement</title><content type='html'>I'll say more when I've read it, but I put a copy on the&lt;a href="http://www.archive.org/details/UsDistrictCourtNyDecisionAuthorsGuildV.Google"&gt; Internet Archive&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;After reading:&lt;br /&gt;&lt;br /&gt;The judge's decision holds no real surprises. 
His analysis is fully consistent with the reactions of the interested parties to the case. He rejects the settlement primarily on these grounds:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;It seems that a significant segment of the class of authors/publishers is not happy with the settlement. "Some 6800 class members opted out." (p. 10) Also, a majority of the comments on the proposed settlement were negative, many coming from non-US copyright holders who did not identify with the class.&lt;/li&gt;&lt;li&gt;The settlement would make significant alterations to the current copyright regime, which should be a matter for Congress rather than the court.&lt;/li&gt;&lt;li&gt;The settlement would go beyond the scope of the original lawsuit, which was over Google's digitization of in-copyright works and its display of snippets in response to searches. The settlement would allow sales of full-text works, which was never an issue at the time of the original lawsuit.&lt;/li&gt;&lt;/ul&gt;Although he rejects the settlement on numerous grounds, the judge concludes by saying "...many of the concerns raised in the objections would be ameliorated if the ASA were converted from an "opt-out" settlement to an "opt-in" settlement." (p. 46) This leaves the door open for yet another settlement attempt between the parties.&lt;br /&gt;&lt;br /&gt;It is important to note that the state of digitization and ebooks today is vastly different from what it was in 2005, when the authors and publishers first sued Google over its library digitization project. If the question of Google's digitizing were put forth for the first time today, the actions of the parties and the results might be very different. 
This is clearly a case where technology has moved forward at a rapid pace while the courts were contemplating an agreement that was standing still.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;What now?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It's hard to believe that Google and the AAP/AG have not prepared themselves for this possibility. Yet certain activities have gone forward as if the settlement were already approved.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A form of the Book Rights Registry is in place, in the sense that there is a database of digitized works and a way to claim them to receive the &lt;a href="http://www.googlebooksettlement.com/help/bin/answer.py?answer=118704&amp;amp;hl=en#q16"&gt;proposed one-time payment&lt;/a&gt;. Presumably that payment will now not happen, but meanwhile Google has a large database with copyright holder information (including contact info, if I remember the form correctly).&lt;/li&gt;&lt;li&gt;The BRR already has a chosen Director (Michael Healy).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;It isn't clear whether Google has continued digitizing books that are under copyright without specific permission. To be sure, they have made many deals with publishers and with libraries to digitize works since the settlement was first proposed in 2008, and digitization has gone forward.&lt;/li&gt;&lt;li&gt;Some libraries that had partnered with Google prior to the lawsuit have negotiated new contracts that are compatible with some of the conditions contained in the settlement. I don't know whether these contracts have been signed or are awaiting the result of the lawsuit, but I do recall that in the new contracts the libraries obtained fewer rights to retain copies of their digitized books than they had under the old ones. The upshot is that it isn't clear where this leaves the partner libraries, or organizations like HathiTrust that are involved in the storage and possible uses of the Google-digitized books. 
&lt;/li&gt;&lt;li&gt;For libraries and institutions that were looking forward to subscription access to the books, this access is now a big question mark. It was dependent on conditions in the settlement. &lt;/li&gt;&lt;/ul&gt;There are undoubtedly many other issues that are now open questions. When the settlement was first announced I began a "&lt;a href="http://kcoyle.blogspot.com/2009/01/start-at-questions-list.html"&gt;question list&lt;/a&gt;." It might be a good idea to revive that given this new perspective. And for those wondering "what now?" (that is, all of us) there's a &lt;a href="http://www.wo.ala.org/districtdispatch/wp-content/uploads/2011/03/gbs-march-madness-diagram-UPDATE.pdf"&gt;flow chart&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3033726324185649684?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3033726324185649684/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3033726324185649684' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3033726324185649684'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3033726324185649684'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/03/judge-chin-rejects-aapgoogle-settlement.html' title='Judge Chin rejects AAP/Google settlement'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-159702464764709007</id><published>2011-03-04T08:12:00.000-08:00</published><updated>2011-03-04T08:30:08.545-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><category scheme='http://www.blogger.com/atom/ns#' term='open access'/><title type='text'>Open Data I</title><content type='html'>&lt;div&gt;The idea of open data has gone from an extremist rallying cry to a mainstream movement. In the next few posts I'll highlight just the tip of the iceberg, but expect to see more about this with every day that passes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The UK's educational research arm, JISC (something like the NSF, but for education rather than pure science), and the research libraries' organization, RLUK, undertook a study of the advantages and possibilities afforded by opening data from libraries, archives and museums. They have produced the &lt;a href="http://obd.jisc.ac.uk/"&gt;Open Bibliographic Data Guide&lt;/a&gt;, which investigates the business case for providing bibliographic data that can be re-used. &lt;/div&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/-28l1y4G8Des/TXERBw_q6wI/AAAAAAAAA3g/80BkXLECzfw/s1600/jiscOpenData.jpg"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 305px;" src="http://1.bp.blogspot.com/-28l1y4G8Des/TXERBw_q6wI/AAAAAAAAA3g/80BkXLECzfw/s400/jiscOpenData.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5580260135318252290" /&gt;&lt;/a&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is a practical, not a utopian, vision of open data. 
&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;"In earlier times, observers may have considered the ‘open data movement’ as the preserve of a certain type of fanaticism also associated with Open Source Software (OSS) and Open Content, emotionally and ideologically linked to the spirit of 1969.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, OSS and Open Content have now morphed in to propositions with clear business cases of interest to corporations, institutions and governments. National strategies and Chief Information Officers espouse Open Source Software for financial and business benefit, whilst academic leaders are supporting Open Access Journals and Open Educational Resources (OER)."(&lt;a href="http://obd.jisc.ac.uk/history"&gt;link&lt;/a&gt;)&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;The report gives 17 different use cases -- situations in which an institution might want to provide its data with some degree of openness.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1 – Publish data for unspecified use&lt;/div&gt;&lt;div&gt;2 – Publish open linked data for unspecified use&lt;/div&gt;&lt;div&gt;3 – Supply data for Physical Union&lt;/div&gt;&lt;div&gt;4 – Allow Physical Union Catalogue to publish data&lt;/div&gt;&lt;div&gt;5 – Expose data for federation into Virtual Union Catalogue&lt;/div&gt;&lt;div&gt;6 – Publish grey literature data&lt;/div&gt;&lt;div&gt;7 – Contribute data to Google Scholar&lt;/div&gt;&lt;div&gt;8 – Publish activity data &lt;/div&gt;&lt;div&gt;9 – Supply holdings data for Collection &lt;/div&gt;&lt;div&gt;10 – Expose holdings / availability data for Closest Copy location &lt;/div&gt;&lt;div&gt;11 – Share data for Collaborative Cataloguing &lt;/div&gt;&lt;div&gt;12 – Supply data for Crowd Sourced Cataloguing &lt;/div&gt;&lt;div&gt;13 – Supply data to be enhanced for own &lt;/div&gt;&lt;div&gt;14 – Publish data for LIS research &lt;/div&gt;&lt;div&gt;15 – Allow personal use of data for Reference 
Management &lt;/div&gt;&lt;div&gt;16 – Publish data for lightweight application development &lt;/div&gt;&lt;div&gt;17 – Allow commercial use of data in mobile application&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For each of the cases, the report discusses pros and cons for the institution, its users, and the world, as well as the business case for making one's data open. The authors acknowledge the complexity of our current environment of bibliographic data ownership:&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;"Our problems with bibliographic metadata are quite specific:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Non-profit and commercial players have built businesses around datasets of MARC records, indexing / TOC services and journal Knowledge Bases – but what is original about those accumulations?&lt;/li&gt;&lt;li&gt;Bibliographic records in the circulation amongst libraries are of uncertain and complex provenance, with the exceptions of those explicitly tagged by a ‘vendor’ or exclusive to a special collection" (&lt;a href="http://obd.jisc.ac.uk/history"&gt;link&lt;/a&gt;)&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;JISC doesn't stop at this report; it is also sponsoring projects and ongoing activities in this area. The British Library has already released its &lt;a href="http://ckan.net/package/jiscopenbib-bl_bnb-1"&gt;British National Bibliography&lt;/a&gt; data openly for reuse. 
You can keep up with these activities through the &lt;a href="http://rdtf.mimas.ac.uk/newsletter/rdtfnewsletter01-march2011.pdf"&gt;newsletter&lt;/a&gt; (&lt;a href="http://rdtf.mimas.ac.uk/newsletter/"&gt;subscribe here&lt;/a&gt;)  whose logo reads: &lt;b&gt;&lt;i&gt;One to many; Many to one: Towards a virtuous flow of library, archival and museum data.&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-159702464764709007?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/159702464764709007/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=159702464764709007' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/159702464764709007'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/159702464764709007'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/03/open-data-i.html' title='Open Data I'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-28l1y4G8Des/TXERBw_q6wI/AAAAAAAAA3g/80BkXLECzfw/s72-c/jiscOpenData.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3776427222697095790</id><published>2011-02-06T07:02:00.000-08:00</published><updated>2011-02-06T09:46:13.321-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='skyriver'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>Skyriver Replies</title><content type='html'>Following up on these early stages of what will probably be an interminable legal case (it's easy to understand why one should avoid going to court whenever possible), SkyRiver has &lt;a href="http://www.librarytechnology.org/ltg-displaytext.pl?RC=15455"&gt;replied&lt;/a&gt; to OCLC's Motion to Dismiss.[&lt;a href="http://kcoyle.blogspot.com/2010/12/oclc-motion-to-dismiss-pt-i.html"&gt;1&lt;/a&gt;] [&lt;a href="http://kcoyle.blogspot.com/2010/12/oclc-motion-to-dismiss-pt-ii.html"&gt;2&lt;/a&gt;] This is the first document I have seen that, to me, clearly lays out SkyRiver's basic contentions. Note that the bulk of this document is the usual lawyerly recitation of cases supporting one statement or another, and I have no idea what the legal arguments mean or whether they are convincing. But here are SkyRiver's primary facts as this document lays them out:&lt;br /&gt;&lt;br /&gt;1. OCLC has monopolies in the US academic library market&lt;br /&gt;&lt;blockquote&gt;"OCLC is monopolizing three product or service markets—bibliographic data of libraries’ holdings; cataloging service; and interlibrary lending service (ILL). OCLC is attempting to monopolize a fourth service market—integrated library systems (ILS)." p. 1&lt;br /&gt;&lt;/blockquote&gt;2. OCLC has used those monopoly positions to prevent competition&lt;br /&gt;&lt;blockquote&gt;"Since at least 1987, OCLC has demanded that its member libraries agree to terms of membership that prohibit sharing the metadata of their own library holdings contributed to OCLC’s bibliographic database known as WorldCat with any for-profit firms for commercial use and require member libraries to use OCLC’s services. 
OCLC has imposed these membership terms to prevent the development of competing bibliographic databases, cataloging services or ILL services by erecting barriers to entry in these three markets. OCLC is also using its monopoly power in these three markets in its attempt to monopolize the ILS market." p. 1&lt;/blockquote&gt;3. OCLC has targeted SkyRiver's business by using punitive pricing for libraries that use SkyRiver's cataloging services&lt;br /&gt;&lt;blockquote&gt;"OCLC’s conduct has injured SkyRiver by deterring libraries from using its service, and has injured libraries that are using SkyRiver to reduce costs by preventing those libraries from uploading their new records into WorldCat at the price charged to everyone except SkyRiver users." p. 2&lt;/blockquote&gt;Beyond that, the arguments become more complex. In particular, there is the issue of the 20+ years during which OCLC has been building up WorldCat under a policy that has prohibited (according to the response, p. 4) libraries from sharing their cataloging data with for-profit entities. With no other non-profit entity providing cataloging services to US academic libraries, the records are essentially locked up in WorldCat and no one else can enter the market.&lt;br /&gt;&lt;br /&gt;This brings me to a point that I got wrong in a previous post, which is that SkyRiver &lt;span style="font-style: italic; font-weight: bold;"&gt;is&lt;/span&gt; asking for access to the WorldCat database. The argument there, if I read it correctly, is that WorldCat is the only major source of academic library holdings that can be used for an effective ILL service, and that WorldCat is the result of monopoly practices. To allow for competition, WorldCat (that is, its bibliographic data and holdings) should be made available for a reasonable price to competing ILL providers. 
While this seemed jarring at first, the more I think about it the more sense it makes.&lt;br /&gt;&lt;br /&gt;What the response does not say explicitly (and perhaps it would be irrelevant in a legal case) is that one could look on WorldCat as a shared community resource, not the property of OCLC. In fact, OCLC uses this kind of argument in its record use policy, but somehow arrives at the conclusion that WorldCat should not be used to foster non-OCLC library services. It seems easy to make the opposite argument: that WorldCat could be the basis for a wide range of services that would benefit libraries, even if they do not come from OCLC. Imagine if OCLC were to set non-discriminatory pricing for use of WorldCat so that anyone could make use of the WorldCat data. There could be a "share-alike" clause that would require those users to return pertinent information to the bibliographic collective. WorldCat would grow, and the range of products and services available to libraries would grow. This seems like a GOOD THING.&lt;br /&gt;&lt;br /&gt;I realize it may not be easy to do the analysis that would lead to pricing that both fosters sharing and makes it possible even for small businesses* to arise in the library market. It should be possible, given today's technology, to do this efficiently, but we know very little about the cost structure of WorldCat. It is clear that there are many activities relating to the care and management of that database, all intertwined with OCLC services and valuable research projects, as well as linked deeply into tens of thousands of library systems around the world. Should the court require OCLC to open WorldCat for use, we need to see a transition that is non-destructive to the library ecology.&lt;br /&gt;&lt;br /&gt;* The reason I emphasize small businesses is that I believe smaller, more nimble vendors could exist to serve the needs of specialized and smaller libraries that are not OCLC members at this time. 
I see the potential to widen the community of sharing, even to include more non-library institutions and businesses. Another GOOD THING, IMO.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3776427222697095790?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3776427222697095790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3776427222697095790' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3776427222697095790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3776427222697095790'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/02/skyriver-replies.html' title='Skyriver Replies'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8788027364626811896</id><published>2011-02-01T16:02:00.000-08:00</published><updated>2011-02-01T17:13:27.553-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><category scheme='http://www.blogger.com/atom/ns#' term='RDA'/><title type='text'>knowledge Organization in Norway</title><content type='html'>&lt;p&gt;Last week I attended &lt;a href="http://www.hio.no/Enheter/Avdeling-for-journalistikk-bibliotek-og-informasjonsfag/Konferanser/Kunnskapsorganisasjonsdagene-2011"&gt;Kunnskapsorganisasjonsdagene 2011&lt;/a&gt; in Oslo. 
(Knowledge Organization Days 2011.) The topics centered on linked data, the FR family of models (FRBR, FRAD, FRSAD), and RDA. I will try to give some flavor of the event as I experienced it. I add that caveat because only three of the presentations were in English and the rest were in Norwegian, so how much I understood really depended on whether there were slides with a lot of diagrams. I was somewhat in the position of the dog in this cartoon:&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Z1EA7hov2P0/TUiggFBKJZI/AAAAAAAAA3U/ExOIEL_4Vjg/s1600/100_1428.JPG"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://1.bp.blogspot.com/_Z1EA7hov2P0/TUiggFBKJZI/AAAAAAAAA3U/ExOIEL_4Vjg/s400/100_1428.JPG" border="0" alt="" id="BLOGGER_PHOTO_ID_5568877412206912914" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;with "Ginger" replaced by "RDA", "MARC", and "Karen Coyle." &lt;/p&gt;&lt;p&gt;I was the first speaker on day 1, and &lt;a href="http://kcoyle.net/presentations/oslo2011.pdf"&gt;presented&lt;/a&gt; on the topic of RDA and linked data. The next talk was from the &lt;a href="http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Pode"&gt;Pode project&lt;/a&gt;, a research project bringing together FRBR and RDF concepts and linking data to DBpedia, VIAF, and &lt;a href="http://dewey.info/"&gt;Dewey in RDF&lt;/a&gt;. I got the impression that, while experimental, the results are sophisticated, particularly because of the mix of data sources the project is working with. The afternoon included an introduction to (and, judging from the moments of laughter, some commentary on) RDA by Unni Knutsen. There appears to be an equal amount of interest in and skepticism about RDA. 
I am not sure whether AACR had the same effect outside the Anglo-American library community, and I would be very interested to hear more about the impact of Anglo-American cataloging rules elsewhere, especially whether that impact is greatly increased by the degree of international sharing of bibliographic data.&lt;/p&gt;&lt;p&gt;Maja Žumer of the University of Ljubljana, Slovenia, a member of the FRSAD working group, gave the best explanation of the meaning behind FRSAD's "thema" and "nomen" that I have yet heard. It is beginning to make sense. Maja is the co-author of a study on FRBR and library user mental models that was published in two parts in the Journal of Documentation. (Preprints [&lt;a href="http://www.ff.uni-lj.si/oddelki/biblio/oddelek/osebje/dokumenti/pisanskizumer1a.pdf"&gt;1&lt;/a&gt;] [&lt;a href="http://www.ff.uni-lj.si/oddelki/biblio/oddelek/osebje/dokumenti/pisanskizumer2a.pdf"&gt;2&lt;/a&gt;]) I will link to her slides when they are made available. A key takeaway is that FRBR, FRAD and FRSAD have taken very different approaches that will now need to be reconciled. FRBR presents a closed universe of bibliographic data, with only FRBR entities allowed to be subjects of bibliographic resources. FRSAD essentially opens that up to anything in the known universe. Among other things, this makes it possible to link non-bibliographic concepts to described bibliographic entities. Or, at least, that's how I read it.&lt;/p&gt;&lt;p&gt;I was asked to do a short wrap-up of the first day, and, as I usually do, I turned to the audience for their ideas. Realizing that we were short on answers and long on questions, we decided to gather some of the burning questions. Here are the ones I wrote down:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;If not RDA, what else is there?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Are things on hold waiting for RDA? 
Are people and vendors waiting to see what will happen?&lt;/li&gt;&lt;li&gt;Why wasn't RDA simplified?&lt;/li&gt;&lt;li&gt;How long will we pay for it?&lt;/li&gt;&lt;li&gt;Will communities other than those in the JSC use it?&lt;/li&gt;&lt;li&gt;Can others join JSC to make this a truly international code?&lt;/li&gt;&lt;li&gt;Should we just forget about this library-specific stuff and use Dublin Core?&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;I suspect that many others are wondering the same things.&lt;/p&gt;&lt;p&gt;The next day there were more interesting talks. One, by Magnus Enger of Libriotech, was entitled "Må MARC dø?", which means "Must MARC die?" The first slide needed little translation. It said simply:&lt;/p&gt;&lt;blockquote&gt;JA!&lt;/blockquote&gt;&lt;p&gt;Tom Scott of the BBC gave a visually stunning talk about the data he manages for its nature and wildlife programming. He explained the reasons for pulling data from a variety of sources, including Wikipedia. (See &lt;a href="http://www.bbc.co.uk/nature/life/Chromista"&gt;this page&lt;/a&gt;, and note that it encourages readers to improve the Wikipedia entry if they feel it is incorrect or insufficient.)&lt;/p&gt;&lt;p&gt;In another excellent talk, which I hope will come out in an English translation, Kim Tallerås and David Massey gave a step-by-step walkthrough of moving MARC-encoded data into a fully linked data format, complete with URIs. Another talk focused on the national library's Norwegian webDewey, with examples of converting that data to RDF.&lt;/p&gt;&lt;p&gt;At about that point I ran out of steam, but I will post a link here when the presentations are online. In spite of the language barrier, much of the content of these talks is accessible.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;As is often the case, I was very impressed by the quality of the experimentation being done by people who really want to see library data transformed and made web-able. 
I think we are at the start of a new and highly fruitful phase for libraries.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8788027364626811896?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8788027364626811896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8788027364626811896' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8788027364626811896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8788027364626811896'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/02/knowledge-organization-in-norway.html' title='knowledge Organization in Norway'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_Z1EA7hov2P0/TUiggFBKJZI/AAAAAAAAA3U/ExOIEL_4Vjg/s72-c/100_1428.JPG' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3329552709402753795</id><published>2011-01-21T08:48:00.000-08:00</published><updated>2011-01-22T07:04:56.169-08:00</updated><title type='text'>Analysis of MARC fixed fields</title><content type='html'>&lt;p&gt;I've gone on and on on various lists about trying to analyze MARC as data elements, to the point that I'm sure many people just wish I'd shut up. 
The best way to shut me up is for me to finish the analysis, or at least take it as far as I can. To that end, I now have a wiki page that gives my analysis of the &lt;a href="http://futurelib.pbworks.com/w/page/35201344/fixed_fields"&gt;MARC fixed fields&lt;/a&gt; (007-008). (&lt;a href="http://futurelib.pbworks.com/w/page/29114548/MARC-elements"&gt;Home page for project.&lt;/a&gt;)&lt;br /&gt;&lt;/p&gt;&lt;p&gt;These fields are pretty straightforward since, by definition, they are already discrete data elements. The only tricky bit is that each data element is tied to a particular resource format (e.g. text, map, video). &lt;/p&gt;&lt;p&gt;I have captured the link between each element and its location in MARC by creating a key that combines the tag, the format, and the character position within the field:&lt;/p&gt;&lt;p&gt;007microform05&lt;/p&gt;&lt;p&gt;For the actual values (my analysis includes the values, although I skipped things like "no attempt to code"; those could be added in), I follow the same format, appending the value code:&lt;/p&gt;&lt;p&gt;007microform05a&lt;/p&gt;&lt;p&gt;There is an example of a single element in Turtle and RDF/XML on the wiki page. There are also links to the complete set in Turtle and RDF/XML, which I might as well give here:&lt;/p&gt;&lt;p&gt;&lt;a href="http://kcoyle.net/rda/007_008.ttl"&gt;turtle&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://kcoyle.net/rda/007_008.rdf"&gt;RDF/XML&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Note that these files, and in fact the whole project, are basically an attempt at a proof of concept. Treat them as totally BETA and make any comments, suggestions, etc. that you want. 
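&lt;/p&gt;&lt;p&gt;To illustrate the key scheme, here is a minimal Python sketch. It assumes that a key is simply the three-digit tag, then a lower-case format name, then the character position written as two zero-padded digits, then (for values) the value code, as in the examples 007microform05 and 007microform05a above. The function names marc_fixed_field_key and parse_key are hypothetical; they are not part of the Turtle or RDF/XML files linked here.&lt;/p&gt;&lt;pre&gt;

```python
import re

# Hypothetical helper: builds a key in the style of the examples in this post
# ("007microform05" for an element, "007microform05a" for one of its values).
# Assumptions: the format name is letters only, and the character position is
# written as two zero-padded digits.
def marc_fixed_field_key(tag, fmt, position, value=""):
    """Concatenate tag, resource format, position and an optional value code."""
    return f"{tag}{fmt.lower()}{position:02d}{value}"

# Splits a key back into its parts: 3-digit tag, format name,
# 2-digit position, then whatever value code (possibly empty) remains.
KEY_PATTERN = re.compile(r"^(\d{3})([a-z]+)(\d{2})(.*)$")

def parse_key(key):
    """Return (tag, format, position, value) for a key built as above."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a fixed-field key: {key!r}")
    tag, fmt, position, value = match.groups()
    return tag, fmt, int(position), value

element_key = marc_fixed_field_key("007", "microform", 5)
value_key = marc_fixed_field_key("007", "microform", 5, "a")
print(element_key)           # 007microform05
print(value_key)             # 007microform05a
print(parse_key(value_key))  # ('007', 'microform', 5, 'a')
```

&lt;/pre&gt;&lt;p&gt;Parsing works only because format names contain no digits, so the two position digits mark where the name ends.&lt;/p&gt;&lt;p&gt;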
If you see this as being useful, feel free to volunteer some time to make it better.&lt;/p&gt;&lt;p&gt;&lt;em&gt;Special thanks to Gordon Dunsire for turning out code from my text.&lt;br /&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;Note:&lt;/p&gt;&lt;p&gt;I've added a link to an &lt;a href="http://kcoyle.net/rda/010-048all.html"&gt;HTML display&lt;/a&gt; of the first analysis of the &lt;a href="http://futurelib.pbworks.com/w/page/35201483/0XX_fields"&gt;0XX fields&lt;/a&gt; to that page. The trick here is to figure out which subfields can stand alone as data elements, and which have dependencies (like "source of code") that require them to be compound elements.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3329552709402753795?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3329552709402753795/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3329552709402753795' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3329552709402753795'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3329552709402753795'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2011/01/analysis-of-marc-fixed-fields.html' title='Analysis of MARC fixed fields'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-6751541698945273107</id><published>2010-12-25T09:53:00.000-08:00</published><updated>2010-12-25T10:16:40.250-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='internet'/><title type='text'>Signs of success</title><content type='html'>Either this:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_Z1EA7hov2P0/TRYxN__NQyI/AAAAAAAAA2s/Fszq1WGzzfM/s1600/Screenshot.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 400px; height: 234px;" src="http://1.bp.blogspot.com/_Z1EA7hov2P0/TRYxN__NQyI/AAAAAAAAA2s/Fszq1WGzzfM/s400/Screenshot.png" alt="" id="BLOGGER_PHOTO_ID_5554681306992689954" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;(unavailable)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Or this:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_Z1EA7hov2P0/TRYxOK4xECI/AAAAAAAAA20/JgI8sE8sTIo/s1600/Screenshot-1.png"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 400px; height: 234px;" src="http://2.bp.blogspot.com/_Z1EA7hov2P0/TRYxOK4xECI/AAAAAAAAA20/JgI8sE8sTIo/s400/Screenshot-1.png" alt="" id="BLOGGER_PHOTO_ID_5554681309918466082" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;(reduced to using a raw IP address)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Z1EA7hov2P0/TRY0hCZPuHI/AAAAAAAAA28/5PsxwJA-miE/s1600/ip.jpg"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: 
pointer; width: 400px; height: 300px;" src="http://4.bp.blogspot.com/_Z1EA7hov2P0/TRY0hCZPuHI/AAAAAAAAA28/5PsxwJA-miE/s400/ip.jpg" alt="" id="BLOGGER_PHOTO_ID_5554684932591171698" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-6751541698945273107?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/6751541698945273107/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=6751541698945273107' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6751541698945273107'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6751541698945273107'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/12/signs-of-success.html' title='Signs of success'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_Z1EA7hov2P0/TRYxN__NQyI/AAAAAAAAA2s/Fszq1WGzzfM/s72-c/Screenshot.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1028656008593548811</id><published>2010-12-14T10:48:00.000-08:00</published><updated>2010-12-17T08:02:16.557-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='skyriver'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>OCLC Motion to Dismiss, Pt II</title><content 
type='html'>Continuing on...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Rights&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here's a somewhat extended quote from the Motion that quotes the original complaint:&lt;br /&gt;&lt;blockquote&gt;"At other points in the Complaint, without addressing the text of the records use policy, Plaintiffs characterize the policy as placing broad restriction on a library's use of its own records. ([Complaint] paras. 34-36) However, these conclusory allegations are belied by the actual terms of the records use policy pled above.. For example, Plaintiffs claim that 'a member library may not transfer or share records of its own holdings with commercial firms' ([complaint] para 35), but the records use policy states no such thing. Throughout these allegations, moreover, Plaintiffs confuse and obscure the terms 'OCLC records' and 'library records.' In reality, the situation is simple: OCLC does not prohibit a library from sharing its original cataloging records with whomever it pleases; it does, consistent with the fact that the WorldCat database is copyright, claim a legal right to the unique identifier information used to link and make usable records in WorldCat." (Motion, pp 7-8)&lt;/blockquote&gt;&lt;blockquote&gt;"Again, at most, the Complaint pleads only that libraries cannot share OCLC's records, not that they cannot share the records they themselves created." (Motion, p. 14)&lt;/blockquote&gt;This is a very interesting set of statements. First, it plays with the ambiguity in talking about "library records," denying that libraries cannot convey records of their holdings, as stated in the Complaint, then stating that they can share their original cataloging records, which is not what most in the library world would consider equivalent to "library holdings." What it comes down to is the ownership of the records in the library catalogs that represent the holdings of the library. 
By "the holdings of the library" I mean not just some holdings, but either all of the holdings or some useful subset of them. The set of records originally cataloged by a library is a somewhat random set, and not useful as "library holdings." OCLC claims ownership of all records in a library's catalog that were not created as original cataloging by that library. That is a distinction, but not one that relates to any particular functionality or to useful library projects involving their holdings. It's useless nonsense, is what it is, nitpicky, and proof that OCLC was boxed into a corner as it tried to claim ownership over the millions of records created by libraries around the world.&lt;br /&gt;&lt;br /&gt;OCLC also states in the second quote above that those records in the library data are "OCLC's records" and not records that the libraries created. Here, "created" is the key verb. Any library that has significantly modified and upgraded a record can probably claim some degree of co-creation with other libraries. The claim that those records belong to OCLC is an insult to the libraries that have put so much effort into the shared pool of bibliographic data. Of course, OCLC would counter that the libraries and OCLC are one and the same. The unilateral actions of OCLC around the record use policy definitively shattered that view.&lt;br /&gt;&lt;br /&gt;Equally interesting is the claim of copyright in the database, a claim that has not been challenged and that might not survive a challenge. A database of bibliographic data may well be seen as a compilation of facts, essentially sweat of the brow rather than creative output. Add to that the fact that much of the sweat was not OCLC's but that of thousands of libraries, and the copyright claim looks thin. Ditto the claim to the OCLC number, which is purely a sequential number assigned to records as they enter the system. 
The claim that the OCLC identifier makes OCLC records usable is not defensible, IMO, because every database assigns numbers to things as part of routine database management. There's nothing new or creative about the fact that OCLC records have OCLC database numbers.&lt;br /&gt;&lt;br /&gt;Remember, though, that these statements are not meant for you and me; they are addressed to a court that may have very little knowledge of these matters. Obfuscation of the facts is undoubtedly part of the trial process, on the part of all parties involved. Unfortunately, OCLC's motion goes beyond obfuscation: it gets nasty.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Sarcasm and Nastiness&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I've only read the legal documents for a few cases that I'm particularly interested in, so my experience here is limited. However, I would assume that a court case is best won through cleverness, wily strategies and the ability to outwit one's opponent. In this as in other professional and public endeavors, I would expect the participants to affect a tone of detached politeness, even while skewering their rival. The OCLC motion plummets into sarcasm and nastiness. Here are some quotes:&lt;br /&gt;&lt;blockquote&gt;"...Plaintiffs have thrown a plethora of allegations of OCLC's purportedly anticompetitive actions into the Complain to see if any stick..." (Motion, pp. 1-2)&lt;/blockquote&gt;&lt;blockquote&gt;"While OCLC denies that either of these libraries has suffered as the result of anything other than purchasing the Plaintiff's inferior cataloging software..." (Motion, p. 17)&lt;br /&gt;&lt;br /&gt;"... vigorous competition against a company offering less expensive, but  inferior products, is perfectly lawful." (Motion, p. 
1)&lt;/blockquote&gt;&lt;blockquote&gt;"Nevertheless, what is sauce for the goose is sauce for the gander -- having pled a fiction that undercuts the existence of any claims they can pursue, Plaintiffs cannot claim to have been injured..." (Motion, p. 4, footnote)&lt;/blockquote&gt;&lt;blockquote&gt;"Nothing in the antitrust laws requires OCLC to subsidize SkyRiver's inferior product by setting its pricing for registering holdings into WorldCat as low as possible." (Motion, p. 28)&lt;/blockquote&gt;I find these statements to be embarrassingly unprofessional in nature, although for all I know this is the norm in legal arguments.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Separate Realities&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I suppose that one of the main skills for legal argumentation is the ability to present "facts" in ways that benefit your client, regardless of the facts. (If I were a judge and had to listen to this stuff, I'm sure I'd be driven to homicide.) Here are some examples from the motion to dismiss:&lt;br /&gt;&lt;br /&gt;1. The named libraries, Michigan State and Cal State Long Beach, were not harmed by OCLC, they simply declined to purchase OCLC's record upload service. This is cited as proof that they were not coerced into making a purchase (which appears to be one of the antitrust offenses). (p. 29) There is no mention that the libraries could not afford the price that OCLC offered, that the price changed without warning, etc.&lt;br /&gt;&lt;br /&gt;2. WorldCat Local is not a competitor to ILS systems because it exists &lt;span style="font-style: italic;"&gt;in addition to&lt;/span&gt; the ILS system. The Motion of course completely fails to connect WC Local, its attempt to limit use of the bibliographic data, and the upcoming "in the cloud" library systems platform. Are they worried that it might actually look like improper use of the WorldCat database?&lt;br /&gt;&lt;br /&gt;3. 
SkyRiver does have bibliographic records, so OCLC cannot be accused of having a monopoly on bibliographic records. (As if any bunch of bibliographic records will do.) Elsewhere in the document they boast of having the largest bibliographic database. Are we back to the Goose and the Gander?&lt;br /&gt;&lt;br /&gt;_____&lt;br /&gt;&lt;br /&gt;These are just a few of the topics in the Motion, and just the ones that I found most interesting. They may not even be the most relevant topics relating to the lawsuit. I suggest that you read the &lt;a href="http://www.librarytechnology.org/web/breeding/skyriver-vs-oclc/"&gt;Motion and other documents&lt;/a&gt; for yourself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1028656008593548811?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1028656008593548811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1028656008593548811' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1028656008593548811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1028656008593548811'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/12/oclc-motion-to-dismiss-pt-ii.html' title='OCLC Motion to Dismiss, Pt II'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5137227312734273617</id><published>2010-12-14T08:30:00.000-08:00</published><updated>2010-12-16T06:41:58.345-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='skyriver'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>OCLC Motion to Dismiss, Pt I</title><content type='html'>OCLC has filed a motion to dismiss in the antitrust lawsuit brought by SkyRiver/III. I presume that this is Standard Operating Procedure in cases of this type. As someone who is not versed in the complexities of antitrust law, I have no idea whether OCLC makes a good case in its motion. My impression is that the OCLC lawyers are quite adept, and that bodes well for OCLC in the case.&lt;br /&gt;&lt;br /&gt;I will comment on some interesting text and subtext of the motion. Since this will get long, here is a quick summary of what follows:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The motion states that SkyRiver has so far offered little proof of harm due to OCLC's business practices.&lt;/li&gt;&lt;li&gt;The motion may play on the court's ignorance of the library world and of OCLC's definitions.&lt;/li&gt;&lt;li&gt;OCLC makes some interesting claims to rights.&lt;/li&gt;&lt;li&gt;The motion makes claims that twist the words of SkyRiver's complaint. &lt;/li&gt;&lt;li&gt;The motion contains some unfortunate sarcasm and nastiness.&lt;/li&gt;&lt;li&gt;The motion undermines some previous OCLC claims as to the force of the Record Use policy.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Little Proof&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The motion claims that the SkyRiver complaint contains few hard facts that could be used to back up the antitrust claims. (Although I have no idea how detailed such a complaint is supposed to be.) 
It doesn't explain the library market and OCLC's role in it. What I find particularly lacking is that there is no comparison of pricing for record uploads between the libraries that moved to SkyRiver for cataloging and other libraries that upload records to OCLC. (According to the 2009 annual report, only 12% of records added to WorldCat were added via cataloging on OCLC; the rest were batch loaded.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Ignorance and Definitions&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;OCLC plays heavily on the confusion between WorldCat, the database, and the records in libraries' catalogs. This is not an easy concept to grasp, and it is not explained well in the SkyRiver complaint. Wherever SkyRiver's complaint refers to "library records" OCLC counters using "WorldCat" in its place. It makes a huge difference to be talking about the records in a library's catalog vs. the entire WorldCat database. OCLC claims that SkyRiver is demanding that OCLC make all of WorldCat available for free to competitors. What is actually said is:&lt;br /&gt;&lt;blockquote&gt;"Library records should be freely and openly available for use and re-use either in the public domain or by reasonable means of access for all, including for-profit library services firms." (Complaint, para. 76)&lt;/blockquote&gt;&lt;br /&gt;But OCLC re-words this in its response as:&lt;br /&gt;&lt;blockquote&gt;"... (a) library records should be free, regardless of OCLC's inestment in aggregating, normalizing, enhancing, maintaing(sic), and delivering services based on them..." (Motion, p. 10)&lt;/blockquote&gt;OCLC also says:&lt;br /&gt;&lt;blockquote&gt;"Plaintiffs pled, at most, only that libraries cannot share OCLC records, not that they are prevented from sharing records they created." (Motion, p. 
21)&lt;/blockquote&gt;What is clear here, as it is throughout the motion, is that SkyRiver is talking about the records that are in library catalogs, and OCLC is talking about "OCLC" or "WorldCat" records. By referring to the records in library catalogs as "OCLC" records, OCLC claims ownership of those records. In the former sense, the libraries are prevented from making use of the records in their catalogs as they wish; in the latter, OCLC is the owner of a database and claims are being made against that database. Unless these definitions are cleared up, the two parties are just talking past each other, and no member of the court is going to make sense of it all. That, of course, would probably be to OCLC's advantage.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Record Use Policy&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The original complaint cites the OCLC record use policy as a means by which OCLC maintains &lt;blockquote&gt;"strict control over its members' access and use of the WorldCat database...". (Complaint, para. 33)&lt;/blockquote&gt; OCLC's motion first complains that SkyRiver did not attach a copy of the Policy to its original filing (though it did attach one to its response to the Motion to Transfer). This is irrelevant to the case, I believe, and is therefore a bit of sniping at SkyRiver's lawyers, hinting that they aren't doing a good job. Anyway, here's how OCLC replies to that:&lt;br /&gt;&lt;blockquote&gt;"The nature of these documents is not pled: it is not claimed that these documents are anything other than 'guidelines' OCLC publishes or that OCLC has ever used these documents to prevent a library from providing its catalog records to Plaintiffs or any other entity." (Motion, p. 7)&lt;/blockquote&gt;There's more, but let's first examine this statement. 
During the big brouhaha about the policy, Karen Calhoun published "&lt;a href="http://community.oclc.org/metalogue/archives/2008/11/notes-on-oclcs-updated-record.html"&gt;Notes on OCLC's updated Record Use Policy&lt;/a&gt;" on the OCLC blog, and stated:&lt;br /&gt;&lt;blockquote&gt;"The updated policy is a legal document. Being a player on the Web, working on behalf of libraries, requires that the policy be a legal document."&lt;/blockquote&gt; That is, of course, the opposite of what the motion says.&lt;br /&gt;&lt;span style="font-style: italic;"&gt;(See comment below by Jennifer Younger: "The new 2010 policy is correctly characterized in OCLC's Motion to Dismiss as a code of good practice to guide members' choices about how they share their copies of WorldCat records.")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What is sad, however, is the statement, true as far as I know, that OCLC has never used these documents to prevent libraries from sharing their records. It hasn't had to, because the mere threat has been enough to keep libraries from acting. The libraries that have released their records have done so unscathed, but they are few. 
There are, of course, two ways to interpret this: either libraries are afraid to release their records for fear of retribution, or they agree with OCLC's argument that WorldCat would be endangered if library records were openly shared.&lt;br /&gt;&lt;br /&gt;I'll pause here and pick this up again shortly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5137227312734273617?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5137227312734273617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5137227312734273617' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5137227312734273617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5137227312734273617'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/12/oclc-motion-to-dismiss-pt-i.html' title='OCLC Motion to Dismiss, Pt I'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3725208079101179016</id><published>2010-12-06T04:10:00.000-08:00</published><updated>2010-12-06T05:12:41.497-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><title type='text'>Online 2010 and SWIB</title><content type='html'>I'm just back from a lengthy trip that ended at the &lt;a href="http://swib.org/swib10/"&gt;Semantic Web in 
Bibliotheken&lt;/a&gt; (SWIB) (&lt;a href="http://twitter.com/search?q=%23swib10"&gt;#swib10&lt;/a&gt;) conference in Cologne, Germany, followed by &lt;a href="http://www.online-information.co.uk/index.html"&gt;Online Information 2010&lt;/a&gt; in London (&lt;a href="http://twitter.com/search?q=%23online2010"&gt;#online2010&lt;/a&gt;). These are some thoughts from those events.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;SWIB&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I saw two examples of FRBR use that do not follow the structure given in the FRBR documentation, and both made good sense to me.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The Bibliothèque nationale de France (BNF) is working to export its data in a linked data format. They are linking the Manifestation directly to the Work and to the Expression, rather than following the M -&gt; E -&gt; W order that is defined in FRBR. I need to think about this some more, but it seems to remove some of the rigidity of the linear WEMI.&lt;/li&gt;&lt;li&gt;The Deutsche Nationalbibliothek is using an identifier method that seems to resolve the (long) discussion I instigated on the FRBR list about identifying WEMI with a single identifier. They give an identifier to the single WEMI group (one work, one expression, one manifestation, and presumably one Item, though no one seems to be talking about items). There is also an identifier for each W, E, M, I. This works well for input and output (and sharing). When a matching W or WE is found, a "merged" identifier is coined for the FRBR units. I couldn't follow the presentation, as it was in German, but from the slides it looked to me as if all of these identifiers could co-exist, and could therefore simultaneously represent different views of the bibliographic data, depending on the function in play (e.g. export of data about a book v. 
support of shared cataloging).&lt;/li&gt;&lt;/ul&gt;The key thing I learned, though, was that there is a plethora of semantic web activity in European libraries. Among these, the British Library has released the National Bibliography (1956-); the BNF will soon make data available, as will the German National Library. What do these libraries have in common? Among other things, their data is not bound by the OCLC record policy, so they are able to make it freely available.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Online 2010&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I was the opening speaker on a panel about the Semantic Web at this conference, and unfortunately that was the only part of the conference I was able to attend, other than the exhibits. Online Info is a combined publisher/library conference, with the publishing side being primary. At the conference, one of the three tracks was "Exploiting Open and Linked Data." In the exhibits the term "semantic" was everywhere. I would like to attend this conference properly some time (I can't really say that I have yet) to get a view of linked data from another industry's perspective.&lt;br /&gt;&lt;br /&gt;My co-speakers were Sarah Bartlett of Talis, and Martin Malmsten from the Swedish National Library. Sarah did something that had never occurred to me, but that now seems so obvious I just think "Doh!" Her &lt;a href="http://www.slideshare.net/SarahBartlett/what-place-for-libraries-in-a-linked-data-world"&gt;talk&lt;/a&gt; walked through a literary, rather than bibliographic, view of some library materials. She showed how you could use linked data to support the humanities. It was, as the British say, brilliant. It's also a great way to teach people about linked data, and she advised everyone to come up with something they have a passion for and use it as an exercise in linking. 
Now I want to come up with some fun linking exercises for teaching purposes.&lt;br /&gt;&lt;br /&gt;Martin talked about the motivation behind making &lt;a href="http://libris.kb.se/?language=en"&gt;LIBRIS&lt;/a&gt;, the Swedish union catalog, available as open linked data, as it has been since 2008. He and I agreed that we really need a good linked data app that would allow people to explore the linked data space. He quoted Corey Harper as saying that the killer app for linked data will probably be created by a 13-year-old, someone for whom open linking is nothing new. I am really interested to see what the "linked open data" generation comes up with!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3725208079101179016?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3725208079101179016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3725208079101179016' title='21 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3725208079101179016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3725208079101179016'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/12/online-2010-and-swib.html' title='Online 2010 and SWIB'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>21</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5667368861291667863</id><published>2010-12-06T03:50:00.000-08:00</published><updated>2010-12-06T04:10:41.702-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>Response to JPW</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_Z1EA7hov2P0/TPzQ72X2RpI/AAAAAAAAA2c/MnwNo3-vNsA/s1600/IMG_0047.jpg"&gt;&lt;img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 299px; height: 400px;" src="http://4.bp.blogspot.com/_Z1EA7hov2P0/TPzQ72X2RpI/AAAAAAAAA2c/MnwNo3-vNsA/s400/IMG_0047.jpg" alt="" id="BLOGGER_PHOTO_ID_5547538567640008338" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Note: John Price Wilkin of the University of Michigan wrote a &lt;/span&gt;&lt;a style="font-style: italic;" href="http://blog.okfn.org/2010/11/29/open-bibliographic-data-how-should-the-ecosystem-work/"&gt;post&lt;/a&gt;&lt;span style="font-style: italic;"&gt; on the Open Knowledge Foundation blog that is very critical of the library linked data movement and of the creation of numerous disjoint files of bibliographic data in linked data formats. I admit that it isn't clear to me what he thinks should happen, but it seems to be something like what is shown in this photo, which I took in the Online 2010 exhibit hall. This is OCLC's booth. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;A separate cloud for libraries. Totally the wrong idea.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I must say that I see things quite differently from JPW. 
Although I agree that a bunch of static bibliographic files do not open library linked data make, my view is:&lt;br /&gt;&lt;br /&gt;1) Each file represents a person or group who got interested in transforming library data and went through the learning process of actually doing it. Therefore, each file is a contribution to our collective knowledge about linked data. When we add these files to heterogeneous stores like Open Library or Freebase, we exercise that knowledge.&lt;br /&gt;&lt;br /&gt;2) These files are fodder for further experimentation with mixing library data and non-library data, which to me is one of the main points of linked library data. We are in the "training wheels" stage of this change, and like training wheels, these early files may end up being discarded when we finally learn to ride. I see no harm in that.&lt;br /&gt;&lt;br /&gt;3) This experimentation is taking place primarily outside the US, in places where the OCLC record use policy does not apply. The British Library, the National Library of Sweden, soon the Bibliothèque nationale de France, and a handful of German libraries are at the forefront of this. If you cannot release your bibliographic data openly, you cannot participate in the linked data movement.&lt;br /&gt;&lt;br /&gt;4) I do think that we will have library systems that use a different data format from the one we have today, but such systems are not the same as linked data, and they are definitely not the linked open data that is the main focus of current linked data activity. How we manage our data for ourselves may well be different from how we share it with the world. We do need a well-ordered library data universe where we do our bibliographic work. 
That should exist in parallel with open sharing that reaches beyond the library cataloging community.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5667368861291667863?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5667368861291667863/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5667368861291667863' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5667368861291667863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5667368861291667863'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/12/response-to-jpw.html' title='Response to JPW'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_Z1EA7hov2P0/TPzQ72X2RpI/AAAAAAAAA2c/MnwNo3-vNsA/s72-c/IMG_0047.jpg' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-2321594524256213713</id><published>2010-10-29T09:26:00.000-07:00</published><updated>2010-10-29T09:41:26.773-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='skyriver'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>SkyRiver/OCLC suit moved to Ohio court</title><content type='html'>The judge in the U.S. District Court for the Northern District of California, in San Francisco, has agreed to OCLC's request to transfer 
the proceedings in the SkyRiver/OCLC suit to the U.S. District Court for the Southern District of Ohio. In an impressively thoughtful &lt;a href="http://docs.justia.com/cases/federal/district-courts/california/candce/3:2010cv03305/230152/26/"&gt;10-page document&lt;/a&gt;, the judge weighs the parties' arguments about the transfer request. In the end, the decision was based on two things:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A majority of the potential witnesses who are neither SkyRiver nor OCLC employees (e.g. libraries that can give evidence) are closer to Ohio than to California.&lt;/li&gt;&lt;li&gt;Most of the documentary evidence will need to come out of OCLC's file cabinets, since the suit concerns OCLC business practices over a significant period of time.&lt;/li&gt;&lt;/ol&gt;I had hoped to sit in on some of the action in the San Francisco court, although more experienced folks have told me that it could be deadly dull. Now we need to find bloggers in the Ohio area to cover this. 
Any volunteers?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-2321594524256213713?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/2321594524256213713/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=2321594524256213713' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2321594524256213713'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/2321594524256213713'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/10/skyriveroclc-suit-moved-to-ohio-court.html' title='SkyRiver/OCLC suit moved to Ohio court'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1590838364221931848</id><published>2010-10-12T20:25:00.000-07:00</published><updated>2010-10-12T21:22:17.891-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><title type='text'>Beyond MARC-up</title><content type='html'>In the latest issue of the Code4Lib Journal, Jason Thomale has published an article, &lt;a href="http://journal.code4lib.org/articles/3832"&gt;"Interpreting MARC: Where’s the Bibliographic Data?"&lt;/a&gt;, in which he struggles to identify the separate logical elements in a MARC 245 field. 
I must admit that I'm not entirely clear on what he means by 'bibliographic data,' but I empathize with his attempts to find the data in MARC. In his conclusion he says:&lt;br /&gt;&lt;blockquote&gt;... MARC has as much in common with a textual markup language (such as SGML  or HTML) as it does with what we might consider to be “structured  data.”&lt;/blockquote&gt;I have myself often referred to MARC as a markup language, to distinguish it from what a computer scientist would call "data." We took the catalog card and marked it up so that we could store the text in a machine-readable form and re-create the card format as precisely as possible. Along the way, a few elements (publication date, language, format) were judged to need expression as actual data, and so the fixed fields were designed to hold them. Oddly enough, though, in most cases the same information was available in the text, which meant that it had to be entered twice: once as text, and once as data.&lt;br /&gt;&lt;blockquote&gt;008 pos. 07/10 = 1984&lt;br /&gt;260 $c 1984&lt;/blockquote&gt;This duplication is proof that at some point the MARC developers were fully aware that the text in the variable fields was ill-suited to machine operations other than printing on a card (or display on a screen).&lt;br /&gt;&lt;br /&gt;I have been working off and on for a number of years on an analysis of MARC that is perhaps similar to Thomale's search for the bibliographic data of MARC. I characterize my project as an attempt to define the data elements of the MARC record. The logic goes like this: if we want to create a new, more flexible format for library data, one way to begin that process is to break MARC data up into its data elements. These can then be re-combined using a new data carrier.  
The converse is that if we cannot break MARC up into its data elements, then any new carrier will surely inherit some of MARC's problematic aspects, such as:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;redundancy, especially the repetition of the same content in many different fields&lt;/li&gt;&lt;li&gt;inconsistency, where the content in those different fields is coded differently or with a different level of granularity&lt;/li&gt;&lt;li&gt;potential contradiction between data in fixed fields and textual data&lt;/li&gt;&lt;/ul&gt;I am still at the beginning of my analysis, but for anyone who wants to follow along and comment/cajole/criticize, I am doing my thinking out loud on the &lt;a href="http://futurelib.pbworks.com/MARC-elements"&gt;futurelib wiki&lt;/a&gt;. I thought I would start with the 0XX fields, but decided to drop back and start with 007/008. I have a database of all of the 007/008 elements and their values (available in tab-delimited format on &lt;a href="http://futurelib.pbworks.com/w/page/Data+and+Studies"&gt;this wiki page&lt;/a&gt;), so I've been able to sort and eliminate and do other database-y things that help me see what's there.&lt;br /&gt;&lt;br /&gt;I'm not interested in replicating MARC, so I do not want to create something that is one-to-one with MARC fields and subfields. For example, some fixed field data elements and their values appear more than once in the MARC format, such as the 008 "Government publication" element, which is identical in the 008 for books, computer files, maps, continuing resources, and visual materials. As far as I'm concerned, it is a single data element. On the other hand, an element named "Color" appears in more than one 007 field, but in each case the set of valid values is different. These, then, are different data elements.&lt;br /&gt;&lt;br /&gt;I am struggling with how to create usable output from my investigations. 
I may code some things in the &lt;a href="http://metadataregistry.org"&gt;Open Metadata Registry&lt;/a&gt;, but at the moment that would have to be done by hand, and I need something more automated. I would like to represent the controlled lists in the fixed fields in an RDF-compatible way using SKOS. This should be relatively simple once certain decisions are made (naming, URIs, etc.).&lt;br /&gt;&lt;br /&gt;A big question is how to link all of this back to MARC. For the fixed fields, it's relatively easy to create a string that represents the MARC origins of the data, for example:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;007microform05 to represent the data element (field 007, category of material Microform, position 05)&lt;/li&gt;&lt;li&gt;007microform05f to represent the actual value (field 007, category of material Microform, position 05, value=f)&lt;/li&gt;&lt;/ul&gt;When it comes to the variable fields, this is going to be more difficult because, as Thomale points out in his article, a logical element may span more than one field/subfield, and there may also be multiple elements in a single subfield. Working that out is going to be very, very difficult, so it seems best to go for the low-hanging fruit of the fixed fields.&lt;br /&gt;&lt;br /&gt;Note that there have been other &lt;a href="http://marccodes.heroku.com/"&gt;good starts&lt;/a&gt; at defining the MARC fixed fields in SKOS, and eventually we may be able to bring this all together. 
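&lt;br /&gt;&lt;br /&gt;As a rough illustration of the fixed-field identifier pattern described above (field tag, then category of material, then two-digit character position, optionally followed by the value code), here is a minimal Python sketch. The function names are hypothetical. Treating each data element as the SKOS concept scheme for its coded values is an assumption of this sketch, not a settled design. The sketch only builds strings and a plain dictionary that mimics SKOS property names. It does not parse MARC records, does not know what any code means, and does not produce real RDF or URIs.&lt;br /&gt;&lt;pre&gt;
```python
# Hypothetical sketch: build identifier strings for MARC fixed-field
# data elements and their coded values, following the pattern
# field tag + category of material + two-digit position (+ value code).


def element_id(field, category, position):
    """Identifier for a fixed-field data element, e.g. 007microform05."""
    # Lower-case the category name and zero-pad the position to two digits.
    return f"{field}{category.lower()}{position:02d}"


def value_id(field, category, position, code):
    """Identifier for one coded value of a data element, e.g. 007microform05f."""
    return element_id(field, category, position) + code


def skos_like_concept(field, category, position, code, label=None):
    """Plain-dict stand-in for a SKOS concept (not real RDF).

    Assumption: the data element acts as the concept scheme for its values.
    The label is optional because this sketch does not know what any
    MARC code actually means.
    """
    concept = {
        "id": value_id(field, category, position, code),
        "skos:notation": code,
        "skos:inScheme": element_id(field, category, position),
    }
    if label is not None:
        concept["skos:prefLabel"] = label
    return concept


if __name__ == "__main__":
    print(element_id("007", "Microform", 5))       # 007microform05
    print(value_id("007", "Microform", 5, "f"))    # 007microform05f
    print(skos_like_concept("007", "Microform", 5, "f"))
```
&lt;/pre&gt;Zero-padding the position keeps identifiers in the "05" form used in the examples above, so they sort and compare consistently. Choosing names and URIs for real use remains the open question.&lt;br /&gt;&lt;br /&gt;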
Meanwhile, I did grab marc21.info for the URI portion of this work and obviously am working toward dereferenceable URIs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1590838364221931848?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1590838364221931848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1590838364221931848' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1590838364221931848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1590838364221931848'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/10/beyond-marc-up.html' title='Beyond MARC-up'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-4254180455035891323</id><published>2010-09-10T14:29:00.000-07:00</published><updated>2010-09-10T16:17:06.515-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FOAF'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><title type='text'>Libraries, FOAF, and community</title><content type='html'>&lt;span style="font-style: italic;"&gt;Note: this is being posted simultaneously on two blogs: &lt;/span&gt;&lt;a style="font-style: italic;" href="http://managemetadata.org/blog/"&gt;Metadata Matters&lt;/a&gt;&lt;span style="font-style: italic;"&gt; and 
&lt;/span&gt;&lt;a style="font-style: italic;" href="http://kcoyle.blogspot.com/"&gt;Coyle's InFormation&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;“Why don’t libraries just use FOAF for their Person metadata? Why do they insist on creating their own?”&lt;br /&gt;&lt;br /&gt;We have lost count of how many times we have heard this on various lists. Often it is not really posed as a question; in other words, it isn’t asking for an explanation of why libraries do not choose to use FOAF. It’s more rhetorical, along the lines of “Why can’t we all just get along?” But it deserves to be asked as a real question, and to get a real answer.&lt;br /&gt;&lt;br /&gt;[Note first that the question of FOAF comes up not so much in connection with current library standards as in discussions of upcoming standards that will hopefully be based on the FR** family of standards (FRBR, FRAD, FRSAD).]&lt;br /&gt;&lt;br /&gt;A comparison of FOAF Person and the library Person entity (whether in MARC authority files, RDA, or FRAD) shows that there is not one defined element (or “property,” as it is called in Semantic Web-ese) that the two have in common. This is not a coincidence; the two vocabularies serve significantly different communities and purposes. This does not mean that they are irreconcilable; the question therefore becomes: what keeps them apart, and can that be overcome?&lt;br /&gt;&lt;br /&gt;The key is in the nature of the two communities.&lt;br /&gt;&lt;br /&gt;FOAF stands for ‘Friend of a Friend’, which is a clue to its context: the schema is primarily for use in social networking situations. Its focus is on people who are alive and online, and it includes online contact information like email addresses, web sites, work web sites, Facebook IDs, Skype IDs, etc. 
FOAF does not treat a person's name as an identifier; instead, it presumes that the name plus one or more of the contact IDs is enough to distinguish most humans from one another.&lt;br /&gt;&lt;br /&gt;Library name data (a form of controlled vocabulary, called “name authority data” in library terms) focuses on creating a unique identifier that brings together, under one form, the different forms of a name used in published materials. Library users can therefore expect to find all of the works by or about a named person under a single entry, regardless of the various forms of the name that exist in real data. Uniqueness of names is enforced by adding information to a non-unique name, usually the year of birth; when that isn’t known (especially for persons of antiquity), titles or even areas of endeavor (“poet”) can be added.&lt;br /&gt;&lt;br /&gt;To accommodate both the FOAF (social) function and the libraries’ identification function, at the very least the libraries would need to define a sub-class of FOAF Person, one with a stricter definition and usage. However, designating the library “Person” as more specific than FOAF:Person does not require that the two be in the same vocabulary. That is one of the important features of Semantic Web classes and properties: like any other resource, they can be linked and related to other resources on the Web.&lt;br /&gt;&lt;br /&gt;Why not combine the library and FOAF properties into a single metadata vocabulary? The answer has little to do with technology; instead, it relates to the functioning of communities. Metadata standards need to be developed by (and for) actual communities. The FOAF and library communities clearly have different needs and goals, and are working with fundamentally different use cases. They are also significantly different as communities.&lt;br /&gt;&lt;br /&gt;FOAF is being developed by an informal group of developers and is quite recent in origin. 
The group is small: the FOAF development email list has about 350 members. Another 350 individuals are listed on the FOAF wiki pages as having a FOAF profile available on the Web. This is obviously not the full extent of FOAF usage, but these numbers reflect the recent development of this kind of metadata.&lt;br /&gt;&lt;br /&gt;The library community has hundreds of years of investment in the creation of metadata (even though it was not called that when libraries began to create it). There are at least tens of thousands of libraries in the world, many of which have been in existence for centuries. Library data has its origins in early-19th-century book catalogs and has been created in machine-readable form since the late 1960s. Library data is created following formal rules governed in part by international agreements, and many hundreds of millions of machine-readable bibliographic records have been created based on these cataloging principles.&lt;br /&gt;&lt;br /&gt;Libraries have engaged in widespread data sharing for centuries, and with today's global networking capabilities, they are able to exchange and re-use data on a huge scale. Libraries do not each create metadata for the same book or item; instead, they share the metadata created by one library in cooperative efforts aimed at resource sharing and efficiency.&lt;br /&gt;&lt;br /&gt;This sharing is built into the very core of library data management. The ability to use data created by others is supported by standards, and those standards form the basis for library systems. While most users see only the public catalog, that is just one function of a system that supports purchasing, fund accounting, inventory control, circulation and patron management, and collection analysis. 
In the Western world, these systems are created and maintained not by libraries but by a small number of specialized commercial vendors whose products are built for library customers using agreed-upon library standards. Thus the very same system can be sold to hundreds or thousands of libraries, creating a viable market for system development.&lt;br /&gt;&lt;br /&gt;Many of the 70,000 libraries contributing to OCLC use a single standard, MARC 21, and others follow international standards such as ISBD, which produces standardized bibliographic descriptions. The development of these standards is based on a large-scale community process with international participation. It is by no means a perfect process, and it clearly must be updated to meet modern needs and the new technologies that have changed the way we work, but the degree of data sharing that libraries depend on requires a formal process to support this community's standards.&lt;br /&gt;&lt;br /&gt;Sharing of data on a large scale is necessitated by the economic reality of the library sector. Libraries face steadily shrinking budgets while coping with rising demand for their services. Realistically, this means that changes to library data must be carefully coordinated to minimize disruption to the complex network of data sharing that makes cost-effective management of library services possible. Libraries may appear to be mistrustful of change agents, and in some cases they certainly are, but there is a real need to minimize risk for the community as a whole in order to assure the health of these often financially fragile institutions.&lt;br /&gt;&lt;br /&gt;So we come back to the question of libraries and FOAF. In the final analysis, we’re not at all sure that there’s much gain in trying to combine these two approaches, given the differences in their communities and functions.  
It could be like trying to combine oil and water, requiring compromises that in the end would be less than satisfactory for both communities. One could argue that the difference between the vocabularies and their contexts is a positive, allowing more than one view of the Person entity. With two separately maintained metadata vocabularies, anyone creating metadata can choose from either as needed without sacrificing precision. One can also imagine other views that will arise, such as Persons in medical data or financial data, which would each carry data elements found in neither FOAF nor library data, from blood type to bank balance. The important thing is to make sure that these vocabularies are properly described and related to each other where possible. That way, each community can manage its own process based on its needs for standards integration, while data can still be shared where appropriate.&lt;br /&gt;&lt;br /&gt;We could begin with a more detailed discussion between the FOAF and library communities about their metadata needs. 
Given the library community’s hundreds of years of experience in representing names in catalogs, we feel confident that its knowledge could contribute to the use of personal names in the Semantic Web generally.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-4254180455035891323?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/4254180455035891323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=4254180455035891323' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4254180455035891323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4254180455035891323'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/09/libraries-foaf-and-community.html' title='Libraries, FOAF, and community'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1797001717390580499</id><published>2010-08-17T14:35:00.001-07:00</published><updated>2010-08-19T13:07:46.970-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='skyriver'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>OCLC, SkyRiver, and the slow arm of the law</title><content type='html'>I suppose one could be gratified to learn that there are institutions that move at least as slowly as libraries, but I'm not 
happy about the delayed gratification that entails, nor about the fact that we will have to try to move forward as a community without answers for quite a while.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://dockets.justia.com/docket/california/candce/3:2010cv03305/230152/"&gt;recent documents&lt;/a&gt; filed with the court in the SkyRiver/OCLC case contain the following actions and dates:&lt;br /&gt;&lt;br /&gt;First, OCLC will request that the suit be moved from Northern California to the Southern District of Ohio. Handling that motion alone will take us through &lt;span style="font-weight: bold;"&gt;October 2010&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;If that does not derail the current calendar (and I presume it could push this date back), then the Case Management Conference will be held on &lt;span style="font-weight: bold;"&gt;January 14, 2011&lt;/span&gt;, in the San Francisco courtroom.&lt;br /&gt;&lt;br /&gt;No, I have no idea what a "case management conference" is, but it sounds like something preliminary. I would love it if someone with a legal background could offer some occasional commentary on what some of these steps mean. Right now I presume that all of this is par for the course for lawsuits of this nature, but never having observed such a case before, I really have no idea. 
Anyone know some law librarians who can chime in?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1797001717390580499?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1797001717390580499/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1797001717390580499' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1797001717390580499'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1797001717390580499'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/08/oclc-skyriver-and-slow-arm-of-law.html' title='OCLC, SkyRiver, and the slow arm of the law'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7048869834228603285</id><published>2010-07-30T10:30:00.000-07:00</published><updated>2010-08-01T07:03:15.593-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>SkyRiver/III v. OCLC, Part II</title><content type='html'>In my &lt;a href="http://kcoyle.blogspot.com/2010/07/skyriveriii-v-oclc-lawsuit.html"&gt;previous post&lt;/a&gt; I covered what I saw as the stronger arguments made in the complaint. 
In this post I will cover points that either puzzled me or seemed off the mark.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The OCLC Number&lt;/span&gt;&lt;br /&gt;The complaint states that &lt;blockquote&gt;"This OCLC number has permitted OCLC to police its members to ensure that their records are not shared with unauthorized users." (p. 5) &lt;/blockquote&gt;Since anyone can add an OCLC number to, or delete one from, a MARC record in their own database, I don't see how this could be the case. I would like to see how this claim is supported.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The ILS Market&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;"OCLC is rapidly gaining market share in the ILS market by leveraging its monopoly power over its bibliographic database... " (p. 6) &lt;/blockquote&gt;Can they supply the figures to support this rapid gain in market share? They do state the number of WorldCat Local installations ("624", p. 22), but WCL is not an ILS (even though it may eventually become the basis for one).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Academic Libraries Only&lt;/span&gt;&lt;br /&gt;The complaint appears to address only academic libraries (p. 7). This could be because the evidence they claim to have relates only to academic libraries, but both OCLC and III serve many public libraries. The complaint also states that:&lt;br /&gt;&lt;blockquote&gt;"The relevant geographic market ... is the United States, because academic libraries cannot turn to suppliers of these products in other countries to meet their needs." (p. 10)&lt;/blockquote&gt;This may just be poorly worded, but if it means that no companies outside the US provide the service, then it should have said so. As worded, it sounds as though there are prohibitions on academic libraries using non-US suppliers... 
could that be so?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;New Products&lt;/span&gt;&lt;br /&gt;In numerous places in the document, the complaint states that OCLC members are &lt;span style="font-style: italic;"&gt;required&lt;/span&gt; to participate in product development as part of their membership obligation:&lt;br /&gt;&lt;blockquote&gt;"Membership also obligates libraries to assist OCLC in developing new products and services to compete with for-profit firms." (p. 5)&lt;br /&gt;&lt;br /&gt;"OCLC developed, and is still developing, WorldCat Local and WorldCat Local "quick start" through pilot programs in which many of its member university libraries have agreed to participate, without compensation, purportedly to meet the requirements of their membership in OCLC." (p. 20)&lt;/blockquote&gt;I have never heard of this requirement, and would be interested in hearing from institutions who did find themselves essentially forced to participate in pilots as part of their membership.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Acquisition of Other Companies&lt;/span&gt;&lt;br /&gt;The complaint states that over time OCLC has expanded by acquiring 19 library industry companies, 14 of which were for-profit. (They fail to mention that at least some of those companies magically became non-profit when acquired by OCLC, cf. netLibrary.) The remainder of the sentence reads:&lt;br /&gt;&lt;blockquote&gt;"... either to obtain software and other products that enable it to offer library services in competition with the remaining for-profit providers or simply to eliminate products from the marketplace." (p. 23)&lt;/blockquote&gt;These are strong words that the complainants should be prepared to prove. I'm not saying that it isn't true. 
However, in the few cases of which I am aware (WLN, netLibrary, RLG), the acquired company was in financial free-fall and OCLC's purchase was viewed at the time as a rescue that benefited the library community as a whole. In the case of netLibrary, OCLC had agreed to be the escrow agent for the ebooks purchased by libraries, to be called upon should netLibrary go out of business. In that case, OCLC was pretty much pre-obligated to rescue netLibrary or provide some service of its own. (I don't know what the monetary arrangements of the escrow were.) As for WLN and RLG, it's hard to know what would have happened if OCLC hadn't purchased those agencies. I suspect that the libraries using those services would have had to become OCLC members in any case in order to continue functioning as libraries. This covers only three of the 19, and may or may not be representative of OCLC's acquisitions.&lt;br /&gt;&lt;br /&gt;[Partial list of acquisitions, gleaned from press releases and the annual report:&lt;br /&gt;Dewey Decimal System (1988), Information Dimensions (1993) [sold in 1997], Public Affairs Information Service/PAIS (1999), WLN (1998), netLibrary (2002, with MetaText eTextbook Division, a for-profit subsidiary), Openly Informatics (2006, OpenURL services), RLG (2006), EZproxy (2008), Amlib (2008, Australian web-based ILS), PICA (1997), Fretwell-Downing (2005), Sisis Information Systems (2005). &lt;span style="font-style: italic;"&gt;Note&lt;/span&gt;: these may not be the same companies referred to in the complaint. This is my cobbled-together list, and should only be seen as such.]&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Head-hunting&lt;/span&gt;&lt;br /&gt;Another strange statement is about OCLC's use of head-hunters to hire staff away from other companies:&lt;br /&gt;&lt;blockquote&gt;"In addition to acquiring for-profit companies, OCLC also uses headhunters to identify and recruit employees from for-profit firms.
Plaintiffs are informed and believe and based thereon allege that OCLC is using its tax-free dollars to recruit employees of for-profit vendors of library services to eliminate competition and extend OCLC's monopoly to the ILS market." (p. 26)&lt;/blockquote&gt;There's obviously a story here, but I don't know what it is. Using headhunters is standard industry practice for a well-heeled high-tech organization. Has OCLC engaged in predatory hiring behavior? And can that allegation be proved?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Access to WorldCat&lt;/span&gt;&lt;br /&gt;The strangest thing in this complaint is the repeated insistence that OCLC should give access to the WorldCat database to potential competitors.&lt;br /&gt;&lt;blockquote&gt;"...As a result of OCLC's conduct... Innovative [and SkyRiver, in another paragraph] has suffered and will continue to suffer irreparable harm ... unless this Court orders defendant OCLC to provide access to the WorldCat database to Innovative and other competitors, on such terms as are just and reasonable." (p. 31; the same language appears for SkyRiver on p. 29)&lt;/blockquote&gt;This argument comes as a surprise to me. I had always assumed that the goal was to allow libraries to provide their bibliographic records freely to anyone they wished, including for-profit companies. I see that as very different from giving competitors direct access to WorldCat. It seems to me that the former goal would be very easy to argue, but direct access to OCLC's own database seems much more difficult to justify. I'm quite puzzled by this, unless I am misreading what it means.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;br /&gt;There's a part of me that wants this to go to court so that we can get answers to these intriguing questions. There's another part of me that sees the possibility that this could be a lose-lose proposition.
Given the overall stress in the library community, both monetary and technological, in-fighting looks to be the worst thing we could do to ourselves.&lt;br /&gt;&lt;br /&gt;There is no doubt that a large, union catalog of library holdings is key to providing the kind of web-scale (sorry, but I couldn't think of another word) services that libraries absolutely &lt;span style="font-style: italic;"&gt;must&lt;/span&gt; provide today. That said, that database does not have to be WorldCat, although WorldCat performs that function at this moment in time. The main thing is that we must have a union/universal catalog that serves libraries and their users. It shouldn't be a limited access asset that is being fought over for market share. I don't have a solution to offer, but it's clear to me that the solution is: FREE THE DATA.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7048869834228603285?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7048869834228603285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7048869834228603285' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7048869834228603285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7048869834228603285'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/07/skyriveriii-v-oclc-part-ii.html' title='SkyRiver/III v. 
OCLC, Part II'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8845118971821294615</id><published>2010-07-30T06:34:00.000-07:00</published><updated>2010-08-01T06:43:18.520-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>SkyRiver/III v. OCLC: the lawsuit</title><content type='html'>I have now had a chance to read the legal complaint that SkyRiver/III have filed against OCLC. Marshall Breeding does a good overview of the complaint in a &lt;a href="http://www.libraryjournal.com/lj/home/886099-264/skyriver_and_innovative_interfaces_file.html.csp"&gt;Library Journal piece&lt;/a&gt;. I'm going to focus on highlights and lowlights, what I think works and what I think doesn't. The caveat is that I do not know enough about anti-trust law to understand whether the suit is convincing on that score. So what follows is my reading of the complaint today, and I welcome corrections, other views, and any commentary.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Smoking Guns&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The complaint has what I see as two smoking guns:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the use of differential pricing to specifically prevent OCLC members from becoming SkyRiver customers&lt;/li&gt;&lt;li&gt;the claim that OCLC paid cash "inducements" to university officials and paid for "luxury trips to expensive resorts to obtain their commitments to promote OCLC products..." (p. 21)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Both of these are extremely damaging to OCLC if they are true. 
The latter is possibly not illegal on OCLC's part, although it may have been illegal on the part of the officials who accepted such favors in exchange for a contract with OCLC. This, however, should come to the attention of OCLC's members, who, if this is proven to be true, will undoubtedly find this activity unacceptable for their organization.&lt;br /&gt;&lt;br /&gt;The arguments about differential pricing are less sensational but could be equally damaging. Differential pricing is a normal practice in business, often based on concrete aspects like volume of trade or length of contract. Whether it is normal for a non-profit, I don't know. Member libraries have accepted that each one forges a contract with OCLC that is considered confidential (although I suspect that librarians discuss informally with each other what they pay OCLC). SkyRiver/III claims to have proof that OCLC has used this differential pricing to punish libraries that have moved their cataloging activity from OCLC to SkyRiver. (The &lt;a href="http://kcoyle.blogspot.com/2010/02/yet-more-oclc.html"&gt;MSU case&lt;/a&gt;, as one example.) They also claim to have proof that OCLC lowered cataloging charges for some libraries that were intending to move to SkyRiver, and thus kept them as customers. (See pp. 14-19.) This alone may not be illegal, but in this complaint it is described as an unfair use of OCLC's current monopoly position on cataloging services.&lt;br /&gt;&lt;br /&gt;[Note: There appear to be more libraries that batch load their records into OCLC than ones that catalog on OCLC. In the 2008/2009 annual report, OCLC states that it has 11,810 member libraries and 72,035 participating libraries. (I'm not sure of the difference.) In that same time frame, "the number of items cataloged by batch loading increased to 241.8 million, up from 212.1 the previous year...."
They also state (p. 2) that the total of cataloged records plus batch loaded records was 278.3 million, meaning that batch loading accounted for 87% (241.8 million of 278.3 million) of the records added to OCLC that fiscal year.]&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Solid Arguments&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The complaint has a number of solid arguments about OCLC's behavior that may be significant should this go to court. Briefly, these are:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;OCLC does not act like a non-profit or a cooperative&lt;/span&gt;. Throughout the document the complaint uses terms like "purported member-based cooperative" when referring to OCLC. In particular, it says:&lt;br /&gt;&lt;blockquote&gt;"Plaintiffs are informed and believe and based thereon allege that OCLC is not a true cooperative in that its members do not share its revenues or control its management, operations or policies. A majority of its Board of Trustees is elected by the Board itself. ... Rather than operating with transparency as a cooperative would be expected to do, OCLC charges different prices to its members for the same services and conceals those differences from its members." (p. 5)&lt;/blockquote&gt;&lt;br /&gt;The complaint also speaks to OCLC's revenue:&lt;br /&gt;&lt;blockquote&gt;"An insignificant percentage of OCLC's revenues come from membership, grants or charitable contributions." (p. 26)&lt;/blockquote&gt;This is followed by a table of revenues, expenses and corporate equity (in 9-digit figures).&lt;br /&gt;&lt;br /&gt;It isn't clear to me that this is a convincing argument. Non-profits are not required to obtain their revenue through contributions, and there are probably many non-profits that receive considerable income from services. Perhaps OCLC's "mix" of revenues is off the normal curve? That's data that would be interesting to see.
However, the degree of competitive behavior against for-profit companies does seem to belie the non-profit status of the organization.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;OCLC competes directly with for-profit companies.&lt;/span&gt; This argument is in large part about OCLC's entry into the ILS market with its web-based services, but also relates to its inter-library loan (ILL) services, which compete with III's ILL. The main thrust, though, is that OCLC has announced that it will go into direct competition with the primary services of commercial vendors who serve the library market with library systems. The argument is that, as a non-profit, OCLC has an unfair advantage because it does not pay the federal taxes that are required of its for-profit competitors. The complaint repeatedly refers to OCLC's "tax-free profits" (see pp. 2, 9, 21).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;OCLC is a monopoly, and is taking advantage of its monopoly position.&lt;/span&gt; I believe that the unfair use of a monopoly position is essential to the anti-trust aspect of this lawsuit. I also believe that this is a point that is hard to prove. To begin with, there is nothing illegal about having a monopoly position in a market if one has acquired that position through normal dealing. And some of the accusations in the complaint may describe nothing more than regular business practices, such as providing some services for free (WorldCat Local quickstart, as an example) as a way to induce customers to buy into for-fee services, or to reward customers for their loyalty.
The use of pricing to make it financially untenable for its own customers to contract for non-OCLC services is probably the most damaging argument in this area.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;OCLC has used its position to avoid the public procurement process.&lt;/span&gt; As we know, most public institutions have to go through a cumbersome process in order to procure goods and services. This process is designed to make sure that public money is spent fairly and under controlled conditions that are designed to minimize corruption. The complaint claims that OCLC has obtained contracts for WorldCat Local with public institutions without going through that procurement process. (p. 20)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Trustees are also members.&lt;/span&gt; There is a claim of conflict of interest in the fact that high-level employees of OCLC member institutions also sit on OCLC's board. What isn't mentioned here, oddly enough, is that some of those members draw salaries from OCLC (in addition to the salaries received from their institutions -- see any recent IRS 990 form from OCLC, which lists salaried officers). The conflict of interest is that these same individuals may have decision-making roles in their institution for the purchase of library vendor services. "By agreeing to advance the interests and products of OCLC they are effectively excluding competitors." (p. 
27) This may be an issue for OCLC, but it seems that it should also be an issue for the institutions that employ these folks.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Coming next&lt;/span&gt;: Some odd claims, and some misses&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8845118971821294615?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8845118971821294615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8845118971821294615' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8845118971821294615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8845118971821294615'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/07/skyriveriii-v-oclc-lawsuit.html' title='SkyRiver/III v. OCLC: the lawsuit'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-4646143672122758865</id><published>2010-07-29T09:11:00.000-07:00</published><updated>2010-07-29T11:30:13.477-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>SkyRiver Sues OCLC over Anti-Trust</title><content type='html'>(Full document now &lt;a href="http://www.librarytechnology.org/docs/14917.pdf"&gt;here&lt;/a&gt;! 
Thanks, Marshall Breeding!)&lt;br /&gt;&lt;br /&gt;The newly created competitor to OCLC's cataloging services, SkyRiver, is suing OCLC in federal court in San Francisco. (&lt;a href="http://www.choiceforlibraries.com/docs/SkyRiverNews-20100729-SkyRiverOCLC.pdf"&gt;Press release&lt;/a&gt;, PDF) I have only seen the press release, so until someone figures out how to free up the actual legal document, what we know is:&lt;br /&gt;&lt;br /&gt;SkyRiver is claiming that OCLC is attempting to "monopolize the the markets for cataloging services, interlibrary lending, and bibliographic data, and attempting to monopolize the market for integrated library systems, by anticompetitive and exclusionary practices." The press release refers to OCLC's "tax-free profits," and states that OCLC has used those profits to purchase 14 for-profit companies.&lt;br /&gt;&lt;br /&gt;The press release quotes Leslie Straus, President of SkyRiver, as saying:&lt;br /&gt;&lt;blockquote&gt;“In the process OCLC has punished its own members who have tried to seek out lower cost alternatives like SkyRiver.”&lt;/blockquote&gt;This undoubtedly refers to the Michigan State issue, which I reported on &lt;a href="http://kcoyle.blogspot.com/2010/02/yet-more-oclc.html"&gt;here&lt;/a&gt;. In that case, OCLC appears to have charged MSU an unusually large fee for uploading records to WorldCat after MSU began cataloging on SkyRiver instead of OCLC.&lt;br /&gt;&lt;br /&gt;Undoubtedly, a good part of the concern here is over OCLC's plans to provide Web services that comprise the full functionality of an integrated library system (ILS), thus competing with current ILS vendors. You probably know that SkyRiver was started by Jerry Kline, owner of Innovative Interfaces. If OCLC successfully launches a full-service option for libraries, Innovative and other ILS vendors will suffer.
As the representative of a major ILS company explained to me a few years ago, the library market is a zero-sum game: every time one vendor wins, others must lose, because the number of customers is not growing. The library market is a pie that can be divided into any number of slices, but the pie itself stays the same size. This makes the rise of any one company a threat to all. In the commercial marketplace, vendors compete over functionality and price. With its non-profit status, OCLC has a distinct advantage: it doesn't pay federal income tax on the revenues it brings in. That said, given its size and the depth of its involvement in day-to-day library operations, it is plausible that even without its non-profit status OCLC would be a formidable competitor for ILS vendors.&lt;br /&gt;&lt;br /&gt;I cannot comment on the anti-trust charges because the press release does not give enough information. Hopefully we will get more details about this suit in the near future.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-4646143672122758865?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/4646143672122758865/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=4646143672122758865' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4646143672122758865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4646143672122758865'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/07/skyriver-sues-oclc-over-anti-trust.html' title='SkyRiver Sues OCLC over Anti-Trust'/><author><name>Karen
Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-67213119835007345</id><published>2010-07-04T08:28:00.001-07:00</published><updated>2010-07-04T09:26:29.148-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><category scheme='http://www.blogger.com/atom/ns#' term='linked data'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>Catching up: OCLC, GBS, LOD</title><content type='html'>Some short comments on recurring themes:&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;br /&gt;OCLC Record Use Policy&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;OCLC has finalized its &lt;a href="http://www.oclc.org/worldcat/recorduse/policy/default.htm"&gt;record use policy&lt;/a&gt;. The content is substantially the same as it was in the previous draft, which I &lt;a href="http://kcoyle.blogspot.com/2010/04/oclc-record-use-policy.html"&gt;commented &lt;/a&gt;on. There is one important improvement, however: the text clarifies OCLC's claims to copyright.&lt;br /&gt;&lt;blockquote&gt; While, on behalf of its members, OCLC claims copyright rights in  WorldCat as a compilation, it does not claim copyright ownership of  individual records.&lt;/blockquote&gt;Of course, claiming copyright and actually having the right are not the same thing, especially with databases. Here's what &lt;a href="http://www.bitlaw.com/copyright/database.html"&gt;BitLaw &lt;/a&gt;says:&lt;br /&gt;&lt;blockquote&gt;Databases as Compilations: Databases are generally protected by  copyright law as compilations. 
Under the Copyright Act, a compilation is defined as a "collection and assembling of preexisting materials or of data that are selected in such a way that the resulting work as a whole constitutes an original work of authorship." &lt;a href="http://www.bitlaw.com/source/17usc/101.html#compilation"&gt;17 U.S.C. § 101&lt;/a&gt;. &lt;/blockquote&gt;Generally, carefully selected compilations may make the "original work of authorship" cut; I'm not convinced that a union catalog of library holdings does.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Google Books&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We are still waiting to hear from the judge in the Google Books case. (Every time I write that, I check to see whether it has been released in the last hour.) Meanwhile, GBS continues to function in Internet time. Google has many publishers on board with its partners program, enough that GBS is becoming a serious rival to Amazon. It has even announced that it will begin selling ebooks. The opening screen is the exact opposite of the Google Search screen: it loads up many dozens of book covers and requires significant scrolling to browse to the bottom. Google has added personalization options ("my library") and lets you create multiple "shelves" to organize your materials.&lt;br /&gt;&lt;br /&gt;Google was first sued in 2005. Five years is a very long time where technology is concerned. In 2005 the ebook was considered dead; now with the Kindle and the iPad, ebooks are alive and well and everyone is trying to get into that game. In the time since 2005, Google has pretty much shown the publishing industry that they can benefit from the online presence that Google is providing. The settlement reads like it was written in another era, trying to solve problems that may not really be considered problems today.
The only issue remaining is that of orphan works, and if we could do a decent analysis of copyright holdings, I suspect that the number of orphan works would not be all that large.&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;br /&gt;Linked Library Data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;At ALA there was a one-day preconference on linked data, and a half-day un-conference attended by about 50 people. There are &lt;a href="http://kcoyle.net/lld-ala.html"&gt;notes&lt;/a&gt; from the un-conference, which broke out barcamp-style into 6 groups for discussion.&lt;br /&gt;&lt;br /&gt;The World Wide Web Consortium has an incubator group on &lt;a href="http://www.w3.org/2005/Incubator/lld/wiki/Main_Page"&gt;linked library data&lt;/a&gt;. This group is tasked with spending one year figuring out how to jump-start the creation of linked data in the library world.&lt;br /&gt;&lt;br /&gt;There are ongoing efforts at the &lt;a href="http://id.loc.gov/"&gt;Library of Congress&lt;/a&gt; to produce vocabularies, and of course the &lt;a href="http://metadataregistry.org/rdabrowse.htm"&gt;RDA vocabularies&lt;/a&gt; are available (and almost finalized). Ross Singer has announced that some of the MARC codes are &lt;a href="http://marccodes.heroku.com/"&gt;available&lt;/a&gt; (I presume on his own site). &lt;a href="http://metadataregistry.org/schema/show/id/5.html"&gt;FRBR&lt;/a&gt; is being defined in linked data form by IFLA.&lt;br /&gt;&lt;br /&gt;We've got just about everything but ... linked data. I'm thrilled that things are moving forward, but frustrated that I still can't see usable results.
Deep breath; patience.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-67213119835007345?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/67213119835007345/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=67213119835007345' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/67213119835007345'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/67213119835007345'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/07/catching-up-oclc-gbs-lod.html' title='Catching up: OCLC, GBS, LOD'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3401552491776045257</id><published>2010-05-26T06:36:00.001-07:00</published><updated>2010-05-28T06:30:40.341-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FRBR'/><title type='text'>FRBR and Sharability</title><content type='html'>One of the possible advantages to using FRBR as a bibliographic model is that it can provide us with sharable bits in the form of the defined entities. I've been working on creating a test set of records to illustrate some linked data concepts, and so I began thinking about how the data would break out into sharable units. It turns out to be... 
an interesting question.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Work&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's start with the Work, which I believe many people have high hopes for. I have a book in hand which I will use for this illustration. Because this is a book, there are only a few possible data elements in the Work, and these are:&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;Title of the work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Mort&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Preferred title for the work:&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; Mort&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Date of work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;1987&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Place of origin of the work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;England, UK&lt;/span&gt;&lt;br /&gt;&lt;/blockquote&gt;As you can see, there isn't a lot of information in the Work entity itself. In many cases, a cataloger will not know the date of the work, and may not know where the work was written, in which case you could have just title, and the entire Work entity would be:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;Title of the work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Mort&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;/blockquote&gt;What is obviously missing here is the name of the author. That, however, is not an attribute of the Work in FRBR, but is an entity of its own, either Person, Corporate Body, or Family. It seems clear that without the name of the creator (where appropriate) the Work isn't terribly useful on its own. 
So I am going to add that creator from FRBR Group 2:&lt;br /&gt;&lt;blockquote&gt;&lt;span&gt;Work:&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;Title of the work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Mort&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;Preferred title for the work:&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; Mort&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Date of work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;1987&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Place of origin of the work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;England, UK&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Person:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Author: &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Terry Pratchett&lt;/span&gt;&lt;/blockquote&gt;OK, now we are getting somewhere. We have an author and a title. This is a "unit" that someone could grab or link to and make use of. Yet the two aren't really separable, which is what puzzles me a bit about FRBR modeling them as separate entities. It's not as though you could re-use this Work for another book with the same title (and there are others with this same title). It's only the Work by Terry Pratchett that this Work entity can represent. As far as I am concerned, the creator entity and the work entity are inseparable in the description of a work. A creator can be associated with many works, but a Work cannot be re-used with different creators. Once the creator(s) of the Work are defined, that relationship is fixed as part of the identity of the Work.&lt;br /&gt;&lt;br /&gt;We could leave the Work as it is here, but if you want to include subject headings in your sharing, they need to be included in the shared Work, because subject headings in FRBR are associated only with the Work.
Given that, our sharable Work becomes:&lt;br /&gt;&lt;blockquote&gt;&lt;span&gt;Work:&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;Title of the work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Mort&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;Preferred title for the work:&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; Mort&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Date of work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;1987&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Place of origin of the work:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;England, UK&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Person:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Author: &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Terry Pratchett&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Subject:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Topic:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Fantasy fiction, English&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Topic: &lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Discworld (Imaginary place) -- Fiction&lt;/span&gt;&lt;br /&gt;&lt;/blockquote&gt;This is the unit that needs to be created so we can share Works.&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;br /&gt;Expression&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now let's move on to the Expression, the real bugbear of FRBR. For books, Expression has few data elements. 
In this case we have:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;Date of expression:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;1987&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Language of expression:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;English&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/blockquote&gt;All perfectly fine and well, but clearly not something that can stand alone. Similar to Work, this expression is not usable with just any English language work written in 1987 -- it's not sharable in that sense. This Expression must be associated irrevocably with a particular Work, in this case the Work we created above. There will be some link that essentially says:&lt;br /&gt;&lt;blockquote&gt;E:identifier --&gt; expresses --&gt; W:identifier&lt;/blockquote&gt;&lt;span style="font-style: italic;"&gt;Second thought: Expression can also have an important creator/agent role, such as translator, editor, adaptor -- and possibly others related to music that I'm not knowledgeable about -- so it, too, should include those for sharing. In fact, probably all of the Group2 to Group1 relationships need to be included in a sharing situation. 
So for a translation we would get something like:&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;Expression&lt;/span&gt;&lt;br /&gt;Date of expression:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;1987&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Language of expression:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;French&lt;/span&gt;&lt;span style="font-style: italic; font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;Person&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;"&gt;Translator&lt;/span&gt;:&lt;span style="font-weight: bold;"&gt; J-P Sartre&lt;/span&gt;&lt;/blockquote&gt;The unit of sharing here must be the &lt;span style="font-style: italic;"&gt;expanded &lt;/span&gt;Expression plus the expanded Work (with Group 2 and Group 3 entities). This illustrates something that has bothered me a bit about the Group 1 FRBR entities: the dependency inherent in the WEMI hierarchy. WEMI essentially must be created as a single &lt;span style="font-style: italic;"&gt;thing &lt;/span&gt;with multiple parts. This is true even of the Manifestation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Manifestation&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The Manifestation is seemingly the richest and therefore the most independent of the FRBR Group 1 entities, but as we'll see, without the Work and Expression you do not get a useful set of data elements. 
Here is what we have for our Manifestation:&lt;br /&gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;Title proper:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Mort&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Statement of responsibility:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Terry Pratchett&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Title proper of series:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Discworld&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Date of publication:&lt;/span&gt;&lt;span style="font-weight: bold;"&gt; 2001&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Copyright date:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;1987&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Place of publication:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;New York, NY&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Publisher's name:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;HarperTorch&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Extent of text:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;243 pages&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Dimensions:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;17 cm&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Carrier type:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;volume&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Mode of issuance:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;single unit&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Media type:&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;unmediated&lt;/span&gt;&lt;/blockquote&gt;What is lacking here? 
Well, there's no link to the entity for the author, which would provide an identification of the author and any variant forms of the author's name. There's no language of text, because that's in the Expression. And there are no subject headings, because those are associated with the Work. If this were a translation, there would be no link to the Work under its original title. The Manifestation entity is very readable, but if we are sharing for the purposes of copy cataloging, it has to be bundled with the Work and Expression to be usable.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Our Sharable Units&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So this is what we get as sharable units:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Work + Group 2 (creator) + Group 3 (subject)&lt;/li&gt;&lt;li&gt;Expression + Group 2 (creator) + Work + Group 2 (creator) + Group 3 (subject)&lt;/li&gt;&lt;li&gt;Manifestation + Expression + Group 2 (creator) + Work + Group 2 (creator) + Group 3 (subject)&lt;/li&gt;&lt;/ol&gt;With these three, it will be possible to build on Works and Expressions as needed, creating new Expressions and Manifestations for a Work. 
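&lt;br /&gt;&lt;br /&gt;To make the bundling concrete, here is a minimal Python sketch, assuming a simple in-memory model. The class and field names are hypothetical and cover only the data elements used in the Mort example above; this is not an implementation of FRBR or RDA. Each lower entity holds a reference to the entity above it, so serializing an Expression or a Manifestation necessarily carries along the Work with its creators and subjects, which is the shape of the three sharable units.&lt;br /&gt;&lt;pre&gt;
```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Agent:
    """A Group 2 entity acting in a particular role (hypothetical model)."""
    role: str  # e.g. "author", "translator"
    name: str


@dataclass(frozen=True)
class Work:
    title: str
    creators: tuple         # creators are fixed as part of the Work's identity
    subjects: tuple = ()    # Group 3 subjects attach only to the Work
    date: str = ""
    place: str = ""


@dataclass(frozen=True)
class Expression:
    work: Work                # an Expression cannot exist without its Work
    language: str
    date: str = ""
    contributors: tuple = ()  # e.g. translator, editor, adaptor


@dataclass(frozen=True)
class Manifestation:
    expression: Expression    # and, through it, the Work
    title_proper: str
    publisher: str
    date_of_publication: str
    extent: str = ""


def sharable_unit(entity):
    """Serialize an entity together with everything it depends on.

    asdict() follows the nested references, so a Manifestation comes out
    bundled with its Expression, the Work, and their Group 2 and 3 data.
    """
    return asdict(entity)


mort = Work(
    title="Mort",
    creators=(Agent("author", "Terry Pratchett"),),
    subjects=("Fantasy fiction, English",
              "Discworld (Imaginary place) -- Fiction"),
    date="1987",
    place="England, UK",
)
english = Expression(work=mort, language="English", date="1987")
paperback = Manifestation(english, "Mort", "HarperTorch", "2001", "243 pages")

unit = sharable_unit(paperback)
print(unit["expression"]["work"]["creators"][0]["name"])  # Terry Pratchett
```
&lt;/pre&gt;Because the references only point upward, a new Expression or Manifestation can be added for an existing Work without changing it; the price is that no lower-level entity can be shared on its own.&lt;br /&gt;&lt;br /&gt;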
It will also be possible to "grab" a Manifestation and along with it get a full description including subjects and creators.&lt;br /&gt;&lt;br /&gt;Now we just need a system to test this out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3401552491776045257?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3401552491776045257/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3401552491776045257' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3401552491776045257'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3401552491776045257'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/05/frbr-and-sharability.html' title='FRBR and Sharability'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-770894794502823669</id><published>2010-05-03T07:20:00.000-07:00</published><updated>2010-05-03T08:11:06.696-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><title type='text'>Bib data and the Semantic Web</title><content type='html'>I know that I've gone on and on about transforming bibliographic data into a semantic web format. And whenever folks have asked me: "What will it look like?" I haven't had a good response. 
Now there is something to show you: &lt;a href="http://www.freebase.com/"&gt;Freebase&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Freebase is a database of interlinked semantic web "statements": essentially what are called by the SemWeb types as "triples." The statements come from a variety of open data sources such as Wikipedia, TVDB.com, a science fiction fan database, and Open Library. By placing a user interface over these data they now have a searchable, navigable site that can link books to movies to (theoretically) music to science to... well, anything where linked data is available.&lt;br /&gt;&lt;br /&gt;Their book data isn't as strong as it should be, given that they claim to have imported the Open Library file (I suspect it was only partially imported). When you look at the Freebase entry for &lt;a href="http://www.freebase.com/view/en/emily_dickinson"&gt;Emily Dickinson&lt;/a&gt; you only see two works listed. &lt;a href="http://upstream.openlibrary.org/authors/OL19512A/Emily_Dickinson"&gt;Open library&lt;/a&gt; has 137 Works for Dickinson, and &lt;a href="http://orlabs.oclc.org/identities/lccn-n79-54166"&gt;WorldCat Identities&lt;/a&gt; lists 3, 388. Also, their approach is more "popular" than rigorous. However, there is no reason why this same technique could not be used with "pure" library data, and library catalogs could make use of any of the data in such a database because it is all available through linking and APIs. A database like Freebase essentially serves as a huge pot of available, re-usable information.&lt;br /&gt;&lt;br /&gt;In its current form, Freebase would not be sufficient for library data sharing, although it could provide an interesting testing ground. What we need to work out for libraries is a way to version and source content so that you know who provided each statement and when, and to make it easy to contribute new information or improvements to the information in a sensible and automated way. 
There is no reason why we could not create a "LibBase" that consists solely of what libraries would consider to be authoritative information: a kind of linked data WorldCat. That data would have to be able to interact with other data on the Web, and by doing so libraries would become discoverable on the Web. It would be logical for projects like Freebase to link to the library data. Library users would have a rich, navigable information base that could help them follow (or even make) connections between library resources -- connections that are much less evident in today's catalogs. Some technical magic would need to occur to allow users to move seamlessly from the whole world to their local library, but I don't think solving that will take rocket science.&lt;br /&gt;&lt;br /&gt;There is a group of interested souls planning to get together on the Friday morning of ALA DC to begin some exploration of how we might make semantic web technology work for libraries. There will be announcements on various lists (I'm guessing NGC4LIB, CODE4LIB, LITA-L and RDA-L, at the very least). If you can get to ALA a little early, please mark that slot on your calendar. 
It'll be a free-floating, working, barcamp-style meeting, as I understand it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-770894794502823669?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/770894794502823669/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=770894794502823669' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/770894794502823669'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/770894794502823669'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/05/bib-data-and-semantic-web.html' title='Bib data and the Semantic Web'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7925006791882745512</id><published>2010-04-26T07:48:00.000-07:00</published><updated>2010-04-26T08:15:14.975-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='classification LCSH'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenLibrary'/><title type='text'>Social aspects of subject headings</title><content type='html'>You've probably played the "my favorite subject heading" game when geeking out with librarian friends. 
Here's some additional fuel in case you've run out of zingers.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://openlibrary.org"&gt;Open Library&lt;/a&gt; takes the LC subject headings and breaks them apart at the subfield level into subjects, persons, places, genres, and times. It also includes some BISAC headings retrieved from Amazon, so the subject list is not "pure." The separate subject entries obtained are similar to, but not the same as, OCLC's &lt;a href="http://www.oclc.org/research/activities/fast/default.htm"&gt;FAST &lt;/a&gt;headings, and look much like some facets that appear in library catalogs.&lt;br /&gt;&lt;br /&gt;The Open Library database currently holds about 24 million records for books (at least partially de-duped). In a recent dump of subjects, the total number of different subjects came out as 1,278,539. Of those, 336,638 were of the "topical" variety, that is either a 650 $a or a 65X $x. The top 25 are as follows:&lt;br /&gt;&lt;br /&gt;825168    History&lt;br /&gt;322928    Biography&lt;br /&gt;212822    Politics and government&lt;br /&gt;206519    Congresses&lt;br /&gt;192968    History and criticism&lt;br /&gt;184183    Fiction&lt;br /&gt;123838    Law and legislation&lt;br /&gt;119333    Bibliography&lt;br /&gt;95555    Juvenile literature&lt;br /&gt;93364    Description and travel&lt;br /&gt;90866    Economic conditions&lt;br /&gt;84787    Criticism and interpretation&lt;br /&gt;74878    Claims&lt;br /&gt;71468    Social life and customs&lt;br /&gt;70926    Social conditions&lt;br /&gt;70563    Catalogs&lt;br /&gt;69205    Private Bills&lt;br /&gt;69191    Private bills&lt;br /&gt;66480    Education&lt;br /&gt;63410    Exhibitions&lt;br /&gt;63301    World War, 1939-1945&lt;br /&gt;60235    Foreign relations&lt;br /&gt;60068    Philosophy&lt;br /&gt;56219    Dictionaries&lt;br /&gt;55460    Study and teaching&lt;br /&gt;&lt;br /&gt;I find it interesting that with the exception of "World War, 1939-1945" these appear to have the function 
of qualifiers, and I'm thinking that it would be interesting to contrast the $a and $x terms. My guess is that these are $x, but that not all $x are of this nature.&lt;br /&gt;&lt;br /&gt;Of the subfields, 164,342 appear only once in the database. These are a great source of interesting and unusual headings, including "Social aspects of adzes" and "Deer as pets." In fact, the "Social aspects..." tail is so amusing that I have made a &lt;a href="http://kcoyle.net/SocialAspects1.txt"&gt;file&lt;/a&gt; of those with a count of 1.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://kcoyle.net/subjects.txt"&gt;full file&lt;/a&gt; of topical subjects (in the format "count - tab - subject") is 8 megabytes, but it can probably yield innumerable hours of library cocktail hour amusement. I will also look into names, organizations, places and times as subjects.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7925006791882745512?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7925006791882745512/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7925006791882745512' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7925006791882745512'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7925006791882745512'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/04/social-aspects-of-subject-headings.html' title='Social aspects of subject headings'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8360160141659225415</id><published>2010-04-09T11:24:00.000-07:00</published><updated>2010-04-13T19:40:28.211-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>OCLC record use policy</title><content type='html'>OCLC has issued a &lt;a href="http://community.oclc.org/recorduse/2010/04/worldcat-rights-and-responsibilities-for-the-oclc-cooperative-is-open-for-community-review.html#comments"&gt;new draft&lt;/a&gt; of its record use policy for member comment. As others have remarked, while it is better worded and seemingly less draconian than the previous policy (the one that was withdrawn), the substance has not changed one iota. There are many things wrong with the policy itself, but the primary problem with it is not the text of the policy but the way that OCLC has chosen to define the problem it is trying to solve. Here are some of the issues I have with the approach:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1. Pushing the river&lt;/span&gt;&lt;br /&gt;The central issue is that OCLC wants to limit downstream use of bibliographic data that is stored in WorldCat. This simply cannot be done. The same data is also stored in individual library catalogs, some union or consortial catalogs, and in bibliographic software used by many hundreds of thousands of researchers around the world. It also often closely resembles data created outside of OCLC's sphere, such as through publisher and retailer channels. Sharing of this data is absolutely necessary for the furtherance of intellectual pursuits and scientific progress, as well as the market for new and used items. Ironically, the policy would restrict use of the data by OCLC members without restricting its use by the multitude of non-members. 
It would be unacceptable even if it were workable, which it isn't.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2. One-sided&lt;/span&gt;&lt;br /&gt;The policy has a section on member rights and responsibilities, &lt;span style="font-style: italic;"&gt;&lt;span style="color: rgb(153, 153, 153);"&gt;but no such section on OCLC's rights and responsibilities.&lt;/span&gt;  (Nope, I was wrong about that. The section does exist; I must have missed it.) &lt;/span&gt;The policy carries the assumption that, if anything, members are the problem and OCLC the solution, and it gives no sense of the policy being the result of an agreement &lt;span style="font-style: italic;"&gt;between&lt;/span&gt; the parties. OCLC can make unilateral decisions about record use, such as its agreement with Google, but members must ask permission of OCLC for many uses. There is nothing here that acknowledges that there could be a situation where the interests of a library and the interests of OCLC are in conflict, nor anything about how that would be resolved. All in all, it reads as if the purpose of membership were to sustain OCLC (instead of the purpose of OCLC being to support libraries).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;3. Transparency&lt;/span&gt;&lt;br /&gt;OCLC, or one of OCLC's governing groups, will make decisions. Yet there are no criteria given for making these decisions, no timelines, no reporting back to members, no mechanism for feedback. Will members know how "their" WorldCat records are being used? Will they have any choice in the matter? Will there be a way to know what requests for use have come in to OCLC, which ones have been accepted, which turned down? If WorldCat is such a "community good," shouldn't the community at least have this information about the use of that good?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;4. No options&lt;/span&gt;&lt;br /&gt;In most agreements there is some give and take. If you do X, you will get Y. 
The OCLC record use policy does not give members options. An example of an option would be: if you do your cataloging on OCLC, ILL will cost you $X; if you do not do your cataloging on OCLC, uploading your records will cost you $Y and ILL will cost you $Z. With clear options, libraries can decide what is best for them in their particular situation. Without clear options, libraries have no way to make rational decisions about their participation in OCLC. It's not a religion; it's a business relationship, and it should be treated like one.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;5. Avoids facing the problem&lt;/span&gt;&lt;br /&gt;The problem that OCLC is trying to fix arises, as far as I can tell, because of OCLC's particular mix of costs and revenue. Most of the revenue comes in to OCLC from its cataloging service, so having members choose to catalog elsewhere is the problem. Exhorting members to keep their records in their databases so that others cannot create a large database of bibliographic data is not a solution to this problem. Large bibliographic databases do and will exist. If their existence is a threat to OCLC, then the jig is already up. Rather than stew about what others are doing with bibliographic data, OCLC needs to find a balance of income and expenses that meets the needs of its member libraries, and that might include making some hard decisions about OCLC services.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;6. Ignores market forces&lt;/span&gt;&lt;br /&gt;If someone can do it better, cheaper, more conveniently, why should libraries stick with OCLC as their vendor? For the purchase of materials or library systems or other services, libraries move to new vendors when they see advantages. With the economic downturn, there is a scramble by libraries to cut costs wherever they can. No amount of loyalty to the "collective" can overcome the economic situation libraries find themselves in today. 
In a sense, OCLC seems to expect the libraries to act irrationally by sticking with the service even if something more economical comes along. Libraries obviously cannot afford to do this.&lt;br /&gt;&lt;br /&gt;I cannot tell what steps OCLC's members can take at this point. The web site points to a community forum where people can post comments, but posting comments on the policy doesn't begin to solve the underlying problems presented here. If I were a member, I think I would feel like a rowboat hitching a ride behind the Titanic, hoping it will get me through the ice floes. Nothing is unsinkable, as we have unfortunately found out in the past.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8360160141659225415?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8360160141659225415/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8360160141659225415' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8360160141659225415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8360160141659225415'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/04/oclc-record-use-policy.html' title='OCLC record use policy'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-4154636659755414176</id><published>2010-04-07T07:08:00.000-07:00</published><updated>2010-04-07T07:40:55.311-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='library catalogs'/><title type='text'>After MARC</title><content type='html'>The report on the &lt;a href="http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf"&gt;Future of Bibliographic Control&lt;/a&gt; made it clear that the members of that committee felt that it was time to move beyond MARC:&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;blockquote&gt;"The existing Z39.2/MARC “stack” is not an appropriate starting place for a new bibliographic data carrier because of the limitations placed upon it by the formats of the past." p. 24&lt;/blockquote&gt;&lt;br /&gt;The recent report from the RLG/OCLC group &lt;a href="http://www.oclc.org/research/publications/library/2010/2010-06.pdf"&gt;&lt;span style="font-style: italic;"&gt;Implications of MARC Tag Usage on Library Metadata Practices&lt;/span&gt;&lt;/a&gt; comes to a similar conclusion:&lt;br /&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-size:100%;"&gt;"&lt;/span&gt;&lt;span style="font-size:100%;"&gt;5. MARC itself is arguably too ambiguous and insufficiently structured to facilitate machine processing and manipulation." p.27&lt;/span&gt;&lt;/blockquote&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;We seem to be reaching a point of consensus in our profession that it is time to move beyond MARC. When faced with that possibility, many librarians will wonder if we have the technical chops to make this transition. I don't have that worry; I am confident that we do. 
What worries me, however, is the complete lack of leadership for this essential endeavor.&lt;br /&gt;&lt;br /&gt;Where could/should this leadership come from? The Library of Congress, the maintenance agency for the current format, and OCLC, the major provider of records to libraries, both have a very strong interest in not facilitating (and perhaps even in preventing) a disruptive change. So far, neither has shown any interest in letting go of MARC. The American Library Association has just invested a large sum of money in the development of a new cataloging code. It has neither the funds nor the technical expertise to take the logical next step and help create the carrier for that data. Yet a code without a carrier is virtually useless in today's computer-driven networked world. NISO, the official standards body for everything "information," is in the same situation as ALA: it cannot fund a large effort, and it has no technical staff to guide such a project.&lt;br /&gt;&lt;br /&gt;It seems ironic that there have been projects funded recently to develop library-related software based on MARC even though we consider this format to be overdue for replacement. The one effort I'm aware of to obtain funding for the development of a new carrier was rejected on the grounds that it wasn't technically interesting. In fact, the technology of such an effort isn't all that interesting; the effort requires the creation of a social structure that will nurture and maintain our shared data standard (or standards, as the case may be). It requires an ongoing commitment, broad participation, and stability. Above all, however, it requires vision and leadership. 
Those are the qualities that are hard to come by.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-4154636659755414176?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/4154636659755414176/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=4154636659755414176' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4154636659755414176'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4154636659755414176'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/04/after-marc.html' title='After MARC'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3046694211141737608</id><published>2010-03-05T06:11:00.000-08:00</published><updated>2010-03-07T07:01:44.417-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MARC'/><category scheme='http://www.blogger.com/atom/ns#' term='RDA'/><title type='text'>MARC: from mark-up to data</title><content type='html'>The main reason that I keep pushing the semantic web is not that I think the semantic web is the answer to all of our problems -- but I do think we need to have something to be moving &lt;span style="font-weight: bold; font-style: italic;"&gt;toward&lt;/span&gt; in terms of transforming our data carrier to something both 
more modern and web-compatible. The semantic web gives us some basic concepts of data design. I'm not sure that the semantic web concepts will hold for data as complex as the library bibliographic record, but there's only one way to find out: do it. That's a huge task, of course.&lt;br /&gt;&lt;br /&gt;The first question to be answered is: What are our data elements? In theory, this should be one of the simpler questions, but it's not. I can create a list of all of the MARC fields, subfields, and fixed field elements (which I have, and they are linked from this page of the &lt;a href="http://futurelib.pbworks.com/Data+and+Studies"&gt;futurelib wiki&lt;/a&gt;), but that doesn't answer the question. Here's why:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt; &lt;span style="font-weight: bold;"&gt;Indicators&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The indicators in the MARC fields are like a wild card in poker -- they can be used to utterly transform the play. Some of the indicators are simple and probably can be dismissed: the non-filing indicators and the indicators that control printing. Some are data elements in themselves: "Existence in NAL collection" (the first indicator of field 070) is essentially a binary data element. Many further refine the meaning of the field, allowing the field to carry any one of a number of related subelements, as in the second indicator of field 034 (Coded Cartographic Mathematical Data):&lt;br /&gt;&lt;blockquote&gt;Second - Type of ring          &lt;div class="indicatorvalue"&gt;# - Not applicable&lt;/div&gt;          &lt;div class="indicatorvalue"&gt;0 - Outer ring&lt;/div&gt;          &lt;div class="indicatorvalue"&gt;1 - Exclusion ring&lt;/div&gt;&lt;/blockquote&gt;Others name the source of the term, such as the second indicator of the 650 field, which distinguishes LCSH from MeSH. 
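&lt;br /&gt;&lt;br /&gt;As a minimal sketch of what that translation work would produce, assuming each meaningful indicator becomes a named data element, here is a small Python lookup table. The element names (typeOfRing, inNALCollection, subjectSource) are invented for illustration, and the table covers only the Type of ring indicator (field 034), the NAL indicator (field 070), and a few 650 second-indicator values; a real mapping would have to work through every field in the MARC 21 documentation. Indicators that only control filing or display are simply not mapped.&lt;br /&gt;&lt;pre&gt;
```python
# Hypothetical mapping from (tag, indicator position) to a named data element
# and the meaning of each code. Blank indicators (" ") are left unmapped.
INDICATOR_ELEMENTS = {
    ("034", 2): ("typeOfRing", {"0": "outer ring", "1": "exclusion ring"}),
    ("070", 1): ("inNALCollection", {"0": True, "1": False}),
    ("650", 2): ("subjectSource", {
        "0": "Library of Congress Subject Headings",
        "2": "Medical Subject Headings",
        "7": "source given in subfield $2",
    }),
}


def indicators_to_elements(tag, ind1, ind2):
    """Return the named data elements carried by a field's two indicators."""
    elements = {}
    for position, code in ((1, ind1), (2, ind2)):
        entry = INDICATOR_ELEMENTS.get((tag, position))
        if entry is None:
            continue  # not mapped, e.g. filing or display controls
        name, meanings = entry
        if code in meanings:
            elements[name] = meanings[code]
    return elements


print(indicators_to_elements("650", " ", "2"))
# {'subjectSource': 'Medical Subject Headings'}
print(indicators_to_elements("245", "1", "0"))
# {}
```
&lt;/pre&gt;Once indicators are expanded this way, a 650 with second indicator 2 could simply become a subject with an explicit source of MeSH, so the meaning no longer depends on a position-coded character.&lt;br /&gt;&lt;br /&gt;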
It'll take a fair amount of work to figure out what all of these qualifiers mean in terms of actual data elements.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt; &lt;span style="font-weight: bold;"&gt;Redundancy &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There is non-textual (although not non-string) data in the MARC record, primarily in the fixed fields (00X) but also in some of the number and code fields (0XX). Some of these, actually most of these, are redundant with display information in the body of the record. Should these continue to be separate data elements, or can we remove this redundancy and still have useful user displays? Basically, having the same information entered in two different ways in your data is just begging for trouble, and we've all seen fixed field dates and display (260 $c) dates that contradict each other.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Inconsistency&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Primarily due to the constraints of the MARC format, the same information has been coded differently in different fields. A personal author entry in the 100 field uses subfields &lt;span style="font-style: italic;"&gt;abcdejqu&lt;/span&gt;; in the 760 linking entry field, all of that data is entered into subfield &lt;span style="font-style: italic;"&gt;a&lt;/span&gt;. It's the same data element, and by that I mean that the same contents are found in the concatenation of &lt;span style="font-style: italic;"&gt;abcdejqu &lt;/span&gt;&lt;span&gt;as in&lt;/span&gt;&lt;span style="font-style: italic;"&gt; a&lt;/span&gt;. Bringing together all of these crufty bits into a more rational data definition is something I really long for.&lt;br /&gt;&lt;br /&gt;And of course my favorite...
&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;data buried in text&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So much of our data isn't data, it's text, or it's data buried in text. My favorite example is the ISBN. Everyone knows how important the ISBN is in all kinds of bibliographic linking operations. But there isn't a place in our record for the ISBN as a data element. Instead, there is a subfield that takes the ISBN &lt;span style="font-style: italic;"&gt;as well as other information&lt;/span&gt;.&lt;br /&gt;&lt;blockquote&gt;020  __&lt;strong&gt; |a &lt;/strong&gt;0812976479 (pbk.)&lt;/blockquote&gt;This means that every system that processes MARC records has to have code that separates out the actual ISBN from whatever else might be in the subfield. Other buried information includes things like pagination and size or other extents:&lt;br /&gt;&lt;blockquote&gt;300  __ |a 1 sound disc : |b analog, 33 1/3 rpm, stereo. ; |c 12 in.&lt;br /&gt;&lt;br /&gt;300  __ |a 376 p. ; |c 21 cm.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;hr width="50%"&gt;&lt;br /&gt;Once this analysis is done (and I do need help, yes, thank you!), it may be possible to compare MARC to the RDA elements and see where we do and don't have a match. I have a &lt;a href="http://kcoyle.net/rda/"&gt;drafty web page&lt;/a&gt; where I am putting the lists I'm creating of RDA elements, but I will try to get it all written up on the &lt;a href="http://futurelib.pbworks.com/"&gt;futurelib wiki&lt;/a&gt; so it's all in one place. 
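&lt;br /&gt;&lt;br /&gt;Here is a rough Python sketch of the kind of code that every system processing MARC records ends up carrying, to make the ISBN example concrete. It does three things:&lt;ul&gt;&lt;li&gt;splits a 020 $a value such as "0812976479 (pbk.)" into the ISBN and its qualifier;&lt;/li&gt;&lt;li&gt;checks the ISBN-10 or ISBN-13 check digit;&lt;/li&gt;&lt;li&gt;pulls a page count and a height in centimeters out of 300 $a and $c.&lt;/li&gt;&lt;/ul&gt;The function names are hypothetical. The sketch assumes the ISBN comes first in the subfield and that hyphens can simply be dropped. Extents it does not recognize, like the sound disc example, simply come back as None.
&lt;pre&gt;
```python
import re


def split_isbn(subfield_a):
    """Split a 020 $a value into (isbn, qualifier).

    Assumes the ISBN comes first; hyphens are dropped and a lowercase
    x check character is upper-cased. Returns None for a missing part.
    """
    match = re.match(r"\s*([0-9Xx-]+)\s*(.*)", subfield_a)
    if match is None:
        return None, subfield_a.strip() or None
    isbn = match.group(1).replace("-", "").upper()
    qualifier = match.group(2).strip().strip("()").strip()
    return isbn, qualifier or None


def isbn_is_valid(isbn):
    """Check the check digit of a 10- or 13-character ISBN."""
    if len(isbn) == 10 and isbn[:9].isdigit() and isbn[9] in "0123456789X":
        # ISBN-10: weights 10 down to 1, X counts as 10, total divisible by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(isbn))
        return total % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: weights alternate 1 and 3, total divisible by 10.
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False


def pages_and_height(subfield_a, subfield_c):
    """Pull a page count from 300 $a and a height in cm from 300 $c.

    Returns None for either value when the expected pattern is absent.
    """
    pages = re.search(r"(\d+) p\.", subfield_a)
    height = re.search(r"(\d+) cm", subfield_c)
    return (int(pages.group(1)) if pages else None,
            int(height.group(1)) if height else None)


isbn, qualifier = split_isbn("0812976479 (pbk.)")
print(isbn, qualifier, isbn_is_valid(isbn))          # 0812976479 pbk. True
print(pages_and_height("376 p. ;", "21 cm."))        # (376, 21)
print(pages_and_height("1 sound disc :", "12 in."))  # (None, None)
```
&lt;/pre&gt;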
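&lt;br /&gt;&lt;br /&gt;The redundancy and inconsistency problems described earlier can also be sketched in code. The first function below flags a record whose 260 $c contains a year that does not match Date 1 in the 008 fixed field (character positions 07-10). The second flattens 100-style name subfields into a single string, the way a 760 $a carries them. Both are minimal illustrations. The function names and input shapes (a raw 008 string and a list of (code, value) subfield pairs) are assumptions, not the API of any particular MARC toolkit.
&lt;pre&gt;
```python
import re

# Subfields of a 100 personal name that a 760 $a carries as one string.
# Subfield codes are single characters, so a substring test is enough.
NAME_SUBFIELDS = "abcdejqu"


def date_conflict(field_008, subfield_260c):
    """Flag redundancy gone wrong: 260 $c has a year, but none matches 008 Date 1.

    Date 1 sits in character positions 07-10 of the 008 fixed field.
    Returns False when 260 $c contains no four-digit year to compare.
    """
    date1 = field_008[7:11]
    years = re.findall(r"\d{4}", subfield_260c)
    return bool(years) and date1 not in years


def name_as_single_string(subfields):
    """Flatten 100-style (code, value) pairs into one 760 $a-style string.

    Going the other way, from one string back to separate subfields,
    is much harder, which is the heart of the inconsistency problem.
    """
    return " ".join(value for code, value in subfields if code in NAME_SUBFIELDS)


print(date_conflict("100101s2009    nyu", "c2009."))  # False
print(date_conflict("100101s2009    nyu", "2008."))   # True
print(name_as_single_string([("a", "Twain, Mark,"), ("d", "1835-1910.")]))
```
&lt;/pre&gt;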
I encourage you to grab this data and play with it, or to do whatever you think you can with the registered &lt;a href="http://metadataregistry.org/rdabrowse.htm"&gt;RDA vocabularies.&lt;/a&gt; And please post your results somewhere and let me know so that I can gather them all, probably on the wiki.</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3046694211141737608/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3046694211141737608' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3046694211141737608'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3046694211141737608'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/03/marc-from-mark-up-to-data.html' title='MARC: from mark-up to data'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8323161998277862231</id><published>2010-03-04T15:37:00.000-08:00</published><updated>2010-03-04T17:25:19.591-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>The Letters Keep Coming In</title><content type='html'>Today I received a copy of a letter written by Roman Kochan, Dean and Director of Library Services at the California
State University, Long Beach (CSULB). It's the perfect day for this, because today is the national &lt;a href="http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/03/04/BAC41CAAM1.DTL&amp;amp;tsp=1"&gt;day of protest&lt;/a&gt; in support of education. This movement has blossomed (exploded?) over the deep cuts the California legislature has made to the state's education budget, cuts which are having a devastating effect on the CSU system, with the libraries extremely hard hit.&lt;br /&gt;&lt;br /&gt;The letter is addressed to "Link+™ Member Libraries and ILL Partners." The subject line on Kochan's letter reads: &lt;span style="font-weight: bold;"&gt;Threat to CSULB Library's ILL Participation&lt;/span&gt;. He states that, faced with budget cuts not only this year but for the foreseeable future, CSULB decided to move to SkyRiver™ as its cataloging utility, anticipating significant savings.&lt;br /&gt;&lt;br /&gt;The next three paragraphs are worth quoting in their entirety:&lt;br /&gt;&lt;blockquote&gt;"We notified OCLC of this decision, while at the same time advising them of the Library's intent to continue membership in OCLC, to continue to make use of OCLC interlibrary loan services, and to contribute records for our current and future acquisitions to OCLC for batch upload. OCLC's charge for batch upload was (until recently) posted on the OCLC website as 23¢ per record. That is the amount I referred to in my letter to the organization.
I have subsequently learned that:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The price schedule for batch downloading [sic, read: uploading] that contained the 23¢ charge has suddenly and mysteriously disappeared from the OCLC website&lt;/li&gt;&lt;li&gt;Another academic library that chose to displace OCLC with SkyRiver reports that OCLC has quoted a revised charge for downloading their records that amounts to about $2.85 per record; it is a charge that they report would effectively (and one might not think coincidentally) offset the savings accrued from their change to SkyRiver.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;The irony in all of this is that CSULB will still be able to have up-to-date ILL services using &lt;a href="http://www.iii.com/products/inn_reach.shtml"&gt;INN-Reach&lt;/a&gt; and Link+, the Innovative Interfaces (III) ILL service. It's ironic because SkyRiver was &lt;a href="http://www.libraryjournal.com/article/CA6700415.html"&gt;founded by Jerry Kline&lt;/a&gt;, the owner of III. Link+ undoubtedly has a smaller reach than OCLC's ILL services, but it may grow in the long run if more III libraries move to SkyRiver.&lt;br /&gt;&lt;br /&gt;Offsetting the cost of having a library move to another vendor may make some economic sense, but this is a matter that will need to be cleared up before other libraries move to SkyRiver thinking that they'll be able to upload their records to OCLC for $0.23.
MSU and CSULB were caught by surprise, which is very unfortunate.</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8323161998277862231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8323161998277862231' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8323161998277862231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8323161998277862231'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/03/letters-keep-coming-in.html' title='The Letters Keep Coming In'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-6193380842065945534</id><published>2010-02-26T05:57:00.000-08:00</published><updated>2010-03-02T09:14:32.432-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>Yet more OCLC</title><content type='html'>I have in hand a letter from Clifford H. Haka, Director of the Michigan State University Libraries, addressed to "ILL Partners" and dated February 24, 2010. The letter is a response to Larry Alford's document in my &lt;a href="http://kcoyle.blogspot.com/2010/02/oclc-again.html"&gt;previous post.
&lt;/a&gt;I will try to represent the facts he presents here as accurately as possible, and to distinguish those from my own opinions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;FACTS (from the letter)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The MSU Libraries chose to move their cataloging from OCLC to SkyRiver in a cost-saving effort. They expect to save about $80,000 per year. Because MSU uses OCLC for ILL, they intended to pay to have their records loaded into OCLC. The OCLC service charge list gives the price for this service as $0.23 per record.&lt;br /&gt;&lt;br /&gt;However, when MSU requested the upload service, OCLC offered them a price of $54,000 for five months (presumably through the end of the fiscal year?), which would amount to $74,000 per year for 26,000 records, or $2.85 per record. (Some of this would be offset by cataloging credits.)&lt;br /&gt;&lt;br /&gt;MSU has decided that they cannot afford this, and therefore will not be uploading current cataloging into OCLC. Haka says: "While we will continue with OCLC for ILL, I regret that our newer holdings will not be available for others to consult."&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Now My Take&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I find it astonishing that any corporation would choose to punish customers rather than work to win them back. I also find it astonishing that OCLC is willing to try to keep current customers through threats and fear. Essentially, MSU is being made an example: if you move your cataloging to a competitor, we'll cut you out of OCLC services. This is a lesson for anyone else thinking of moving to SkyRiver or some other service.&lt;br /&gt;&lt;br /&gt;As Haka points out in his letter, the OCLC database has a huge number of records that were not created through OCLC cataloging services. When the RLIN cataloging service still existed, many libraries that did their cataloging in RLIN uploaded those records to OCLC so that they could use the OCLC ILL service.
They paid an amount similar to the $0.23 that Haka quoted from the current price list. This ability to upload (economically, I should add) directly supports the stated goal of maintaining WorldCat's value as a union catalog. The more complete the catalog, the more value it has for services like ILL, resource sharing, and collection development. Yet it is OCLC's action that is devaluing WorldCat by deliberately setting an upload price that MSU obviously cannot support economically. This tells me that the real issue is not the "value of WorldCat" but the revenue that OCLC receives from cataloging.&lt;br /&gt;&lt;br /&gt;Business 101 would tell you that the existence of a competitor brings prices down in the sector. If you can't meet your competitor's price, then you can try to keep your customers through a superior product and better services, but for some customers price will be the main factor. If someone else can provide the same service at a better price, those customers will go there.&lt;br /&gt;&lt;br /&gt;It seems to me, and Haka alludes to this, that OCLC's reliance on cataloging revenue may be in trouble, not just because of SkyRiver but also because of the Internet: it is now very easy for anyone to store and move metadata on the public Internet. The number of sites dedicated to the same materials that one finds in libraries is increasing rapidly. We have Amazon, Google Books, LibraryThing, Open Library, IMDb, and on and on. They all have metadata describing the things they focus on. It's not the same as library metadata, but the library catalog is no longer, by any means, the exclusive source of description for books, films, or music.&lt;br /&gt;&lt;br /&gt;What OCLC has that is unique is not just the quantity of metadata but the library holdings information. And they seem to be aware of this, as they load in both records and holdings from many libraries that do not do their cataloging on OCLC.
OCLC's value is in the whole package, but it still relies on cataloging as its primary source of revenue (although that source is shrinking as a percentage of total income, as you can see in their &lt;a href="http://www.oclc.org/news/publications/annualreports/default.htm"&gt;annual reports&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;The services, like ILL, that OCLC provides for libraries are incredibly valuable, and it would be a great detriment to the library community to lose them. It does appear, however, that there has been a shift in the marketplace, one that has nothing to do with library loyalty to the OCLC collective and is instead a matter of changing technology and economics. OCLC is trying to push water upriver, when it should be seeking a new balance in its revenue stream. Instead, OCLC is making a real mess of its relationship with its members -- first with the horribly botched record use policy (which isn't going to solve this problem anyway), and now by acting punitively toward members who make the kinds of economic decisions that we all make every day. I believe the "collective" can be saved, but only if OCLC decides to work with, not against, its members.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;More thoughts&lt;/span&gt; (added later)&lt;br /&gt;&lt;br /&gt;I realize now that I have many other questions about record loading on OCLC. For example, many libraries get some of their records from their book vendors, and those do get loaded into OCLC. Is that charged as cataloging, or as record loading? Are there different fees for loading records if you are doing your cataloging on OCLC vs. if you are not? Are there "load only" libraries who load their records in order to participate in ILL and other services?
If so, what are they charged for record loading?&lt;br /&gt;&lt;br /&gt;I say this because it makes sense to me that libraries that do not do their cataloging on OCLC would be encouraged to load their records so that they can participate in other services. It also makes sense that the price for this would be commensurate with that of adding your holdings online (or maybe a bit cheaper if it's more economical for OCLC to batch load than to provide cataloging online). In fact, what difference does it make how you get your records into OCLC? The most important thing is that your records are there as part of WorldCat.&lt;br /&gt;&lt;br /&gt;What the MSU letter tells me is that the OCLC economics are such that cataloging on OCLC is paying for other services, like record uploads, which may be under-priced. A different upload charge for non-cataloging libraries makes sense, and if that's the case then OCLC needs to make that clear. However, it wouldn't surprise me if that made alternative cataloging services unmarketable, because, as the MSU case shows, the combined cost of cataloging elsewhere and loading on OCLC would favor doing cataloging on OCLC. The logic makes perfect sense to me, but it appears that members haven't been informed of this pricing practice.
Really, a little more transparency about pricing could go a long way toward avoiding situations like the MSU one.</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/6193380842065945534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=6193380842065945534' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6193380842065945534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/6193380842065945534'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/02/yet-more-oclc.html' title='Yet more OCLC'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-4997118612776064843</id><published>2010-02-25T05:58:00.000-08:00</published><updated>2010-03-19T09:16:44.984-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>OCLC again</title><content type='html'>Someone slipped me Larry Alford's &lt;a href="http://www.oclc.org/us/en/multimedia/2010/files/arc/Larry_Alford_essay.pdf"&gt;letter to OCLC&lt;/a&gt; members. This is the worst piece of "argument by innuendo" that I have ever seen.
The members deserve better, much better.&lt;br /&gt;&lt;br /&gt;I am pretty much unable to discern the message in these four pages of insinuations and scores of questions. The document is entirely devoid of facts or information. Still, I'm going to attempt to extract some sense out of it.&lt;br /&gt;&lt;br /&gt;First, it's all about threats to WorldCat, in particular as libraries turn to other sources of bibliographic records. The extent of these threats should be easy to quantify, but Alford doesn't provide us with any figures. Here's the information that is needed if one wants to make an assessment of the situation:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Are member libraries adding fewer records to WorldCat? How many fewer, and what is the actual loss of revenue to OCLC? Has anyone interviewed them to ask why?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Are member libraries leaving WorldCat for other services? How many, and what is the actual loss of revenue to OCLC?&lt;/li&gt;&lt;li&gt;What does OCLC charge for its various services? There is no information on the web site, and I've heard it said that contracts between OCLC and libraries are confidential. This makes it very hard to have a discussion about costs and how costs are affecting OCLC's services in the market. Alford makes reference to "alternate service providers" (*cough* SkyRiver) but makes no comparison of costs or services.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;There are, of course, a number of red herrings in the text. I say "of course" because it is in the nature of this kind of emotional plea to bring up unsupported statements.
As an example, he states that he has asked a series of questions, like&lt;br /&gt;&lt;blockquote&gt;Should the OCLC cooperative create and support software that provides quality control and the ability to make global changes as librarians create new subject headings and revise authority records?&lt;/blockquote&gt;&lt;br /&gt;and ends with&lt;br /&gt;&lt;blockquote&gt;I am pleased to note that the response of almost everyone to whom I have posed these questions has been a universal and enthusiastic "yes."&lt;/blockquote&gt;&lt;br /&gt;But let's look at those questions. He asks about "supporting" CONSER, NACO and BIBCO without saying the nature or cost of that support. Maybe there is something to think about there. He asks if OCLC should continue maintaining the Dewey classification. Well, what does it cost OCLC, and what revenue does it bring in? And would there be another venue for the community to maintain DDC if members decide that it's not a good activity for OCLC?&lt;br /&gt;&lt;br /&gt;He also asks, rhetorically, whether it is better to have a single database for bibliographic and holdings information or&lt;br /&gt;&lt;blockquote&gt;... is it preferable to sequentially search dozens or even hundreds of catalogs around the world to try to find that particular book or article that a researcher needs?&lt;/blockquote&gt;&lt;br /&gt;He should know that there are other options, but this document is about persuasion, not facts.&lt;br /&gt;&lt;br /&gt;Often I am not sure what he is alluding to. On page three he says that there are libraries who are doing their cataloging elsewhere but "still want to participate in the resource sharing made possible by WorldCat." I don't know what resource sharing he means, but as far as I know anything beyond a search in the open WorldCat database is done for a fee. Is he complaining that some libraries do not contribute records to WorldCat but subscribe to other services? That sounds like a revenue stream to me.
He refers to these libraries as consuming more value than they return, but I don't know what unit of "value" he has in mind. As a matter of fact, throughout the document there are references to value that sometimes seem to be about OCLC's revenue, and at other times seem to be about the completeness of WorldCat. Mixing these two up in the discussion is not helpful, not at all.&lt;br /&gt;&lt;br /&gt;The mailing to which this document was attached was meant to let OCLC members know that a new, revised policy will soon be sent to OCLC's Council and Board of Trustees, and eventually to all members. If the policy was developed in the same kind of information vacuum that this document exhibits, I have little hope that it will be any better than the original policy that began this round of member dissatisfaction.</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/4997118612776064843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=4997118612776064843' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4997118612776064843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/4997118612776064843'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/02/oclc-again.html' title='OCLC again'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16'
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-852518523939273879</id><published>2010-02-22T10:28:00.001-08:00</published><updated>2010-02-22T10:41:32.473-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><title type='text'>Shameless Self-Promotion</title><content type='html'>The American Library Association has published two reports that I prepared on metadata and the semantic web.&lt;br /&gt;&lt;br /&gt;Report 1 is called: &lt;a href="http://www.alatechsource.org/library-technology-reports/understanding-the-semantic-web-bibliographic-data-and-metadata"&gt;Understanding the Semantic Web: Bibliographic Data and Metadata&lt;/a&gt;. This is a broad overview of new concepts, aimed especially at those who are new to the semantic web and to web-based metadata. (Note: To understand the diagrams, you will need a copy of the &lt;a href="http://kcoyle.net/LTR_errata.pdf"&gt;Errata&lt;/a&gt; page, since a key set of the diagrams was borked.)&lt;br /&gt;&lt;br /&gt;Report 2 has the catchy title of: &lt;a href="http://www.alatechsource.org/library-technology-reports/rda-vocabularies-for-a-twenty-first-century-data-environment"&gt;RDA Vocabularies for a Twenty-First-Century Data Environment&lt;/a&gt;. This builds on the first report, and gives more information about building semantic web vocabularies. This report is for all of you who are wondering what on earth it is that Diane Hillmann and I keep going on about when we talk about registering RDA for semantic web use. It is not overly technical so anyone who reads through Report 1 should be able to understand the general direction that we are advocating.&lt;br /&gt;&lt;br /&gt;Feel free to ask questions, make comments, argue with me, or tell me why I'm wrong. 
I don't claim to have the final answers, and want very much to have a dialog about these concepts that will lead us to a new and interesting place to be in library technology.</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/852518523939273879/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=852518523939273879' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/852518523939273879'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/852518523939273879'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/02/shameless-self-promotion.html' title='Shameless Self-Promotion'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-9051953714441627672</id><published>2010-02-21T07:18:00.000-08:00</published><updated>2010-02-21T08:40:01.418-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Trust and the Settlement</title><content type='html'>In the week leading up to the February 18, 2010 hearing in Judge Chin's New York court on the proposed settlement between the AAP/AG and Google, many parties weighed in with formal documents as well as informal ones.
While few if any of these produced new information for the judge, they do reveal the different points of view of the parties involved.&lt;br /&gt;&lt;br /&gt;One of these revelatory pieces is a &lt;a href="http://www.cdlib.org/cdlinfo/2010/02/16/hurtling-toward-the-finish-line-should-the-google-books-settlement-be-approved/"&gt;blog post&lt;/a&gt; by the University of California's Ivy Anderson. Anderson has probably been involved in the negotiations with Google from the very beginning of UC's involvement. Her post attempts to counter the criticism of the settlement as well as many fears that have been expressed, with an emphasis on academia and academic libraries. For example, Anderson cites checks and balances on pricing that should prevent price gouging, as well as the possibility for the participating libraries to negotiate prices with Google.&lt;br /&gt;&lt;br /&gt;I find two fundamental flaws in her arguments. The first is that she speaks from the perspective of a participating library, that is, a library that is able to negotiate directly with Google because of its position as a provider of books to be scanned. I have no doubt that this is a comfortable position for UC and for the other participating libraries, but they are small in number, especially compared to the total number of libraries and institutions that will be affected by the Google Book Search product. And of course their position is diametrically opposed to that of the general public, who have had no voice in this project.&lt;br /&gt;&lt;br /&gt;It doesn't surprise me that Anderson and others in similar positions have positive feelings about the settlement: they have been able to negotiate with Google and to make their needs known. Undoubtedly they have received some concessions. I also have no doubt that Google has been gracious and helpful. For all of the rest of us, however, the entire process has been a black box.
We are being asked to trust the participating libraries, and to trust their trust in Google, even though the needs of the participating libraries, all of which are large research libraries, are almost certainly not the same as our own.&lt;br /&gt;&lt;br /&gt;The second flaw that I see is Anderson's focus on Google as decision-maker. My reading of the composition of the governing body (should the settlement be approved) is that it will represent only rights holders. It will set prices and must even approve Google's products. I find it interesting that we all (Anderson included) tend to refer to this as the "Google settlement" -- but Google is the weak party in this particular situation. Remember that Google is the defendant, and that the mere act of settling is an admission of defeat. The libraries have hitched their wagon to the loser in this case. That can't be a good position.&lt;br /&gt;&lt;br /&gt;I must say that I am much more afraid, if that's the right word, of the power that could be wielded by the AAP/AG should the settlement be approved. Google has many kind words to say about libraries. The AAP, however, has made it clear that they consider many library uses of materials to be infringements:&lt;br /&gt;&lt;blockquote&gt;We also had significant concerns with respect to the digital copies that Google was providing to libraries. Libraries might use significant portions, or all, of the contents of books on such copies for a range of purposes that publishers would not regard as permitted by the Copyright Act, including uses in classroom, “e-reserve” access to students and faculty via institutional servers and lending digital copies to other libraries. Libraries might have raised fair use defenses in an attempt to justify such activities. We might also have been faced with sovereign immunity defenses by state institutions.
In addition, we were concerned about how the libraries could maintain the security of these digital copies. Security breaches might result in broad copying, uploading, downloading, and display of copyrighted works. (&lt;a href="http://thepublicindex.org/docs/amended_settlement/richard_sarnoff_declaration.pdf"&gt;Statement of Richard Sarnoff&lt;/a&gt;, for the AAP board, p. 3)&lt;br /&gt;&lt;/blockquote&gt;The interesting upshot of this entire settlement process is that by digitizing the contents of libraries and managing those digital copies through contracts, the publishers could finally gain the kind of control over library uses that they would have liked to have, but never had, over the paper books held in libraries. They would like to have controls over inter-library loan, classroom use, and reserves, but they cannot exercise such controls in the analog world. Publishers have argued since the very &lt;a href="http://www.kcoyle.net/sfpltalk.html"&gt;early days&lt;/a&gt; of digital documents that all lending of digital documents involves making a copy, and therefore is not allowed by copyright law. &lt;br /&gt;&lt;br /&gt;As a matter of fact, right on page one of the &lt;a href="http://thepublicindex.org/docs/amended_settlement/Plaintiffs_memorandum_of_law_in_support.pdf"&gt;Plaintiffs' statement&lt;/a&gt; for the judge, among the bullet points describing the main achievements of the settlement, is this one:&lt;br /&gt;&lt;blockquote&gt;Limits library uses of digital copies of Rightsholders’ works.&lt;/blockquote&gt;Perhaps it has been naive of me to see this settlement as being about Google's commercialization of the world of books. It is possible that the more significant outcome will be renewed control by the publishing community over books and their uses. Attempts to modify copyright law to cover digital resources have failed, and the rights of the public in relation to those resources are as yet unclear. 
This has left a gap that the AAP/AG settlement exploits fully.&lt;br /&gt;&lt;br /&gt;OK, &lt;span style="font-weight: bold; font-style: italic;"&gt;now&lt;/span&gt; I'm afraid!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-9051953714441627672?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/9051953714441627672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=9051953714441627672' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9051953714441627672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/9051953714441627672'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/02/trust-and-settlement.html' title='Trust and the Settlement'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8524943157916388989</id><published>2010-02-05T06:11:00.000-08:00</published><updated>2010-02-05T06:44:42.155-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>DOJ: "A Bridge Too Far"</title><content type='html'>How long has it been since you read something that came from a government agency and thought: "Wow! Brilliant!" Kudos to the Department of Justice for their Statement of Interest in the AAP/AG v. Google suit. 
Summed up, in their words:&lt;br /&gt;In general, the project is a "good thing" -&lt;br /&gt;&lt;blockquote&gt; Breathing life into millions of works that are now effectively dormant, allowing users to search the text of millions of books at no cost, creating a rights registry, and enhancing the accessibility of such works for the disabled and others are all worthy objectives.&lt;/blockquote&gt;&lt;br /&gt;However, the settlement goes beyond the original dispute, and is trying to use class action to create a new market that is unrelated to the copyright-related lawsuit -&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Although the United States believes the parties have approached this effort in good faith and the ASA is more circumscribed in its sweep than the original Proposed Settlement, the ASA suffers from the same core problem as the original agreement: it is an attempt to use the class action mechanism to implement  forward-looking business arrangements that go far beyond the dispute before the Court in this litigation. As a consequence, the ASA purports to grant legal rights that are difficult to square with the core principle of the Copyright Act that copyright owners generally control whether and how to exploit their works during the term of copyright. Those rights, in turn, confer significant and possibly anticompetitive advantages on a single entity – Google.&lt;/blockquote&gt;&lt;br /&gt;Not only that, but the DOJ seems to lend some weight to the "fair use" defense originally claimed by Google (and by the participating libraries) -&lt;br /&gt;&lt;blockquote&gt;      There has not been – and simply could not be – any allegation in this litigation that Google has sold full access to works for which it lacks the right to do so, or even that such activity was threatened. 
Indeed, selling such access would have been legally indefensible, and thus would have been at odds with Google’s entire pre-settlement book search strategy, which was premised upon staying within colorable “fair use” grounds. With very good reason, therefore, Google consciously avoided creating precisely the factual predicate that might support the settlement of book- and subscription-selling claims. The business models that the ASA authorizes therefore relate to activities in which Google never engaged or threatened to engage, and thus claims of copyright infringement that could not have been brought.&lt;/blockquote&gt;&lt;br /&gt;The antitrust issues raised by the settlement are unchanged in this amended settlement agreement. This leaves the judge in an even tougher spot than he seemed to be in before: if he decides that the suit is a valid class action, then he has to address the antitrust issues. However, I have seen no clear description anywhere of how those could be addressed, so the judge is being asked to be very clever indeed -&lt;br /&gt;&lt;blockquote&gt;Finally, the United States recognizes that if, as discussed supra, class representatives lack the power under Rule 23 to grant Google the power to exploit broadly the digital rights of class members to sell books, create subscription libraries, etc., then neither the class representatives nor Google possesses the power to authorize such activity by third parties. However, if the Court determines that the class representatives possess such rights as to Google, then the Court should carefully examine whether there exists a means for rival distributors to access orphan and rights-uncertain works consistent with Rule 23.&lt;/blockquote&gt;&lt;br /&gt;The DOJ suggests the following:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Some issues could be resolved by turning the "opt out" into "opt in" for rights holders. 
(That would essentially be what we have today under copyright law.)&lt;/li&gt;&lt;li&gt;A "waiting period" before Google can make use of out-of-print works, to give rights holders a chance to surface. (This option seems to contradict #1.)&lt;/li&gt;&lt;li&gt;More effort should go into finding rights holders.&lt;/li&gt;&lt;li&gt;A periodic reassessment of the marketplace for the out-of-print works (whose market value could change as a result of their exposure)&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;The big question is: Is this the death knell for the settlement? And if so, where do we go next? I predict that if the settlement is rejected, we will have orphan works legislation sooner rather than later, since this case has clearly highlighted the need for such legislation. The copyright violation lawsuit against Google, however, remains. I fear that the settlement has poisoned the air for a fair use decision. We've seen the sausage being made, and it will be harder than ever to approach this project with an open and fair mind.&lt;br /&gt;&lt;br /&gt;What can be done? Well, in France, when faced with a take-over of their cultural heritage by Google (their words, not mine), the government responded by giving libraries a large sum so that they could do the digitizing themselves: a kind of "by the people, for the people" digitization project. 
Is it too much to hope that could happen here?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8524943157916388989?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8524943157916388989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8524943157916388989' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8524943157916388989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8524943157916388989'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/02/doj-bridge-too-far.html' title='DOJ: &quot;A Bridge Too Far&quot;'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8207408932267543055</id><published>2010-01-20T06:51:00.000-08:00</published><updated>2010-01-21T12:09:09.420-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Pardon my French</title><content type='html'>News is flying around the blog-o-sphere that the French have given up their fight against Google Books and have decided that they will embrace the Google Solution. This is simply not true, based on my reading of the "&lt;a href="http://www.culture.gouv.fr/mcc/content/download/3520/23115/file/Rapport%20sur%20la%20numerisation%20du%20patrimoine%20ecrit.pdf"&gt;Rapport Tessier&lt;/a&gt;". 
Admittedly, my French is mediocre at best, but here's my reading of what is actually said in this report (with some quoted bits in case I've misunderstood).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Flash: Thanks to Jean-Marc Destabeaux, there is a &lt;a href="http://kcoyle.net/tessier_exec.html"&gt;translation of the executive summary&lt;/a&gt; (pp. 38-41). Note that I also edited that text, so any beefs or corrections should be sent to me. Ditto for take-down notices&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Among other solutions (continuing to develop Gallica, and continuing to contribute to the European digital library, Europeana), the report suggests that a partnership with Google that respects the role of the French National Library (BNF) MAY be possible. No such deal has been struck with Google, as far as I can tell. The report is an in-depth summation of the need for a digitization program in France, as well as of the problems such a program faces.&lt;br /&gt;&lt;br /&gt;It is important to note that the role of the BNF is something that does not exist in the US. The BNF is the official keeper and protector of the French cultural heritage, as far as intellectual products are concerned. The Library takes this role very seriously. France has a mandatory deposit law, and the BNF strives to have a copy of every book, journal, etc. ever published in France. France has a law that gives the BNF the legal right to digitize its holdings for the purpose of preservation. It also has an orphan works law that specifically gives the BNF certain rights, although this report says little about the nature of that law.&lt;br /&gt;&lt;br /&gt;I read the report as an excellent analysis of the many ways in which Google Books is at odds with the mission of libraries.&lt;br /&gt;&lt;blockquote&gt;Une réponse inadaptée au regard des missions des bibliothèques&lt;br /&gt;[A response ill-suited to the missions of libraries]&lt;/blockquote&gt;&lt;br /&gt;Preservation, equality of access, privacy... Google's program is lacking in all of these areas. 
The commercial goals of Google and the public goals of a national library are essentially at odds.&lt;br /&gt;&lt;br /&gt;To me, the main thrust of the report comes from this:&lt;br /&gt;&lt;blockquote&gt;Les bibliothèques françaises - en particulier la BnF - disposent d’atouts importants, qu’il ne faut pas sous-estimer dans le cadre d’une négociation avec un partenaire privé. (p. 17)&lt;br /&gt;[French libraries, in particular the BnF, have significant assets that must not be underestimated in any negotiation with a private partner.]&lt;/blockquote&gt;&lt;br /&gt;What this says is that libraries have a strong negotiating position with Google (one that I think the authors of this report feel has not been fully exploited) because of the incredible value of their holdings. You could say that libraries have a monopoly of their own. Yet they seem to have traded their holdings for fairly meager returns in the contracts with Google.&lt;br /&gt;&lt;br /&gt;The report talks about the need to make library materials findable on the Web, primarily through search engines (p. 62). Gallica and Europeana are typical databases, part of the "deep web" that search engines do not index. Google Books are keyword-indexed in Google, but are not findable through other search engines. The BNF does not want its "patrimony" bolstering a single commercial search engine, and one source of access is not enough -- if the library is to be available, it must be available generally.&lt;br /&gt;&lt;blockquote&gt; S’il y a un rapport étroit entre Google et Google Livres, il reste que Google ne peut être considéré comme une modalité d’accès suffisante aux contenus de la bibliothèque numérique.&lt;br /&gt;[While there is a close relationship between Google and Google Books, Google still cannot be considered a sufficient means of access to the contents of the digital library.]&lt;/blockquote&gt;There is a section of the report on public/private partnerships. I believe that this is the source of the rumors, because it does suggest that a partnership with Google could be possible. That partnership, however, must respect the essential mission of the library.&lt;br /&gt;&lt;blockquote&gt;- proposer à la société Google une autre forme de partenariat, fondé sur l’échange équilibré de fichiers numérisés, sans clause d’exclusivité ;  (p. 38)&lt;br /&gt;[propose to Google a different form of partnership, based on a balanced exchange of digitized files, with no exclusivity clause;]&lt;/blockquote&gt;"... 
a different kind of partnership..." The "exclusivity" clause refers to contractual clauses that prevent the participating libraries from sharing the files with third parties without Google's approval. Instead, the BNF proposes that, if libraries are to partner with Google, such sharing must be allowed so that they can make full use of the digital files to fulfill their mission.&lt;br /&gt;&lt;blockquote&gt;L’autre difficulté tient à des limitations explicites, qui peuvent brider les initiatives de la bibliothèque pour renforcer l’accessibilité à son patrimoine numérisé. Ainsi, la bibliothèque ne peut partager ou fournir le contenu numérisé à une tierce partie sans avoir obtenu préalablement l’autorisation de Google...   (p. 16)&lt;br /&gt;[The other difficulty lies in explicit limitations, which may restrict the library's initiatives to improve access to its digitized heritage. Thus, the library may not share or provide the digitized content to a third party without first obtaining Google's authorization...]&lt;/blockquote&gt;I could go on and on, but I really hope that someone will translate this report (or at least the executive summary) because it is an excellent summary of the vast difference in goals between Google and libraries. It discusses the downsides of mass digitization, and proposes a mixed solution combining mass digitization and curated digital collections.&lt;br /&gt;&lt;blockquote&gt;La numérisation de masse est une voie possible, mais non exclusive, de la numérisation ; ses contraintes et ses limites, tout comme ses indéniables apports, doivent être aujourd’hui intégrés à toute réflexion poussée sur ce qui constitue les missions historiques des bibliothèques patrimoniales, tant en termes de conservation que de valorisation. 
C’est une des conditions nécessaires pour ne pas perdre le fil du débat.&lt;br /&gt;[Mass digitization is one possible path to digitization, but not the only one; its constraints and limits, as well as its undeniable contributions, must now be integrated into any in-depth reflection on the historic missions of heritage libraries, in terms of both preservation and promotion. This is one of the necessary conditions for not losing the thread of the debate.]&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8207408932267543055?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8207408932267543055/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8207408932267543055' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8207408932267543055'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8207408932267543055'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2010/01/pardon-my-french.html' title='Pardon my French'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5223737791439871638</id><published>2009-12-04T11:35:00.000-08:00</published><updated>2009-12-15T09:23:05.859-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='women technology'/><title type='text'>Girls are still icky</title><content type='html'>&lt;a href="http://3.bp.blogspot.com/_Z1EA7hov2P0/SxlmixK0FrI/AAAAAAAAA0o/VBLvfUZJAAU/s1600-h/Scan0004.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 300px; height: 330px;" 
src="http://3.bp.blogspot.com/_Z1EA7hov2P0/SxlmixK0FrI/AAAAAAAAA0o/VBLvfUZJAAU/s400/Scan0004.jpg" alt="" id="BLOGGER_PHOTO_ID_5411469174762641074" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In 1996 I wrote a piece about the gender gap in computer advertising that was published in a book called Wired Women. (&lt;a href="http://kcoyle.net/howhard.html"&gt;Online version&lt;/a&gt;) At the time, the computer world was so awash in testosterone that the back pages of computer user magazines, like PC Magazine, were mainly ads for pornography. The message for women was clear: No Girls Allowed. Take, for example, this postcard ad for a mid-1990s bulletin board system, which said on the back:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;No wasted time.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;No garbage.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;No noise.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;No irrelevant clutter.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;For serious computer programmers and developers, BIX is the exclusive online club.&lt;/span&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;And one could add: no girls allowed.&lt;br /&gt;&lt;br /&gt;I thought all of that was in the past, but ... nooooooooo. 
This Verizon Droid ad is just an updated version of that 1996 BBS advertisement.&lt;br /&gt;&lt;br /&gt;&lt;object height="320" width="265"&gt;&lt;param name="movie" value="http://www.youtube.com/v/sLDxv9ohH2s&amp;amp;hl=en_US&amp;amp;fs=1&amp;amp;"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/sLDxv9ohH2s&amp;amp;hl=en_US&amp;amp;fs=1&amp;amp;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="320" width="265"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;Not only should the phone not be pretty (so maybe Ugly Betty would be acceptable), but it is not a "tiara-wearing digitally clueless beauty pageant queen" and, as the woman puts on her lipstick in the street, "it's not a princess, it's a robot." I don't want to go so far as to assume that the attack on the fashion-plate male mannequins equals homophobia, but the ending salvo of "It trades hair-do for can-do" is eerily reminiscent of the image above.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5223737791439871638?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5223737791439871638/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5223737791439871638' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5223737791439871638'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5223737791439871638'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/12/girls-are-still-icky.html' title='Girls are still 
icky'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_Z1EA7hov2P0/SxlmixK0FrI/AAAAAAAAA0o/VBLvfUZJAAU/s72-c/Scan0004.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-317710346619166185</id><published>2009-11-23T10:27:00.000-08:00</published><updated>2009-11-23T10:45:18.960-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><category scheme='http://www.blogger.com/atom/ns#' term='FRBR'/><title type='text'>1923</title><content type='html'>The Google Books Settlement is causing a great deal of previously unexpressed bibliographic interest -- just how many books are there in the known universe? How many are published in the four countries now included in the settlement agreement (US, UK, Canada, Australia)? And how many are in the public domain?&lt;br /&gt;&lt;br /&gt;Lorcan Dempsey and Brian Lavoie have recently published an &lt;a href="http://dlib.org/dlib/november09/lavoie/11lavoie.html"&gt;article in D-Lib&lt;/a&gt; that looks at these figures using the world's largest database of bibliographic data, WorldCat. The data is fascinating, but I have already seen it misinterpreted, so I thought some clarification might be useful.&lt;br /&gt;&lt;br /&gt;Dempsey and Lavoie are very clear that what they are measuring is "Manifestations." Folks outside of the library environment are unlikely to know what that means, so it is important to clarify what the numbers in the Dempsey/Lavoie article represent. Each “book” that is counted represents a published product at about the same level of granularity that today would be given an ISBN. 
Therefore, if a publisher reissues a book from its backlist after the previous print run has been exhausted (say, a decade later) and with a new introduction, it is counted as a different book. The publication date that is fed into the study is the date of the reissue. Likewise, when publishers repackage and reprint public domain books, these are also counted as separate products with new ISBNs and new dates.&lt;br /&gt;&lt;br /&gt;Thus, if you look up a commonly republished book like “Moby Dick, Or The Whale” in the Library of Congress catalog, you retrieve 40 items (and more if you use the short form of the name, simply “Moby Dick”), of which only one, published in 1851, is pre-1923. Of the other thirty-nine publications of the work, which range from 1925 to 2006, some contain what GBS called “inserts” (that is, separately copyrightable intellectual property in the form of introductions, etc.), but others may be straight republications of the text. If you do the same lookup in FictionFinder, a work-based view of a portion of the WorldCat database,
you find:&lt;br /&gt;&lt;br /&gt;   823 editions of "Moby Dick" (which combines the various versions of the title)&lt;br /&gt;   534 of which are in English&lt;br /&gt;&lt;br /&gt;of these:&lt;br /&gt;&lt;br /&gt;   9 have an unknown date&lt;br /&gt;  60 have a date of 1923 or earlier&lt;br /&gt;  465 have dates after 1923&lt;br /&gt;&lt;br /&gt;Looking through the list on FictionFinder, it is easy to see that there are some duplicate records, both in the pre- and post-1923 entries.&lt;br /&gt;&lt;br /&gt;Therefore, the question we now need to answer is: how many public domain works have been republished after the 1923 cut-off date?&lt;br /&gt;&lt;div class="content"&gt;  &lt;p&gt;Google currently appears to lack the ability to make the proper connection between the original text that is in the public domain and the many “manifestations” (as they are called in library-speak) that were published later and whose primary text is also in the public domain. This is a non-trivial exercise when one is working only with the metadata that describes the work, but it may become more feasible with full-text analysis of the various packages in which publishers have placed the original work of Melville. 
I assume that Google is working on this, although I cannot predict how it will affect its assessment of the PD/(c) split.&lt;/p&gt;       &lt;/div&gt;What is clear, however, is that Google is going to need to identify Works (if not strictly in the FRBR sense, then at least in a sense that meets a definition valid for copyright law).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-317710346619166185?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/317710346619166185/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=317710346619166185' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/317710346619166185'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/317710346619166185'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/11/1923.html' title='1923'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-3161425874722931059</id><published>2009-11-14T08:31:00.000-08:00</published><updated>2009-11-14T15:16:56.030-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Amended Google/AAP Settlement</title><content type='html'>The amended settlement has been issued (the best way to see the changes is in the &lt;a 
href="http://thepublicindex.org/docs/amended_settlement/amended_settlement_redline.pdf"&gt;redline&lt;/a&gt; version). I will summarize here the changes that I see as having the greatest impact on libraries and on the public. For legal issues, I suggest &lt;a href="http://laboratorium.net/"&gt;James Grimmelmann's blog&lt;/a&gt;. For business issues, I suggest the NY Times and the Wall Street Journal.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Foreign Works Mostly Excluded&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Undoubtedly due to the many complaints from foreign rights holders, the settlement now includes only (oddly enough) US, UK, Australian and Canadian works. As I interpret it, this would include translations of works from other countries, as long as the translations were published in one of those four countries. This greatly changes the value of the institutional subscription for higher education, as well as the value of the 'research corpus' (essentially a database of the OCR'd texts that researchers can use for computational research).&lt;br /&gt;&lt;br /&gt;Since we know that information seekers prefer accessing works online rather than in hard copy, I anticipate that the online service will be very popular. But it will contain almost exclusively these Anglo-American products, a narrow swath of the intellectual output of the planet. As it is, too many Americans are unaware of the world outside of those Anglo-American borders. This will just exacerbate that problem. It could change the content of education and research. As I've said &lt;a href="http://kcoyle.blogspot.com/2007/03/unintended-consequences.html"&gt;before&lt;/a&gt;, availability is a significant determinant of what intellectual materials people use in their research.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Particular to Libraries&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In general, the sections on libraries (both participation and use of the digital copies) remain unchanged. 
There are a few minor changes, some of which are puzzling.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Public Libraries&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The statement about free access for public libraries has been changed from:&lt;br /&gt;&lt;blockquote&gt; in the case of each Public Library, &lt;span style="color: rgb(255, 0, 0);"&gt;no more than &lt;/span&gt;one terminal per Library Building&lt;/blockquote&gt;&lt;br /&gt;to&lt;br /&gt;&lt;blockquote&gt;in the case of each Public Library, one terminal per Library Building.; &lt;span style="color: rgb(51, 51, 255);"&gt;provided, however, that the Registry may authorize one or more additional terminals in any Library Building under such further conditions at it may establish, acting in its sole discretion and in furtherance of the interests of all Rightsholders.&lt;/span&gt;&lt;/blockquote&gt;So it leaves open the option of giving some public libraries additional (free?) access. Still, there is no information on whether or how public libraries could subscribe in a way that would allow them to fully serve their communities.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Microforms&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The definition of "books" that could be digitized originally included microforms. 
The word "not" has been added:&lt;br /&gt;&lt;blockquote&gt;  hard copy (&lt;span style="color: rgb(51, 51, 255);"&gt;not&lt;/span&gt; including microform)&lt;br /&gt;&lt;/blockquote&gt;No idea why, but perhaps a look at the comments will reveal one from UMI or some other party related to microforms.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-style: italic;"&gt;[Found it: The &lt;/span&gt;&lt;a style="font-style: italic;" href="http://thepublicindex.org/docs/objections/proquest.pdf"&gt;ProQuest letter&lt;/a&gt;&lt;span style="font-style: italic;"&gt; states that dissertations should NOT be included as they are controlled through ProQuest's dissertation service. The letter mentions that some dissertations are in microform format, but that today many are available as print-on-demand or online. Although microforms were excluded, p. 327 of the redline document states:&lt;br /&gt;"&lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;What Material Is Covered&lt;/span&gt;&lt;span style="font-style: italic;"&gt;?&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;"Books” include in-copyright written works, such as novels, textbooks, dissertations, and other writings...".&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;So ProQuest did not get what it asked for.]&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-weight: bold;"&gt;OCLC Networks&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The original settlement had a strange exception that removed OCLC networks from the definition of "consortium":&lt;br /&gt;&lt;blockquote&gt;"Institutional Consortium” means a group of libraries, companies, institutions or other entities located within the United States that is a member of the International Coalition of Library Consortia&lt;span style="color: rgb(255, 0, 0);"&gt;&lt;span style="font-size:100%;"&gt; with the exception of Online 
Computer&lt;/span&gt; Library Center (OCLC) - affiliated networks.&lt;/span&gt;&lt;/blockquote&gt;That exception has been removed. I would love to know why it was there in the first place, but can only assume that one or both of these requests came about because of participation by OCLC in the settlement discussions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-style: italic;"&gt;[Note: I discovered that Lyrasis and Nylink filed an &lt;/span&gt;&lt;a style="font-style: italic;" href="http://thepublicindex.org/docs/objections/lyrasis.pdf"&gt;objection&lt;/a&gt;&lt;span style="font-style: italic;"&gt; about this exception, which may be why it was removed. Their analysis was that it had come from OCLC and gave OCLC the ability to manage competition by determining which organizations would be excluded from participating in the business of brokering services for libraries.&lt;/span&gt; &lt;span style="font-style: italic;"&gt;They assume that OCLC hopes to be in that business itself.]&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:100%;" &gt;Download Formats &amp;amp; Course Packs&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the original settlement, the only download format mentioned was PDF. As we know, Google has since announced that it will provide e-books from the publisher partner content that it carries on GBS. E-book formats have been added to the settlement as possible download formats. 
At the same time, the product line described as:&lt;br /&gt;&lt;blockquote style="color: rgb(255, 0, 0);"&gt;Custom Publishing - Per-page pricing of Books, or&lt;br /&gt;portions thereof, for course materials, and other forms of custom&lt;br /&gt;publishing for the educational and professional markets&lt;/blockquote&gt;has been removed.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Other?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are complex changes to the treatment of orphan works which I have not tried (yet) to absorb. Those will undoubtedly have some impact on libraries and the public but at the moment I have no thoughts on that.&lt;br /&gt;&lt;br /&gt;The settlement now allows rightsholders to place a Creative Commons license on their works. I really don't see a great deal of significance in this, although it does emphasize that by participating in GBS your rights are now governed by contract law rather than copyright law.&lt;br /&gt;&lt;br /&gt;And, last, Google admits to some of its own difficulties in bibliographic control when it states that "The inclusion of a work within the Books Database does not, in and of itself, mean that the&lt;br /&gt;work is a Book within the meaning of Section 1.19 (Book)." 
In other words: we threw a whole bunch of bib records into a database; don't assume anything from it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-3161425874722931059?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/3161425874722931059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=3161425874722931059' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3161425874722931059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/3161425874722931059'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/11/amended-googleaap-settlement.html' title='Amended Google/AAP Settlement'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5373799849502665417</id><published>2009-11-09T13:42:00.000-08:00</published><updated>2009-11-09T17:39:25.935-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Googled</title><content type='html'>Waiting for the next round of Google/AAP/AG settlement prose (which was due today, November 9, but has been moved back to Friday, November 13, when the parties will presumably present it to the judge), I have read Ken Auletta's book "&lt;a href="http://openlibrary.org/b/OL23606345M/Googled"&gt;Googled: the end of the world as we know 
it&lt;/a&gt;." It's mainly a business book, and primarily about media and advertising. I can sum up what it says about Google in three statements:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Engineering can fix anything&lt;/li&gt;&lt;li&gt;Information is neutral and measurable&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Advertising is information&lt;/li&gt;&lt;/ol&gt;OK, maybe that's a bit overly concise, but that is what it boils down to. I've often wondered how your motto can be "Don't be evil" when you are in the advertising business. It obviously works if you consider information to have meaning only in terms of numerical measures, and advertising to be just another kind of information. This engineer-based mentality as the guiding principle of the largest, richest advertising company in the world falls somewhere between Ayn Rand's objectivism and Bernie Madoff's Ponzi scheme. About 50% of Google's employees are engineers, and engineers, on average, earn twice what non-engineers earn.&lt;br /&gt;&lt;br /&gt;Google has ramped up the advertising game by orders of magnitude, destabilizing huge, long-lived media companies, and it's all based on... winners win. Google sees its role as matching up users with things they are seeking, whether it's web sites, books, or a place to buy sneakers. It doesn't matter to Google what the information is.&lt;br /&gt;&lt;br /&gt;There is something creepy about the way that Auletta refers to SergeyandLarry as "the founders." It sounds almost... cult-like. 
The fact that the book treats the founders and CEO Eric Schmidt as a three-some is just way too &lt;a href="http://en.wikipedia.org/wiki/Trinitarian"&gt;trinitarian&lt;/a&gt; for my taste.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5373799849502665417?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5373799849502665417/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5373799849502665417' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5373799849502665417'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5373799849502665417'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/11/googled.html' title='Googled'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7302184344427592378</id><published>2009-10-23T16:47:00.000-07:00</published><updated>2009-10-23T17:58:34.488-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Objecting to GBS/AAP/AG Settlement</title><content type='html'>ALA Washington Office has posted an &lt;a href="http://wo.ala.org/gbs/the-google-books-settlement-who-is-filing-and-what-are-they-saying/"&gt;analysis &lt;/a&gt;of who filed comments/briefs to the court relating to the Google/AAP/AG settlement.  
Of the "class member objectors," i.e. authors and publishers, 82 US parties filed objections. Astonishingly, there were 295 objections filed by foreign "class members," including the publisher organizations in a number of countries. The objections range from the seemingly trivial (the poor quality of the translations of the notice) to concrete descriptions of how the settlement violates the rights of rights holders under the &lt;a href="http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html"&gt;Berne Convention&lt;/a&gt;. I'll sum up some of these objections:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;The class&lt;/span&gt; -- members of the class were not given sufficient notice, nor were they able to read the actual settlement documents, which were not provided in translation.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Moral rights&lt;/span&gt; -- Berne includes moral rights, that is, the right of authors to control the use of their works. This is interpreted quite liberally in some countries, to include things like cover images used in sales, metadata, etc. While these may seem unimportant, the Italian publishers' organization AIE was horrified to find one of its newsletters listed with an author of "Fascist Federation of Publishers". This was a former name of the organization, but the organization found the attribution offensive.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Registration requirements&lt;/span&gt; -- Berne states clearly that "... exercise of these rights shall not be subject to any formality..." It was this aspect of Berne that ended the copyright registration requirement in the US. Objectors claim that the need to register with the Book Rights Registry violates this aspect of Berne. 
The logic is that you are the copyright holder regardless of any action you take to assert it.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Definition of "out of print"&lt;/span&gt; -- This is probably being revised by the main parties, but the original settlement document stated that "Google will use the publishing status, product availability and/or availability codes to determine whether or not the particular database being used considers that Book to be offered for sale new through one or more then-customary channels of trade in the United States." Various objectors were able to show that Google's determination (as available in the database managed by Google today) was wrong in a majority of cases.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Definition of "in print"&lt;/span&gt; -- This one also might be undergoing revision. The settlement defines "in print" as "be offered for sale new." Some objectors pointed out that there are books that are free, that are online for open access, etc. The argument is that these cannot be considered out of print.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Representation&lt;/span&gt; -- None of the foreign class members considers either the AAP or the AG to represent them. Some ask that there at least be foreign class members on the board of the Rights Registry. Others simply consider the class membership to be invalid.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;The burden on publishers&lt;/span&gt; -- The burden of identification has been placed on publishers. For a publisher with an active list of titles, this could be a considerable amount of work. Google offered to do automated matching against its database if publishers would provide ONIX metadata. 
Apparently this has failed to provide relief, most likely because of differences between the publishers' metadata and that of Google.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;The effect of secrecy&lt;/span&gt; -- Because Google works heavily in "trade secret mode," it is very difficult for the rights holders to find and diagnose problems relating to their works. Yet the settlement does not hold Google accountable for errors in the data.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Privacy &lt;/span&gt;-- the EU has rather strict privacy rules. This argument is a bit contorted because at the moment there is no plan to allow EU users to access the books covered by the settlement, since the settlement is only valid in the US. But at least one objector acknowledged that users would gain access by going through US proxy servers. It isn't clear to me if one can apply local law when masquerading as someone else through a proxy.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Local digitization laws&lt;/span&gt; -- At least one country, Germany, has made provisions for library digitization of works (and in-library access) which requires that the library obtain permission from the rights holder.  This objection is a bit indirect, but it seems to be one of indignation that Google could be digitizing works that the national library of the country where the work was published cannot.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Censorship &lt;/span&gt;-- Many are concerned that Google may eliminate books from its service "for editorial reasons" without having to justify itself. This is an interesting and difficult argument -- it's like saying you're against the service, and you're afraid it won't have everything. 
It makes sense, however: if Google becomes the predominant means of access to books in the US and can censor without recourse, a single company will have a great deal of control over both information and culture. There should be more objection to this from within the US.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;General moral and cultural indignation&lt;/span&gt; -- I read about a dozen of the foreign objections. In some cases, I may have been reading into the text an undertone of moral and cultural indignation. Not in the case of Germany and France, however, which were quite clear about their objection to the monetization of their cultural heritage. Here are some quotes:&lt;/li&gt;&lt;/ol&gt;&lt;blockquote&gt;"... the proposed settlement homogenizes (or "Googlizes") and demeans those special elements that distinguish the unique cultural tradition of France by turning books into a merely industrial by-product of a computer database."&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;"France's concern for its authors is only heightened by the proposed settlement's shroud of secrecy and hint of an uncontrolled, autocratic concentration of power in a single corporate entity, Google, that generates more revenue than many countries."&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;"The Federal Republic of Germany is historically called "Das Land der Dichter und Denker" (the land of poets and thinkers). ... Germany can rightfully claim the mantle of birthplace of modern printing and publishing. ... [the settlement] will flout German laws that have been established to protect German authors and publishers... creating a new worldwide copyright regime without any input from those who will be greatly impacted -- German authors, publishers and digital libraries and German citizens who seek to obtain access to digital publications through the Google service. 
"&lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7302184344427592378?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7302184344427592378/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7302184344427592378' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7302184344427592378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7302184344427592378'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/10/objecting-to-gbsaapag-settlement.html' title='Objecting to GBS/AAP/AG Settlement'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7595963777910518837</id><published>2009-10-14T10:32:00.000-07:00</published><updated>2009-10-14T11:38:11.881-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cataloging'/><category scheme='http://www.blogger.com/atom/ns#' term='oclc'/><title type='text'>OCLC and "Competition"</title><content type='html'>The announcement of a new company, &lt;a href="http://theskyriver.com/"&gt;SkyRiver&lt;/a&gt;, providing cataloging services to libraries has sparked a number of comments about competing with OCLC and WorldCat. 
For a number of reasons, I don't think that the result of such a service is necessarily competitive, although I am very glad to see alternatives enter the marketplace, especially for those who do not use OCLC.&lt;br /&gt;&lt;br /&gt;To begin with, OCLC is more than an online cataloging service. Admittedly, revenue from cataloging is OCLC's largest income source, so cataloging is not in any way just an incidental function from OCLC's point of view, but cataloging alone is not the point or purpose of OCLC to its users. I see OCLC as a kind of social network where the "beings" are libraries. The value of OCLC is directly related to the population it encompasses, and the social services it can provide based on that population. Shared cataloging copy is one service, but discovery and delivery options probably motivate OCLC members as much as, or even more than, the cataloging effort. This was evident when RLG still existed, as some RLG member libraries who did their cataloging in RLIN also loaded their records into WorldCat in order to participate in the services that OCLC provided.&lt;br /&gt;&lt;br /&gt;The value of the catalog copy on OCLC may be second to the value of the holdings information that OCLC maintains. Catalog copy, if that's all you want, can be found in innumerable library catalogs (including that of the Library of Congress), and some library systems allow you to export or retrieve a full MARC record that you can then add to your own catalog. Catalog copy can also often be found on the verso of the title page in the form of Cataloging in Publication (CIP), although not in MARC format and not as a complete record. But no one else, and no other service, has the combined holdings of some 60,000 libraries, and that's the main thing that OCLC brings to the table. 
It is only because of these holdings that WorldCat has value to individual searchers and to the libraries who serve them.&lt;br /&gt;&lt;br /&gt;The view of OCLC as "the only game in town" for library cataloging ignores the fact that there are libraries who do not participate in OCLC, for a variety of reasons, but who still need to create bibliographic records. These libraries may not be able to afford OCLC's prices for cataloging services, or they may simply not wish to be bound by the standards of that society of libraries. Some libraries, in particular those in corporate settings, are not able to share their holdings publicly, and therefore are not able to participate in the social life of libraries that WorldCat represents.&lt;br /&gt;&lt;br /&gt;There are also non-library providers of library catalog records, in particular the vendors who include catalog data with the products they sell to libraries. These vendors need a source of cataloging copy that is unrelated to particular holdings information.&lt;br /&gt;&lt;br /&gt;If we can think further down the line, a database of bibliographic records, like that in &lt;a href="http://theskyriver.com/"&gt;SkyRiver&lt;/a&gt; or &lt;a href="http://biblios.net/"&gt;biblios.net&lt;/a&gt; could become a resource for anyone who needs to work with bibliographic data. This could include anyone on a research project who wants to provide a quality bibliography with a minimum of effort. Although the bibliography will follow citation standards, the basic data is the same as that found in library records.&lt;br /&gt;&lt;br /&gt;Another advantage that these and other bibliographic services may provide to us all in the library profession is that they could be a source of data for experimentation. What with RDA looming on the horizon and much talk about updating our data format from MARC to something else, we'll need data to work with. 
OCLC has historically been slow to change its data, and not without reason: OCLC is integrated into the workflows of tens of thousands of libraries that depend on it for everyday functionality. Although the OCLC research division comes up with innovative ideas, the OCLC core functionality is essentially the same as it was two or three decades ago. If we want to experiment with radical change, I for one expect it to come from the sidelines, not the center.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7595963777910518837?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7595963777910518837/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7595963777910518837' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7595963777910518837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7595963777910518837'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/10/oclc-and-competition.html' title='OCLC and &quot;Competition&quot;'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8715239859311152277</id><published>2009-09-20T10:33:00.000-07:00</published><updated>2009-09-20T22:48:22.095-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>DOJ drops bomb in 
Google/AAP settlement</title><content type='html'>On Friday, September 18, 2009 the Department of Justice delivered its long-awaited &lt;a href="http://thepublicindex.org/docs/letters/usa.pdf"&gt;Statement of Interest&lt;/a&gt; in the proposed settlement between Google and the AAP/AG in the class action suit surrounding the Google Book Search product. The DOJ has some very specific requirements for modification of the settlement, some of which could result in significant changes in the nature of the agreement. The headline, however, is:&lt;br /&gt;&lt;blockquote&gt;that &lt;span style="font-weight: bold;"&gt;"the court should reject the settlement in its current form,"&lt;/span&gt; and reconsider after changes are made.&lt;/blockquote&gt;&lt;br /&gt;Beyond that, my summary is this:&lt;br /&gt;&lt;br /&gt;1) the DOJ does not like that the settlement allows uses of orphan works that go beyond those allowed by copyright law, and especially that others will be profiting from those uses&lt;br /&gt;&lt;br /&gt;2) the DOJ considers the settlement to be anti-competitive, and&lt;br /&gt;&lt;br /&gt;3) between the lines, it appears that the DOJ can't decide between supporting full access to scanned books for the good of mankind, and wanting the settlement to limit itself to the original scope of Google's project, which was to digitize for indexing only.&lt;br /&gt;&lt;br /&gt;And I should add:&lt;br /&gt;&lt;br /&gt;4) nothing here has a direct effect on libraries or the Google library partners, except, perhaps, in that it changes the product that Google will provide as its subscription service, and&lt;br /&gt;&lt;br /&gt;5) the DOJ letter clearly states that Google and the AAP/AG are already in the process of making changes to the settlement to respond to the DOJ's concerns.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:180%;"&gt;The Concerns&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-size:130%;"&gt;&lt;br 
/&gt;The Class&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first concern has to do with the definition of the class of rights holders who are party to the class action suit. DOJ concludes that the settlement does not satisfy the rules for defining a class as set out in &lt;a href="http://www.law.cornell.edu/rules/frcp/Rule23.htm"&gt;Rule 23&lt;/a&gt;, the rule that governs class action suits.&lt;br /&gt;&lt;br /&gt;In this area, DOJ is mainly concerned with the potential rights holders of orphan works. It isn't easy to understand what solutions DOJ sees for finding the rights holders for these works, but the Department is uneasy that known rights holders will be the ones negotiating with the rights registry, and that they will also benefit from any money made on orphan works. In other words, it will be to the advantage of known rights holders that the parents of those orphans NOT be found. DOJ suggests, among other things, that the money made on orphan works not be paid out to others, but be used to try to find rights holders.&lt;br /&gt;&lt;br /&gt;It also suggests that not enough work was done to notify all potential members of the class, in particular foreign authors.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;The Potential Uses, and Orphan and Out-of-Print Works&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;DOJ appears to be nervous about the open-endedness of the future uses that Google can make of both orphan and out-of-print works. To remedy this, it suggests that out-of-print works (including orphans) be treated the same as in-print works, that is, that rights holders must opt in to any uses that Google intends to make of the works. To me this makes sense from a legal point of view, since copyright does not distinguish between in- and out-of-print status. It makes less sense from a market point of view, because presumably there is less active interest in the out-of-print works on the part of the rights holder. 
However, we really do not know what in- and out-of-print mean in a predominantly digital environment, and it may be a mistake to be making decisions based on the analog market, as the settlement does.&lt;br /&gt;&lt;br /&gt;There are some parts of the DOJ document that suggest what could be radical solutions, yet they appear almost as asides. For example, when suggesting that out-of-print works should be subject to opt-in, they say:&lt;br /&gt;&lt;blockquote&gt;"Such a revision would, of course, not give Google immediate authorization to use all out-of-print works beyond the digitization and scanning which is the foundation of the plaintiffs' Complaint in this matter." p. 14&lt;br /&gt;&lt;/blockquote&gt;This seems to indicate that DOJ would be more comfortable with a settlement that essentially authorized the current scope of the Google Book Search product, which was the basis for Google's claim of Fair Use: search and snippet display.&lt;br /&gt;&lt;br /&gt;In another section, they voice concern over the fact that some rights holders will be earning money on the unclaimed works of others. They say:&lt;br /&gt;&lt;blockquote&gt;"The risk of such improper leveraging might also be reduced by narrowing the scope of the license. A settlement that simply authorized Google to engage in scanning and snippet displays in the future would limit the profits that others could potentially derive from out-of-print works whose owners fail to learn of their right to claim those profits." p. 15&lt;/blockquote&gt;In fact, this would greatly limit the profit that Google could earn (from which the rights holders' profits derive), since the main source of expected profit for Google seems to be the licensing of full views of the books (to libraries and other institutions) and the "sale" of books to individuals. If this is really what the DOJ means, then it is essentially suggesting that Google have no more use of orphaned works than it has today. 
With that limitation, it seems that Google might as well go forward with its Fair Use defense, if it wants to continue scanning books at all.&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;br /&gt;Competition&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;DOJ is concerned that the settlement doesn't allow for sufficient competition. It isn't clear to me, however, how that competition might be achieved. First the document states that the Registry does not have the power to give access to works to entities other than Google, since copyright law doesn't allow it. Then it says that the best solution is to make sure that other companies get equal access. To show that I'm not making this up (although I may be misinterpreting):&lt;br /&gt;&lt;blockquote&gt;"The Proposed Settlement does not forbid the Registry from licensing these works to others. But the Registry can only act "to the extent permitted by law." S.A. 6.2(b). And the parties have represented to the United States that they believe the Registry would lack the power and ability to license copyrighted books without the consent of the copyright owner -- which consent cannot be obtained from the owners of orphan works." p. 23&lt;/blockquote&gt;&lt;blockquote&gt;"This risk of market foreclosure would be substantially ameliorated if the Proposed Settlement could be amended to provide some mechanism by which Google's competitors could gain comparable access to orphan works...." p. 25&lt;/blockquote&gt;As far as antitrust goes, the document states that, although there are antitrust concerns, the full analysis has not been completed. 
There are suggestions, however, that the main concerns have to do with the Book Rights Registry and the setting of prices for all works (instead of relying on competition to determine prices).&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;-------------&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;All in all, it seems to me that the DOJ has pointed out some of the same problems indicated by others, but unfortunately hasn't really given a clear direction for the settlement to take. What we do know is that we'll see a new version of the settlement sometime in the future... many more pages of dense text to ponder.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8715239859311152277?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8715239859311152277/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8715239859311152277' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8715239859311152277'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8715239859311152277'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/09/doj-drops-bomb-in-googleaap-settlement.html' title='DOJ drops bomb in Google/AAP settlement'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1978269480525436891</id><published>2009-09-14T03:33:00.000-07:00</published><updated>2009-09-14T03:43:17.360-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><category scheme='http://www.blogger.com/atom/ns#' term='metadata'/><title type='text'>Google Books Metadata and Library Functions</title><content type='html'>In a recent post to the NGC4LIB list, we got a very welcome answer from Chip Nilges of OCLC about Google's use of WorldCat records:&lt;br /&gt;&lt;blockquote&gt;To answer Karen's most recent post, Google can use any WC metadata field.  And it's important to note as well that our agreement with Google is not exclusive.  We're happy to work with others in the same way.  The goal, as I said in my original post, is to support the efforts of our members to bring their collections online, make them discoverable, and drive traffic to library services.&lt;br /&gt;&lt;br /&gt;Regards,&lt;br /&gt;&lt;br /&gt;Chip&lt;/blockquote&gt;&lt;br /&gt;As recent postings about the metadata presented in the Google Book Search service have shown, there are some problems. Although Google claims to have taken the metadata from its library partners, we can compare a record in GBS with the record for the same item in the library partner's database and see &lt;a href="https://listserv.nd.edu/cgi-bin/wa?A2=ind0909&amp;amp;L=NGC4LIB&amp;amp;T=0&amp;amp;F=&amp;amp;S=&amp;amp;P=7018"&gt;how very different they are&lt;/a&gt;. It is clear that Google has not retained all of the fields that libraries have provided, and has made some very odd choices about what to keep.
Perhaps what we need to do, to help Google improve the metadata, is to make clear what data elements we anticipate we will need in order to integrate the Google product with library services.&lt;br /&gt;&lt;br /&gt;When you ask people what metadata is needed for a service, they will often reply something like "everything" or "more is better." I'm going to take a different approach here, because I think it is a good idea to connect metadata needs with actual functionality. This not only justifies the metadata; the functionality also helps explain the nature of the metadata that is required. For example, if we say that we want "date of publication" in our metadata, it may seem that we could use the date from the publication statement, which can have dates like "c1956" or "[1924]." If, instead, we indicate that we want to use dates in computational research, then it is clear (hopefully) that we need the fixed field date (from the 008 field in the MARC record).&lt;br /&gt;&lt;br /&gt;So here are the functions that come to my mind, and I welcome additions. (Do remember that at this point we are only talking about books, so many fields relating to other formats will not be included.) I'll add the related MARC fields as I get a chance.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Function: Scholarship&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Need&lt;/span&gt;: A thorough description of the edition in question. This will include authors, titles, physical description, and series information.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Function: Metasearch&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Need&lt;/span&gt;: To be able to combine searches of GBS and library catalogs using the same data elements.
Generally this means the "headings" from the bibliographic record (authors, titles, subject headings).&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Function: Collection development&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Need&lt;/span&gt;: To use GBS to fill in gaps (or make comparisons) in a library's holdings, usually using classification numbers.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Function: Linking to other bibliographic collections or databases&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Need&lt;/span&gt;: Identifiers and headings found in other collections that would allow linking.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Function: Computation&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Need&lt;/span&gt;: Data elements that can mark a text in time and space (date and place of publication), as well as those that can help segment the file, such as language. This function may also need to rely on grouping editions into Works, since this research may need to distinguish Works from Manifestations.
Computation will most likely use metadata as a controlled vocabulary, and the full text of the work as the "meat" of the research.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-1978269480525436891?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/1978269480525436891/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=1978269480525436891' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1978269480525436891'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/1978269480525436891'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/09/google-books-metadata-and-library.html' title='Google Books Metadata and Library Functions'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8247396957506406883</id><published>2009-09-08T11:59:00.000-07:00</published><updated>2009-09-12T09:57:14.926-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>GBS, according to Amazon</title><content type='html'>When I first read the settlement agreement between Google, the AAP, and the Authors Guild, I immediately thought: "Wow. Jeff Bezos must be freaking out!"
It was obvious to me that the settlement, as written, sets up a bookselling operation of unprecedented proportions. It also does so in a way that makes it hard, if not impossible, for any other company to compete in certain areas, particularly in relation to works that are out of print but not out of copyright.&lt;br /&gt;&lt;br /&gt;Amazon has responded to the proposed settlement with a &lt;a href="http://i.i.com.com/cnwk.1d/i/ne/pdfs/amazon_google_books_v2.pdf?tag=mncol;txt"&gt;document for the court&lt;/a&gt;. (The document was written for Amazon by David Nimmer, known for "&lt;a href="http://www.lexisnexis.com/store/catalog/productdetail.jsp?prodId=10441"&gt;Nimmer on Copyright&lt;/a&gt;", the primary treatise on US copyright law -- which sells for over $2,000. When it comes to "big guns" it's hard to get any bigger.) The document makes four major points relating to the settlement. I will paraphrase them here, but if you have an interest in what Amazon has to say, you must read the document yourself, because my summary undoubtedly reflects my non-expert reading of it.&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;The settlement should be rejected because it makes changes to copyright law that should be decided by Congress, not by a lawsuit.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The settlement should be rejected because the Book Rights Registry that it creates is a cartel of rights holders, and violates antitrust law.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The settlement must be rejected because its expropriation of orphan works violates the Copyright Act.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The settlement must be rejected because it would release Google from liability for future actions.&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;All of these seem like good arguments to me, but I am especially taken by the fourth one.
The Amazon document explains in some detail that the class action here is being used to cover future actions that are not part of the complaint.&lt;br /&gt;&lt;blockquote&gt;"A class action settlement can only extinguish claims that arise from the same factual predicate as the class claims.... Future claims for future conduct cannot be released by a settlement agreement because they are not part of the same factual predicate as the purported claims." p. 35&lt;/blockquote&gt;What this says, in my interpretation, is that Google is being taken to court by the AAP and AG because it has, in the past, scanned and OCR'd in-copyright books without asking the permission of the rights holders. Yet the settlement addresses actions that Google has not yet taken, such as the sale of institutional subscriptions, consumer sales of access to books, and a variety of possible revenue models such as print on demand. This is not redress for violation of rights but a kind of blanket agreement that gives Google rights over the materials for future developments.&lt;br /&gt;&lt;blockquote&gt;"The sale of books or subscriptions to a database of scanned works is conduct in which Google has not yet engaged and, because of criminal sanctions, likely would never engage without a clear license to do so." p. 39&lt;/blockquote&gt;Nimmer's analysis seems to be that this is not appropriate in a lawsuit, especially one in which members of the class are giving up future rights that cannot even be enumerated. His hypothetical example reads:&lt;br /&gt;&lt;blockquote&gt;"...
let us imagine that Google has already scanned &lt;i&gt;Lonesome Dove&lt;/i&gt; and included it in the Google Books Program, that Technology X is invented in 2016, and that Google decides in 2020 to inaugurate widescale exploitation of books via that new technology including &lt;i&gt;Lonesome Dove.&lt;/i&gt; To the extent that author Larry McMurtry objects to that exploitation in 2021 (in the same way that previous litigation contested the scope of his grant of books rights to his publisher in &lt;i&gt;Lonesome Dove&lt;/i&gt; at the dawn of the age of audio books), a dispute may develop between author and publisher. The Settlement Agreement goes out of its way to immunize Google from any liability for copyright infringement under those circumstances." p. 39 footnote 29&lt;/blockquote&gt;I can neither confirm nor dispute this analysis, but there is something very frightening about giving up (or assigning, depending on how you see it) rights for an indefinite future when we have no idea what that future will bring.
The Amazon comments interpret the settlement as granting Google overly expansive concessions that could have unintended consequences in the future.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8247396957506406883?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8247396957506406883/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8247396957506406883' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8247396957506406883'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8247396957506406883'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/09/gbs-according-to-amazon.html' title='GBS, according to Amazon'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-5098461567214788711</id><published>2009-09-07T12:52:00.000-07:00</published><updated>2009-09-07T14:24:10.931-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>GBS and Bad Metadata</title><content type='html'>Ever since Geoffrey Nunberg got up at the Google Books Colloquium at Berkeley on August 28, 2009, and showed the audience how bad the Google Books metadata is (&lt;a href="http://chronicle.com/article/Googles-Book-Search-A/48245/"&gt;Google's Book Search: A Disaster for
Scholars&lt;/a&gt;, an article in The Chronicle of Higher Education, and &lt;a href="http://languagelog.ldc.upenn.edu/myl/GoogBookSM.pdf"&gt;Google Books: The Metadata Mess&lt;/a&gt;, the slide presentation from the conference at UC Berkeley), some parts of the academic world have been buzzing about the topic.&lt;br /&gt;&lt;br /&gt;Google representatives claim that their data comes from libraries and from other sources, but it is easy to show that Google is not including the full library bibliographic record in GBS. This could be seen simply as a short-sighted decision on their part not to keep all of the data from the MARC records supplied by the libraries. After all, which of these do you think makes more sense to the casual reader:&lt;br /&gt;&lt;blockquote&gt;12 pages&lt;br /&gt;12 p. 27 cm.&lt;br /&gt;&lt;/blockquote&gt;However, there is some evidence that Google is missing parts of the library bibliographic record. Here are some examples of subject headings from GBS and from the records of the very libraries that supplied the works:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;GBS:&lt;/span&gt;&lt;br /&gt; Indians of North America&lt;br /&gt; Indian baskets&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Library:&lt;/span&gt;&lt;br /&gt; Indians of North America -- Languages.&lt;br /&gt; Indians of North America -- California&lt;br /&gt; Indian baskets -- North America&lt;br /&gt;&lt;br /&gt;This is the same pattern that appeared in the records released by the University of Michigan for their public domain scanned books -- only subfield $a of the 6XX fields was included. (I wrote about this: &lt;a class="moz-txt-link-freetext" href="http://kcoyle.blogspot.com/2008/05/amputation.html"&gt;http://kcoyle.blogspot.com/2008/05/amputation.html&lt;/a&gt;).
Many other fields are also excluded from those Michigan records, and one has to wonder if the same was true of the records received/used by Google.&lt;br /&gt;&lt;br /&gt;I know that it is possible to retrieve the full library records for the books, because the Open Library is doing exactly that to obtain bibliographic data for the public domain books scanned by Google. Google is obviously capable of doing this, yet chooses not to.&lt;br /&gt;&lt;br /&gt;This leaves us with a bit of a mystery, although I think I know the answer. The mystery is: why would Google use only limited metadata from the participating libraries? And why won't they answer the question that I asked at the conference: "Do you have a contract with OCLC? And does it restrict what data you can use?" If the answer is "yes and yes," then we have only ourselves (as in "libraries") to blame. And Nunberg and his colleagues should be furious at us.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-5098461567214788711?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/5098461567214788711/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=5098461567214788711' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5098461567214788711'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/5098461567214788711'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/09/gbs-and-bad-metadata.html' title='GBS and Bad Metadata'/><author><name>Karen
Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-8276718883147987659</id><published>2009-08-23T13:55:00.000-07:00</published><updated>2009-08-25T07:54:41.471-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Googlebooks: Innovation and the Future of the Book</title><content type='html'>There's a standard joke about a restaurant-goer who complains afterward: "The food was terrible... and there was so little of it!"&lt;br /&gt;&lt;br /&gt;I was reminded of this while reading the &lt;a href="http://graphics8.nytimes.com/packages/pdf/business/googlebooksearchsettlement.pdf"&gt;letter&lt;/a&gt; from University of California faculty to the judge in the Google/AAP settlement case. First they argue that the class represented by the Authors Guild does not include academics, who are major, if not &lt;a href="http://kcoyle.blogspot.com/2009/08/academic-publishing-as-percentage-of.html"&gt;&lt;span style="font-style: italic;"&gt;the &lt;/span&gt;major&lt;/a&gt;, producers of authored texts. Then they state their three primary concerns:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;to maximize access, prices should be reasonable&lt;/li&gt;&lt;li&gt;there must be provision for open access choices for authors who want to maximize access to their works&lt;br /&gt;&lt;/li&gt;&lt;li&gt;user privacy must be guaranteed&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;I find it unfortunate that the faculty chose to lead with the question of price... it sounds a bit like "This settlement is seriously flawed... and we might not be able to afford it!"
The other two concerns suggest important modifications to the settlement as currently written.&lt;br /&gt;&lt;br /&gt;All three of these concerns are premised on the acceptance of the settlement. There is another, perhaps more serious, concern that isn't raised here, although raising it may not have been possible for this group because it could be incompatible with the basic premises of the settlement. That concern is the question of INNOVATION.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;Innovation&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We may be at a crossroads in the evolution of the book, one that could change forever how one goes about the acts of scholarship and knowledge creation. What happens when some portion of our previously analog texts is available digitally? What changes will take place in the nature of research? What are the potential &lt;a href="http://kcoyle.blogspot.com/2007/03/unintended-consequences.html"&gt;unintended consequences&lt;/a&gt;?&lt;br /&gt;&lt;br /&gt;We don't know the answers to these questions, in part because they are about the future. It is probably safe to assume, however, that the future of the book is not a linear progression from where we are today, but that it could go in a number of different directions, ending up somewhere that we can't even imagine at this moment in time. To get there, we must experiment, we must innovate. There will be trial and error. There may not be a single "killer app." Above all, the change will make use of technology, but it will essentially be a cultural change. Perhaps a massive cultural change.&lt;br /&gt;&lt;br /&gt;Some commentators have said that the production of texts digitally makes books irrelevant, that the stable, book-length text will cease to exist as we know it today. Instead, we may be returning to our medieval roots, before texts were fixed by the repetitive nature of the printing process.
[1] Others see digitization of previously analog books as a way to reassert the impact of the last 500 years of thought on scholarship. It is easy to imagine the discovery of long-hidden gems from the stacks of university libraries, just as it is easy to imagine being overwhelmed by marginal and irrelevant retrievals. The main thing is that we won't know until it happens.&lt;br /&gt;&lt;br /&gt;We can have some fun speculating on the types of things that we may be able to do with these previously analog texts in the digital future: integrating these texts with those that were "born digital"; creating hyperlinks between them (one's own personal Memex); recombining texts into new texts; annotating and commenting in a public or semi-public sphere; mashing up text with sound and video and data. On a larger scale, we face the possibility of global topic maps that will show us previously undiscovered connections.&lt;br /&gt;&lt;br /&gt;Which of these possibilities will be available to us, however, is up to Google, because under this settlement only Google has the right to innovate across the body of digitized texts. The rest of us, including the faculty of the universities whose libraries are providing the books, are merely consumers. We can buy the product, or not buy the product, but the raw materials, the digital copies of the library texts, belong to Google. Monetizing the texts will be the job of the Book Rights Registry (BRR) to be formed under the settlement, which will represent the rights holders. Google's job is to provide the product that will make monetization possible. Both of these entities, Google and the BRR, are focused on price as their main concern. In that, they have much in common with the University of California faculty.&lt;br /&gt;&lt;br /&gt;This is a perfect example of how the way a question is asked shapes reality.
The question surrounding the settlement is: &lt;span style="font-style: italic;"&gt;are authors (as defined by the Authors Guild) served by the Google/AAP settlement -- yes or no?&lt;/span&gt; The bigger question, &lt;span style="font-style: italic;"&gt;What is the future of the book in our civilization?&lt;/span&gt;, is not on the table. Yet, in the end, that may be the question that is answered by this settlement, whether that outcome serves authors or not.&lt;br /&gt;&lt;br /&gt;[1] Recommended reading: Geoffrey Nunberg, ed., &lt;span style="font-style: italic;"&gt;The Future of the Book&lt;/span&gt; (Berkeley: University of California Press, 1996).&lt;span class="Z3988" title="url_ver=Z39.88-2004&amp;amp;ctx_ver=Z39.88-2004&amp;amp;rft_id=urn%3Aisbn%3A0520204506&amp;amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;amp;rft.genre=book&amp;amp;rft.btitle=The%20Future%20of%20the%20Book&amp;amp;rft.place=Berkeley&amp;amp;rft.publisher=University%20of%20California%20Press&amp;amp;rft.aufirst=Geoffrey&amp;amp;rft.aulast=Nunberg&amp;amp;rft.au=Geoffrey%20Nunberg&amp;amp;rft.date=1996&amp;amp;rft.isbn=0520204506"&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;[Note: I am aware that there are serious issues of copyright law in all of this, and that it's not just a question of technology. Whether this settlement helps or hinders the evolution of copyright law is a discussion better left to those with a legal background.
But it is an active part of the discussion around the settlement.]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-8276718883147987659?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/8276718883147987659/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=8276718883147987659' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8276718883147987659'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/8276718883147987659'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/08/googlebooks-innovation-and-future-of.html' title='Googlebooks: Innovation and the Future of the Book'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-7484168866253940400</id><published>2009-08-23T10:20:00.000-07:00</published><updated>2009-08-24T07:05:09.492-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='googlebooks'/><title type='text'>Academic publishing as a percentage of Google Books</title><content type='html'>A group representing University of California faculty has expressed its concerns about the Google/AAP settlement in a &lt;a href="http://graphics8.nytimes.com/packages/pdf/business/googlebooksearchsettlement.pdf"&gt;letter&lt;/a&gt; to the presiding judge.
In that letter they state one of their concerns as:&lt;br /&gt;&lt;blockquote&gt;"Specifically, we are concerned that the Authors Guild negotiators likely prioritized maximizing profits over maximizing public access to knowledge, while academic authors would have reversed those priorities."&lt;/blockquote&gt;The next sentence says:&lt;br /&gt;&lt;blockquote&gt;"We note that the scholarly books written by academic authors constitute a much more substantial part of the Book Search corpus than the Authors Guild members’ books."&lt;/blockquote&gt;I was disappointed that they didn't include any data to support the "substantial part" statement, and I think their letter would have been stronger if they had. (I am presuming that they meant "substantial" in a quantitative way, rather than a qualitative one. The latter would be hard to support.)&lt;br /&gt;&lt;br /&gt;Edward Betts of the Open Library did an experiment in identifying publishers in the OL data. Because of the way publisher names are recorded in bibliographic data, he used ISBN publisher prefixes, where available, to bring together different forms of the same publisher's name. He posted his results on the &lt;a href="http://blog.openlibrary.org/2009/07/20/isbn-publisher-codes"&gt;OL blog&lt;/a&gt;. The post links to his files. His data shows counts for each (presumed) individual publisher.&lt;br /&gt;&lt;br /&gt;I mentioned in a comment on Edward's blog post that I found it interesting that a university press (Oxford UP) turned up in the #1 spot as the publisher with the greatest number of books in the OL. As a matter of fact, out of the top 20 publishers, five are university presses (UPs), and they account for over a quarter (nearly 30%) of the books in that group.
(&lt;a href="http://kcoyle.net/temp/isbns.txt"&gt;Download&lt;/a&gt; a tab-delimited, ranked version of the data, but be sure to look at Edward's detailed data to understand what makes up each publisher entry.)&lt;br /&gt;&lt;blockquote&gt;# of books records published by the top 20 publishers: &lt;span style="font-weight: bold;"&gt;1,935,327&lt;/span&gt;&lt;br /&gt;# of books in the top 20 by University Presses: &lt;span style="font-weight: bold;"&gt;577,323&lt;/span&gt;&lt;/blockquote&gt;Out of the top 100, the UPs make up a little less than 25% of the file. I'm only including those presses with "University" in their names, meaning that the figure doesn't include Academic Press, Elsevier, Scholastic, etc., which primarily publish the output of academic writers.&lt;br /&gt;&lt;br /&gt;This study of OL publisher data was just experimental, so these figures should be taken with a grain of salt. However, this shows that there is an interesting study to be done, if it can be done, quantifying the relative roles of academic and commercial publishing. Given that Google is digitizing books in university libraries, the tendency toward academic publications should be quite strong. (Note that OL has taken its&lt;a href="http://openlibrary.org/about/help"&gt; records&lt;/a&gt; from the Library of Congress, online book sales, and some libraries, and probably is less heavy on academic presses than Google Books will be.)&lt;br /&gt;&lt;br /&gt;The UC faculty's concern that the interests of academic writers are not well-served by the Author's Guild is compelling to me. 
I hope the judge takes it seriously.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3338174527262061848-7484168866253940400?l=kcoyle.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://kcoyle.blogspot.com/feeds/7484168866253940400/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3338174527262061848&amp;postID=7484168866253940400' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7484168866253940400'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3338174527262061848/posts/default/7484168866253940400'/><link rel='alternate' type='text/html' href='http://kcoyle.blogspot.com/2009/08/academic-publishing-as-percentage-of.html' title='Academic publishing as a percentage of Google Books'/><author><name>Karen Coyle</name><uri>http://www.blogger.com/profile/02519757456533839003</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3338174527262061848.post-1139818556883315524</id><published>2009-08-20T16:01:00.000-07:00</published><up
