(I fully admit that this topic deserves a much more extensive treatment than I will give it here. My goal is to stimulate discussion that would lead to efforts to develop models of that catalog that support a better user experience.)
Recognizing that users need a way to make sense out of large result sets, some library catalogs have added features that attempt to provide context for the user. The main such effort that I am aware of is the presentation of facets derived from some data in the bibliographic records. Another model, although I haven't seen it integrated well into library catalogs, is data mining; doing an overall analysis combining different data in the records, and making this available for search. Lastly, we have the development of entities that are catalog elements in their own right; this generally means the treatment of authors, subjects, etc., as stand-alone topics to be retrieved and viewed, even apart from their role in the retrieval of a bibliographic item. Treating these as "first-class entities" is not the same as the heading layer over bibliographic records, but it may be exploitable to provide a kind of context for users.
Facets
Faceted classification was all the rage when I attended library school in the early 1970's, having been bolstered by the work of the UK-based Classification Research Group, although the prime mover of this type of classification was S R Ranganathan who thoroughly explicated the concept in the 1930s. Faceted classification was to 1970's knowledge organization what KWIC and KWOC were to text searching: facets potentially provided a way to create complex subject headings whose individual parts could be the subject of access on their own or in context.In library systems "faceting" has exploited information from the bibliographic record can be discretely applied to a retrieved set. Facets are all "accidents" of the existing data, as catalog record creation is not based on faceted cataloging.
In general, facets are fixed data elements, or whole or part heading strings. Authors are used as facets, generally showing the top-occurring author names with counts.
Authors as facets |
Date of publication is also a commonly used facet, not so much because it is inherently useful but mainly because "it exists."
Dates as facets |
Subject Facets
Faceting is, to a degree, already incorporated into our subject access systems. Library of Congress subject headings are faceted to some extent, with topic facets, geographic facets, and time facets. The Library of Congress Classification and the Dewey Decimal Classification make some use of facets where they allow entries in the classification to be extended by place, time, or other re-usable subdivisions.
Summon system use of facets |
The Open Library created subject facets from Library of Congress subject headings, and categorizes each by its facet "type":
Open Library subject facts |
Although these are laudable attempts to give the user a way to both understand and further refine the retrieved set, there are a number of problems with these implementations, not the least of which is that many of these are not actually facets in the knowledge organization sense of that term. Facets need to be conceptual divisions of the landscape that help a user understand that landscape.
Online sales sites use something that they call faceted classification, although it varies considerably from the concept of faceted classification that originated with S. R. Ranganathan in the 1930's. On a sales site, facets divide the site's products into categories so that users can add those categories to their searches. A search for shoes in general is less useful than a search for shoes done under the categories "men's", "women's" or "children's". In the online sales sense, a facet is a context for the keyword search. For all that the overall universe that these facets govern is much simpler than the entire knowledge universe that libraries must try to handle, at least the concept of context is employed to help the user.
Amazon's facets |
The last problem is really the key here, which is that while isolated bits of data like date or place may help narrow a large result set they do not provide the kind of overall context for searches that a truly faceted system might. However, providing such a view requires that the entries in the library catalog have been classified using a faceted classification system, and that is simply not the case.
Data Mining
I include this because I think it is interesting, although the only real instances of it that I am aware of come from OCLC, which is uniquely positioned to do the kind of "big data" work that this implies. The WorldCat Identities project shows the kind of data that one can extract from a large bibliographic database. Data mining applies best to the bibliographic universe as a whole, rather than individual catalogs, since those latter are by definition incomplete. It would, however, be interesting to see what uses could be made of mined data like WorldCat Identities, for example giving users of individual catalogs information about sources that the library does not hold. It is also a shame that WorldCat Identities appears to have been a one-off and is not being kept up to date.
Emily Dickinson at WorldCat Identities |
First Class Objects
A potential that linked data brings (but does not guarantee) is the development of some of the key bibliographic entities into "first class objects". By that I mean that some entities could be the focus of searches on their own, not just as index entries to bibliographic records. Having some entities be first class objects means that, for example, you can have a page for a person that is truly about the person, not just a heading with the personal name in it. This allows you to present the user with additional information, either similar to WorldCat Identities, if you have that information available to you, or taking text from sources like Wikipedia, like Open Library did:Open Library author page |
This was also the model used in the linked data database Freebase (which has now been killed by Google), and is not entirely unlike Google's use of Wikipedia (and other sources) to create its "knowledge graph."
Google Knowledge Graph |
The treatment of some things as first class objects is perhaps a step toward the catalog of headings, but the person as an object is not itself a replication of the heading system that is found in bibliographic records, which go beyond the person's name in their organizational function:
For subject headings, a key aspect of the knowledge map is the inclusion of relationships from broader and narrower terms and related terms. I will not pretend that the existing headings are perfect, as we know they are not, but it is hard to imagine a knowledge organization system that will not make use of these taxonomic concepts in one way or another.Dickens, Charles, 1812-1870--Adaptations. Dickens, Charles, 1812-1870--Adaptations--Comic books, strips, etc. Dickens, Charles, 1812-1870--Adaptations--Congresses. Dickens, Charles, 1812-1870--Aesthetics. Dickens, Charles, 1812-1870--Anecdotes. Dickens, Charles, 1812-1870--Anniversaries, etc. Dickens, Charles, 1812-1870--Appreciation. Dickens, Charles, 1812-1870--Appreciation--Croatia.
This information is now available through the Library of Congress linked data service, id.loc.gov and surely, with some effort, these aspects of the "first class entity" (person, place, topic, etc.) could be recovered and made available to the user. Unfortunately (how often have I said that in these posts?), the subject heading authorities were designed as a model for subject heading creation, not as a full list of all possible subject headings, and connecting the authority file, which contains the relationships between terms, mechanically to the headings in bibliographic records is not a snap. Again, what was modeled for the card catalog and worked well in that technology does not translate perfectly to the newer technologies.Lake Erie See: Erie, Lake Lake Erie, Battle of, 1813. BT:United States--History--War of 1812--Campaigns Lake Erie, Battle of, 1813--Bibliography. Lake Erie, Battle of, 1813--Commemoration. Lake Erie, Battle of, 1813--Fiction. Lake Erie, Battle of, 1813--Juvenile fiction. Lake Erie, Battle of, 1813--Juvenile literature. Lake Erie Transportation Company≈ See Also: Erie Railroad Company.
Note that the emphasis on bibliographic entities in FRBR, RDA and BIBFRAME could facilitate such a solution. All three encourage an entity view of data that has traditionally included in bibliographic records and that is not entirely opposed to the concept of the separation of bibliographic data and authorities. In addition, FRBR provides a basis for conceptualizing works and editions (FRBR's expression) as separate entities. These latter exist already in many forms in the "real world" as objects of critical thinking, description, and point of sale. The other emphasis in FRBR is on bibliographic relationships. This has helped us understand that relationships are important, although these bibliographic relationships are the tip of the iceberg if we look at user service as a whole.
No comments:
Post a Comment
Comments are moderated, so may not appear immediately, depending on how far away I am from email, time zones, etc.