Thanks to some projects that gather statistics on the growth of linked data, we can find out various interesting things about the vocabularies being used and the degree of linking between data sets from different communities. The data I report here comes from LODstats via the Linked Open Vocabularies (LOV) project.
The LOV project looks particularly at the interrelations between vocabularies. It can show, for instance, which vocabularies use terms from other vocabularies. This crossover of terms is one of the things that makes links between datasets possible. The LOV view of the geoSpecies vocabulary, for example, shows that it is not itself referenced by other vocabularies, but that it can link to other data through its use of vocabularies like FOAF and Dublin Core. You can watch the visualization grow here.
In contrast, this is what the Dublin Core terms look like at LOV:
The animated visualization is here.
Dublin Core does seem to have fulfilled its role as a core vocabulary that many different communities have found useful, at least in part. The set of terms often abbreviated as "dcterms" (or sometimes "dct"), whose namespace is http://purl.org/dc/terms/, has been used approximately 192 million times according to the LOD statistics. This counts only the usage in the 2289 linked data datasets covered by that project. The earlier set of Dublin Core terms, the original fifteen, whose namespace is http://purl.org/dc/elements/1.1/, has been used 24.2 million times. This gives us a total of about 216 million uses of Dublin Core in this particular count.
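To make the two-namespace distinction concrete, here is a minimal sketch of how term uses could be tallied per namespace. It is not how LODstats actually computes its figures; the function name and the handful of sample URIs are made up for illustration, and only the two namespace URIs come from the text above.

```python
from collections import Counter

# The two Dublin Core namespaces discussed above.
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "dc15": "http://purl.org/dc/elements/1.1/",
}

def count_by_namespace(term_uris, namespaces=NAMESPACES):
    """Count how many URIs fall under each namespace prefix.

    URIs that match none of the prefixes are ignored.
    """
    counts = Counter()
    for uri in term_uris:
        for label, prefix in namespaces.items():
            if uri.startswith(prefix):
                counts[label] += 1
                break
    return counts

# Hypothetical sample of term URIs as they might appear in a dataset.
sample_uris = [
    "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/subject",
    "http://purl.org/dc/elements/1.1/title",
    "http://xmlns.com/foaf/0.1/name",
    "http://purl.org/dc/terms/BibliographicResource",
]

sample_counts = count_by_namespace(sample_uris)
print(dict(sample_counts))
```

Note that "title" appears in both namespaces as a distinct term, which is why the counts for the two sets are reported separately before being added together.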
The interesting question, then, is which parts of DC are heavily used. I have a list of all terms in the http://purl.org/dc/ namespace, sorted from most to least used. The top fifteen terms are all from the "dcterms" namespace:
count term
24147876 subject
22575133 identifier
17120343 title
17065873 issued
14459601 publisher
11605978 language
9930733 medium
9795117 format
9792064 BibliographicResource
7700745 isPartOf
7371553 creator
7241777 contributor
6590791 description
6184994 type
5983236 extent
Of this list, five were not part of the original "Dublin Core 15" vocabulary: issued, medium, BibliographicResource, isPartOf, and extent. The terms of that original vocabulary cluster together beginning right after the last term in the above list. I believe this provides an interesting affirmation that the original fifteen terms were a fair definition of "core."
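For anyone who wants to check the list against the original element set, here is a small sketch that compares the top fifteen terms above with the local names of the fifteen original Dublin Core elements. The counts are copied from the table; the variable and function names are my own, chosen only for illustration.

```python
# Local names of the fifteen original Dublin Core elements.
DC15 = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Top fifteen "dcterms" terms and their counts, as listed above.
TOP_TERMS = """\
24147876 subject
22575133 identifier
17120343 title
17065873 issued
14459601 publisher
11605978 language
9930733 medium
9795117 format
9792064 BibliographicResource
7700745 isPartOf
7371553 creator
7241777 contributor
6590791 description
6184994 type
5983236 extent
"""

def parse_counts(text):
    """Turn 'count term' lines into a list of (term, count) pairs."""
    rows = []
    for line in text.splitlines():
        count, term = line.split()
        rows.append((term, int(count)))
    return rows

def split_by_origin(rows):
    """Separate terms whose names match the original fifteen from the rest."""
    original = [term for term, _ in rows if term in DC15]
    later = [term for term, _ in rows if term not in DC15]
    return original, later

rows = parse_counts(TOP_TERMS)
original, later = split_by_origin(rows)
print("Named like an original element:", ", ".join(original))
print("Added later:", ", ".join(later))
```

The comparison is by local name only, so it says nothing about whether a given dataset used the term from the older or the newer namespace.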
However, these terms in the "dcterms" namespace each got fewer than ten uses, and some got none at all:
accrualPeriodicity
Frequency
AgentClass
dateSubmitted
isRequiredBy
Jurisdiction
LicenseDocument
LinguisticSystem
MediaType
MediaTypeOrExtent
PeriodOfTime
PhysicalResource
RightsStatement
The last term, which got zero uses in the LOD calculations, is particularly interesting because the element "rights" in the original "DC 15" got 398,361 uses and is ranked 39th in the list of elements in the overall http://purl.org/dc namespace.
Next, I'll take a quick look at which datasets are contributing to the use of Dublin Core terms, and who is creating those datasets.