Again Dan Brickley is making me work :) This time I am looking at the “hidden” schema that is SKOS concepts (hidden because it is not really apparent when just looking at normal rdf:types). Dan suggested looking at topics used with FOAF, i.e. objects of foaf:topic, foaf:primaryTopic and foaf:interest triples, and also things used with Dublin Core subject (I used both http://purl.org/dc/elements/1.1/subject and http://purl.org/dc/terms/subject).
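For anyone who wants to try something similar, here is a minimal sketch of this kind of counting, not the script actually used for the numbers in this post. It assumes the data is available as N-Triples or N-Quads lines, and the regular expressions are a deliberate simplification rather than a full parser. The function name `count_objects` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

FOAF = "http://xmlns.com/foaf/0.1/"
TOPIC_PREDICATES = {FOAF + "topic", FOAF + "primaryTopic", FOAF + "interest"}
SUBJECT_PREDICATES = {
    "http://purl.org/dc/elements/1.1/subject",
    "http://purl.org/dc/terms/subject",
}

# Subject, predicate, then everything up to the final dot
# (the object, plus an optional graph name in N-Quads).
LINE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.*?)\s*\.\s*$')
# The object term: an IRI, a bnode, or a literal with optional
# language tag or datatype. Simplified, not a complete grammar.
OBJECT = re.compile(
    r'^(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
)

def count_objects(lines, predicates):
    """Return (number of matching triples, Counter of object terms)."""
    triples = 0
    objects = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m or m.group(2) not in predicates:
            continue
        o = OBJECT.match(m.group(3))
        if not o:
            continue  # skip lines the simplified pattern cannot handle
        triples += 1
        objects[o.group(1)] += 1
    return triples, objects

# Made-up sample data, only to show the shape of the input.
sample = [
    '<http://ex.org/a> <http://xmlns.com/foaf/0.1/interest> '
    '<http://www.livejournal.com/interests.bml?int=rdf> .',
    '<http://ex.org/b> <http://xmlns.com/foaf/0.1/interest> '
    '<http://www.livejournal.com/interests.bml?int=rdf> <http://ex.org/g> .',
    '_:x <http://purl.org/dc/elements/1.1/subject> "business" .',
    '<http://ex.org/a> <http://xmlns.com/foaf/0.1/name> "A" .',
]

foaf_triples, foaf_topics = count_objects(sample, TOPIC_PREDICATES)
dc_triples, dc_subjects = count_objects(sample, SUBJECT_PREDICATES)
print(foaf_triples, len(foaf_topics), dc_triples, len(dc_subjects))
```

The number of unique topics is simply `len()` of the returned counter, and `most_common(10)` gives a top 10.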
I found 1,136,475 unique FOAF topics in 8,119,528 triples; only 4,470 are bnodes, and only 265 (! i.e. only about 0.02% of the unique topics) are literals. The top 10 topics are all of the form http://www.livejournal.com/interests.bml?int=??????, with a varying number of ?s; this is obviously what people entered into the interest field of livejournal. More interesting are perhaps the top hosts:
#triples | host |
---|---|
5,191,771 | www.livejournal.com |
1,819,836 | www.deadjournal.com |
771,439 | www.vox.com |
78,290 | klab.lv |
75,285 | lj.rossia.org |
70,380 | lod.geospecies.org |
18,398 | my.opera.com |
16,251 | dbpedia.org |
11,481 | www.wasab.dk |
9,815 | wiki.sembase.at |
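A per-host table like this can be produced by grouping the topic IRIs by their host name. The following is a rough sketch under the assumption that the per-topic triple counts are already in a dictionary keyed by IRI (without angle brackets). The name `host_counts` and the example counts are invented for illustration and are not the real data.

```python
from collections import Counter
from urllib.parse import urlsplit

def host_counts(topic_counts):
    """Sum per-topic triple counts by the host part of each topic IRI."""
    hosts = Counter()
    for iri, n in topic_counts.items():
        try:
            host = urlsplit(iri).hostname
        except ValueError:
            continue  # malformed IRI, e.g. a broken IPv6 literal
        if host:
            hosts[host] += n
    return hosts

# Invented counts, only to show the shape of the input.
example = {
    "http://www.livejournal.com/interests.bml?int=rdf": 3,
    "http://www.livejournal.com/interests.bml?int=foaf": 2,
    "http://dbpedia.org/resource/RDF": 1,
}

for host, n in host_counts(example).most_common(10):
    print(f"{n:,} | {host} |")
```

Bnodes and literals have no host, so they should be filtered out before calling this; IRIs without a host part are skipped.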
So a lot of these topics are from FOAF exports of livejournal and friends. What I did not do, at least not yet, was compare the list of FOAF topics with the things actually declared to be of type skos:Concept; that would be interesting.
Dublin Core looks quite different: it gives us 552,596 topics in 4,018,726 triples, but only 2,979 are resources, of which 921 are bnodes; the rest (i.e. 99.4%) are all literals.
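The split into IRIs, bnodes and literals is easy to compute once the unique object terms are collected. A small sketch, assuming the terms are kept in their N-Triples form (IRIs in angle brackets, bnodes starting with `_:`, literals in double quotes); the names `term_kind` and `kind_shares` and the sample terms are hypothetical.

```python
from collections import Counter

def term_kind(term):
    """Classify an N-Triples term by its first characters."""
    if term.startswith("<"):
        return "iri"
    if term.startswith("_:"):
        return "bnode"
    if term.startswith('"'):
        return "literal"
    return "unknown"

def kind_shares(unique_terms):
    """Map each kind to (count, percentage of all unique terms)."""
    kinds = Counter(term_kind(t) for t in unique_terms)
    total = sum(kinds.values())
    return {k: (n, 100.0 * n / total) for k, n in kinds.items()}

# Made-up terms, only to show the shape of the input.
terms = ['<http://ex.org/a>', '_:b1', '"business"', '"Congress"']
shares = kind_shares(terms)
for kind, (n, pct) in shares.items():
    print(f"{kind}: {n} ({pct:.1f}%)")
```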
The top 10 subjects according to DC are:
#triples | subject |
---|---|
91,534 | 日記 |
38,566 | 写真 |
35,514 | メル友募集 |
32,150 | NAPLES |
30,973 | business |
28,342 | 独り言 |
27,543 | SoE Report |
24,102 | Congress |
23,954 | 音楽 |
20,097 | 花 |
I do not even know what language most of these are (anyone?). Looking a bit further down the list, there are lots of subjects like government, education, crime, etc. Perhaps we can blame data.gov for this? I could have kept track of the named graphs these came from, but I didn’t. Maybe next time.
You can download the full raw counts for all subjects: FOAF topics (7.6 MB), FOAF hosts and DC topics (23 MB).