This is the fourth of my Billion Triple Challenge data-set statistics posts. If you only just got here, catch up on part I, II or III.
I had these numbers ready for a long time, but never found the time to type them up, as they are not so exciting. However, CaptSolo asked for them now to put in his very-soon-to-be-finished thesis, so I’ll hurry up. This is all about the classes used in the BTC data, i.e. the rdf:type triples.
Overall the data contains 143,293,758 type triples, assigning 283,815 different types to 104,562,695 different things. For the types themselves:
- 213,281 types are used more than once
- 94,455 used more than 10
- 14,862 more than 100
- 1,730 more than 1,000
- 288 more than 10,000
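Thresholds like the ones above boil down to counting how often each type appears as the object of an rdf:type triple. Here is a minimal sketch in Python, assuming the triples are already parsed into (subject, predicate, object) tuples. The function names and the toy data are made up for illustration and are not the scripts used for the numbers in this post:

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_usage(triples):
    """Count how many rdf:type triples use each type (the triple's object)."""
    return Counter(o for _, p, o in triples if p == RDF_TYPE)


def types_used_more_than(usage, threshold):
    """Number of distinct types occurring in more than `threshold` type triples."""
    return sum(1 for n in usage.values() if n > threshold)


# Toy data (hypothetical, not taken from the BTC set)
sample = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:alice", RDF_TYPE, "foaf:Agent"),
    ("ex:alice", "foaf:knows", "ex:bob"),  # not a type triple, ignored
]

usage = type_usage(sample)
for threshold in (0, 1):
    print(threshold, types_used_more_than(usage, threshold))
```

On the real data one would stream the triples rather than hold them in a list, but the counting logic stays the same.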
If we take only these 288 top ones, we cover 92% of all type triples. We can cover 90% of the typed things with only 105 types, and over 50% of the data with only foaf:Person, sioc:WikiArticle, rss:item and foaf:OnlineAccount. Out of all the “types” used, 12,319 were BNodes, which is odd but I guess possible, and 204 are literals, which is even odder. The top 10 types are:
#triples | type URI |
---|---|
1,859,499 | wordnet:Person |
2,309,652 | foaf:Document |
2,645,091 | akt:Article-Reference |
2,680,081 | owl:Class |
5,616,163 | akt:Person |
7,544,797 | geonames:Feature |
12,123,375 | foaf:OnlineAccount |
13,686,988 | rss:item |
14,172,851 | sioc:WikiArticle |
38,790,680 | foaf:Person |
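The “over 50%” figure can be checked against the table: the four types named earlier account for roughly 55% of the 143,293,758 type triples. Here is a small sketch of that arithmetic, using only counts quoted in this post (the helper name is just for illustration):

```python
TOTAL_TYPE_TRIPLES = 143_293_758  # total number of rdf:type triples quoted above

# Counts copied from the top-10 table
top_four = {
    "foaf:Person": 38_790_680,
    "sioc:WikiArticle": 14_172_851,
    "rss:item": 13_686_988,
    "foaf:OnlineAccount": 12_123_375,
}


def share_of_type_triples(counts, total):
    """Fraction of all type triples that use one of the given types."""
    return sum(counts.values()) / total


share = share_of_type_triples(top_four, TOTAL_TYPE_TRIPLES)
print(f"{share:.1%}")
```

Note that this measures coverage of type triples. Coverage of typed things (the 90% with 105 types) needs the per-thing data, since one thing can carry several types.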
Now for the things the types are assigned to: out of the 104,562,965 things with types, 52,865,376 are BNodes. If you pay attention you will now have realised that many things have more than one type assigned (143M type triples ⇒ 104M things). In fact:
- 7,026,972 things have more than one type triple.
- 612,467 have more than 10
- 35,201 more than 100
- 1,025 more than 1,000
- 40 more than 10,000
Note that I am talking here of type triples, i.e. the top 40 things may well have the same type assigned 10,000 times. The things having over 10,000 types assigned are a product of the partial inclusion of inferred triples in the data. For instance, for every context where RDFS inference has been applied, all properties will have rdf:type rdf:Property inferred. Looking at the number of unique types per thing shows that:
- 2,979,968 things have more than one type
- 78,208 have more than 10
- 4 more than 100
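The difference between the two counts, type triples per thing and unique types per thing, is just whether repeated (thing, type) pairs are collapsed. Here is a rough sketch, assuming the type triples are available as (subject, type) pairs. The names and sample data are hypothetical:

```python
from collections import Counter, defaultdict


def per_thing_counts(type_pairs):
    """Return (type triples per thing, unique types per thing)."""
    triples_per_thing = Counter()
    types_per_thing = defaultdict(set)
    for subject, rdf_type in type_pairs:
        triples_per_thing[subject] += 1          # every triple counts
        types_per_thing[subject].add(rdf_type)   # duplicates collapse in the set
    unique_per_thing = {s: len(ts) for s, ts in types_per_thing.items()}
    return triples_per_thing, unique_per_thing


# Toy data: the same inferred type repeated in three contexts,
# plus one thing with two genuinely different types.
pairs = [
    ("ex:someProperty", "rdf:Property"),
    ("ex:someProperty", "rdf:Property"),
    ("ex:someProperty", "rdf:Property"),
    ("ex:alice", "foaf:Person"),
    ("ex:alice", "foaf:Agent"),
]

triples_per_thing, unique_per_thing = per_thing_counts(pairs)
print(triples_per_thing["ex:someProperty"], unique_per_thing["ex:someProperty"])
print(triples_per_thing["ex:alice"], unique_per_thing["ex:alice"])
```

In the toy data the property has three type triples but only one unique type, which is the same effect as the inferred rdf:Property triples described above.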
The 10 things with most unique types are all pretty boring:
#types | URI |
---|---|
74 | http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000959f60 |
75 | http://dbpedia.org/resource/Arnold_Schwarzenegger |
88 | http://oiled.man.example.net/test#V822576 |
91 | http://oiled.man.example.net/test#V21027 |
91 | http://oiled.man.example.net/test#V21029 |
91 | http://oiled.man.example.net/test#V21030 |
105 | http://oiled.man.example.net/test#V16459 |
136 | http://www.w3.org/2002/03owlt/description-logic/consistent501#T |
136 | http://www.w3.org/2002/03owlt/description-logic/inconsistent502#T |
171 | http://oiled.man.example.net/test#V21026 |
Likewise, the 10 things with the most types assigned, all a product of materialised inferred triples:
#triples | URI |
---|---|
57,533 | http://sw.opencyc.org/2008/06/10/concept/ |
58,838 | http://semantic-mediawiki.org/swivt/1.0#creationDate |
58,838 | http://semantic-mediawiki.org/swivt/1.0#page |
58,838 | http://semantic-mediawiki.org/swivt/1.0#Subject |
89,521 | http://sw.opencyc.org/concept/Mx4rwLSVCpwpEbGdrcN5Y29ycA |
121,138 | http://en.wikipedia.org/ |
159,773 | http://sw.opencyc.org/concept/ |
232,505 | http://sw.cyc.com/CycAnnotations_v1#label |
361,113 | http://xmlns.com/foaf/0.1/holdsAccount |
465,010 | http://sw.cyc.com/CycAnnotations_v1#externalID |
That’s it — I hope it changed your life! :)
it did change my life.
Arnold Schwarzenegger has most types! This is amazing. Olaf and I just thought about it:
* he is world class body builder
* he is world class actor
* he is world class politician
* he made a lot of money with real estate
* he married into the Kennedy clan
* his life happened in Austria and in California, so he doubles as “austrianbodybuilder” and “californianbodybuilder”, which doubles many of his types.
As fellow Austrian, I can also connect to him, excellent.
so – conan rules the semantic web – not Chuck Norris. I am waiting for the day that Chuck Norris has more types than the semantic web.
Posted by Leo Sauermann on October 5th, 2009.