Typical Semantic Web Data

This is the fourth of my Billion Triple Challenge data-set statistics posts. If you only just got here, catch up on parts I, II or III.

I had these numbers ready for a long time, but never found the time to type them up, as they are not so exciting. However, CaptSolo asked for them now to put in his very-soon-to-be-finished thesis, so I’ll hurry up. This is all about the classes used in the BTC data, i.e. the rdf:type triples.
Overall the data contains 143,293,758 type triples, assigning 283,815 different types to 104,562,695 different things.  For the types themselves:

  • 213,281 types are used more than once
  • 94,455 used more than 10
  • 14,862 more than 100
  • 1,730 more than 1000
  • 288 more than 10000

If we take only these 288 top ones, we cover 92% of all type triples. We can cover 90% of the typed things with only 105 types, and over 50% of the data with only foaf:Person, sioc:WikiArticle, rss:item and foaf:OnlineAccount. Out of all the “types” used, 12,319 are BNodes, which is odd, but I guess possible, and 204 are literals, which is even odder. The top 10 types are:

#triples type URI
1,859,499 wordnet:Person
2,309,652 foaf:Document
2,645,091 akt:Article-Reference
2,680,081 owl:Class
5,616,163 akt:Person
7,544,797 geonames:Feature
12,123,375 foaf:OnlineAccount
13,686,988 rss:item
14,172,851 sioc:WikiArticle
38,790,680 foaf:Person
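
As a rough sanity check on the "over 50%" figure, the four largest counts in the table can be summed and divided by the total number of type triples. This is only a sketch: it assumes that "the data" there means the type triples, and the variable and function names are made up for illustration.

```python
# Counts copied from the table above; the total is the number of
# type triples quoted at the start of the post.
TOTAL_TYPE_TRIPLES = 143_293_758

top_four = {
    "foaf:Person": 38_790_680,
    "sioc:WikiArticle": 14_172_851,
    "rss:item": 13_686_988,
    "foaf:OnlineAccount": 12_123_375,
}


def coverage(counts, total):
    """Fraction of `total` accounted for by the given per-type counts."""
    return sum(counts.values()) / total


share = coverage(top_four, TOTAL_TYPE_TRIPLES)
print(f"top four types cover {share:.1%} of the type triples")
```

Under that assumption the four types come out at roughly 55% of the type triples, which agrees with the claim.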

Now for the things the types are assigned to: out of the 104,562,965 things with types, 52,865,376 are BNodes. If you pay attention, you will now have realised that many things have more than one type assigned (143M type triples ⇒ 104M things). In fact:

  • 7,026,972 things have more than one type triple.
  • 612,467 have more than 10
  • 35,201 more than 100
  • 1,025 more than 1,000
  • 40 more than 10,000

Note that I am talking here of type triples, i.e. the top 40 things may well have the same type assigned 10,000 times. The things having over 10,000 types assigned are a product of the partial inclusion of inferred triples in the data. For instance, for every context where RDFS inference has been applied, all properties will have rdf:type rdf:Property inferred. Looking at the number of unique types per thing shows that:

  • 2,979,968 things have more than one type
  • 78,208 have more than 10
  • 4 more than 100
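
The difference between the two sets of numbers (type triples per thing versus unique types per thing) can be sketched as follows. This is a toy illustration, not the script used for the statistics: it assumes the data arrives as (subject, predicate, object, context) tuples and that the same type triple repeated in several contexts is counted once per context. The `ex:` and `ctx:` identifiers and the function names are invented for the example.

```python
from collections import Counter, defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = RDF + "type"


def type_stats(quads):
    """Return (type triples per thing, unique types per thing)."""
    triples_per_thing = Counter()
    types_per_thing = defaultdict(set)
    for subject, predicate, obj, _context in quads:
        if predicate == RDF_TYPE:
            triples_per_thing[subject] += 1    # every occurrence counts
            types_per_thing[subject].add(obj)  # duplicates collapse
    unique_per_thing = {s: len(ts) for s, ts in types_per_thing.items()}
    return triples_per_thing, unique_per_thing


def more_than(counts, threshold):
    """Number of things whose count is strictly above the threshold."""
    return sum(1 for n in counts.values() if n > threshold)


# Hypothetical data: a property typed as rdf:Property in three contexts
# (as materialised RDFS inference would do), and one thing with two types.
toy_quads = [
    (FOAF + "holdsAccount", RDF_TYPE, RDF + "Property", "ctx:a"),
    (FOAF + "holdsAccount", RDF_TYPE, RDF + "Property", "ctx:b"),
    (FOAF + "holdsAccount", RDF_TYPE, RDF + "Property", "ctx:c"),
    ("ex:alice", RDF_TYPE, FOAF + "Person", "ctx:a"),
    ("ex:alice", RDF_TYPE, FOAF + "Agent", "ctx:a"),
    ("ex:alice", FOAF + "name", "Alice", "ctx:a"),
]

triples_per_thing, unique_per_thing = type_stats(toy_quads)
print(more_than(triples_per_thing, 1))  # things with more than one type triple
print(more_than(unique_per_thing, 1))   # things with more than one unique type
```

In the toy data, foaf:holdsAccount has three type triples but only one unique type, which is the same effect that pushes properties to the top of the "most types assigned" list while leaving them out of the "most unique types" list.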

The 10 things with most unique types are all pretty boring:

#types URI
74 http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000959f60
75 http://dbpedia.org/resource/Arnold_Schwarzenegger
88 http://oiled.man.example.net/test#V822576
91 http://oiled.man.example.net/test#V21027
91 http://oiled.man.example.net/test#V21029
91 http://oiled.man.example.net/test#V21030
105 http://oiled.man.example.net/test#V16459
136 http://www.w3.org/2002/03owlt/description-logic/consistent501#T
136 http://www.w3.org/2002/03owlt/description-logic/inconsistent502#T
171 http://oiled.man.example.net/test#V21026

Likewise, here are the 10 things with the most types assigned, all a product of materialised inferred triples:

#triples URI
57,533 http://sw.opencyc.org/2008/06/10/concept/
58,838 http://semantic-mediawiki.org/swivt/1.0#creationDate
58,838 http://semantic-mediawiki.org/swivt/1.0#page
58,838 http://semantic-mediawiki.org/swivt/1.0#Subject
89,521 http://sw.opencyc.org/concept/Mx4rwLSVCpwpEbGdrcN5Y29ycA
121,138 http://en.wikipedia.org/
159,773 http://sw.opencyc.org/concept/
232,505 http://sw.cyc.com/CycAnnotations_v1#label
361,113 http://xmlns.com/foaf/0.1/holdsAccount
465,010 http://sw.cyc.com/CycAnnotations_v1#externalID

That’s it — I hope it changed your life! :)

One comment.

  1. it did change my life.

    Arnold Schwarzenegger has most types! This is amazing. Olaf and I just thought about it:
    * he is world class body builder
    * he is world class actor
    * he is world class politician
    * he made a lot of money with real estate
    * he married into the kennedy clan
    * his life happened in austria and in california, so he doubles as “austrianbodybuilder” and “californianbodybuilder”, which doubles many of his types.

    As fellow Austrian, I can also connect to him, excellent.

    so – conan rules the semantic web – not Chuck Norris. I am waiting for the day that Chuck Norris has more types than the semantic web.
