(still) nothing clever

Schema usage in the BTC2010 data

A little while back I spent about one CPU week computing which hosts use which namespaces in the BTC2010 data, i.e. I computed a matrix with hosts as rows, schemas as columns, and each cell holding the number of triples that host published using that namespace. My plan was to use this to create a co-occurrence matrix for schemas, and then use that to compute similarities for hierarchical clustering. And I did. And it was not very amazing. As with Ed Summers’ neat LOD graph, I wanted to use Protovis to make it pretty. Then, after making several versions, each uglier than the last, I realised that just looking at the clustering tree as a JavaScript data structure was just as useful, and I gave up on the whole clustering thing.
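A minimal sketch of the matrix and the co-occurrence step, not the code that was actually run. It assumes the input has already been reduced to one (host, namespace) pair per triple, and it counts co-occurrence as the number of hosts using both namespaces, which is only one of several reasonable definitions. The function names and the toy data are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def usage_matrix(records):
    """Build a sparse host x namespace matrix of triple counts.

    records: iterable of (host, namespace) pairs, one per triple (assumed input format).
    """
    matrix = defaultdict(Counter)
    for host, namespace in records:
        matrix[host][namespace] += 1
    return matrix


def cooccurrence(matrix):
    """For each pair of namespaces, count the hosts that use both (assumed definition)."""
    pairs = Counter()
    for namespaces in matrix.values():
        # Sort so that each pair always appears in the same order.
        for a, b in combinations(sorted(namespaces), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy data, not taken from BTC2010.
records = [
    ("example.org", "foaf"),
    ("example.org", "foaf"),
    ("example.org", "dc"),
    ("example.net", "foaf"),
    ("example.net", "dc"),
    ("example.com", "sioc"),
    ("example.com", "foaf"),
]

matrix = usage_matrix(records)
pairs = cooccurrence(matrix)
```

The pair counts could then be turned into a similarity or distance measure and handed to any hierarchical clustering routine.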

Not wanting the spent CPU hours to go to waste, I instead coded up a direct view of the original matrix. Getting a bit carried away, I made a crappy non-animated, non-smooth version of Moritz Stefaner’s elastic lists using jquery-ui’s tablesorter plugin.

At http://gromgull.net/2010/10/btc/explore.html you can see the result. Clicking on a namespace will show only hosts publishing triples using this schema, and only schemas that co-occur with the one you picked. Conversely, clicking on a host will show the namespaces published by that host, and only hosts that use the same schemas (this makes less intuitive sense for hosts than for namespaces). You even get a little Protovis histogram of the distribution of hosts/namespaces!
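The filtering behind the two kinds of click can be sketched roughly as below. This is a reading of the description above, not the page’s actual JavaScript: it assumes the matrix is a dict of dicts of triple counts, and it interprets “hosts that use the same schemas” as hosts sharing at least one namespace with the selected host. The function names and the toy data are hypothetical.

```python
def select_namespace(matrix, namespace):
    """Hosts using the namespace, and the namespaces that co-occur with it on those hosts."""
    hosts = {h for h, row in matrix.items() if row.get(namespace, 0) > 0}
    namespaces = {ns for h in hosts for ns, n in matrix[h].items() if n > 0}
    return hosts, namespaces


def select_host(matrix, host):
    """Namespaces used by the host, and hosts sharing at least one of them (assumed reading)."""
    namespaces = {ns for ns, n in matrix.get(host, {}).items() if n > 0}
    hosts = {
        h for h, row in matrix.items()
        if any(row.get(ns, 0) > 0 for ns in namespaces)
    }
    return namespaces, hosts


# Hypothetical toy matrix: host -> namespace -> number of triples.
matrix = {
    "example.org": {"foaf": 2, "dc": 1},
    "example.net": {"foaf": 1, "dc": 1},
    "example.com": {"sioc": 1},
}

hosts_for_dc, namespaces_with_dc = select_namespace(matrix, "dc")
namespaces_of_com, hosts_like_com = select_host(matrix, "example.com")
```

Note that the selected namespace or host is always part of its own result set, so the lists never become empty for a valid selection.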

The usual caveats for the BTC data apply, i.e. this is a random sampling of parts of the semantic web, so it doesn’t really mean anything :)

Posted by gromgull at 1:39 pm on October 12th, 2010.
Categories: Billion Triple Challenge, Statistics, Visualisation.

3 comments.

[…] This post was mentioned on Twitter by stefano bertolo, Gunnar Grimnes. Gunnar Grimnes said: What hosts use your schema in the Bilion Triple data? My last effort this year, text: http://goo.gl/IZHM and result: http://goo.gl/9azY […]

Posted by Tweets that mention (still) nothing clever — Schema usage in the BTC2010 data -- Topsy.com on October 13th, 2010.

Nice visualizations, how did you come across Moritz Stefaner?

Posted by Jörn on October 14th, 2010.

Hmm – either through http://infosthetics.com/, or through the Mace project and Martin Memmel, or perhaps through Stefano Bertolo on twitter.

You know him?

Posted by gromgull on October 17th, 2010.
