(still) nothing clever

new site, new blog

I have been using http://gromgull.net as a hub for linking to my many online identities and also as an OpenID delegate URL, but it was never a proper host; it just redirected to http://semikolon.co.uk/gromgull. This redirection was getting annoying, because different OpenID clients would save the OpenID URL differently: some would use what I typed, i.e. http://gromgull.net, some would use what this redirected to, and some would use the myopenid ID I actually delegated to. When the RSS feed of my (now old) phpsimpleblog also broke, it was time to upgrade properly. So here we go with a new domain and a WordPress installation. Well worth it for the two times a year I blog. (All the old posts have been moved here, with lots of hassle, but old comments have not.)

In other news, I was at ESWC09, organising SFSW09 as usual. All great fun. The Scripting Challenge at SFSW had especially high-quality entries this year; the winners are listed at the challenge page. I would also recommend watching the screencast for Anca Luca’s Practical Semantic Works – a Bridge from the Users’ Web to the Semantic Web. Although she did not win, she shows some amazing presentation skills! The whole experience is also documented on flickr.

Finally, ESWC brought the Billion Triple Challenge to my attention, and I wondered if I could possibly do some data-mining of some sort on this data. After downloading it I quickly realised that it would not fit into any RDF database that I keep lying around, but since the data is in a nice one-triple-per-line N-Quads format, I can process it with command-line tools like awk, sed, sort and friends. I promptly set to work, writing scripts for extracting literals (since they make the command-line processing trickier) and sorting and counting like mad. A week of CPU time later I realised that something was amiss: I had predicates that were simply “and”, and subject URIs that were <file://Documents … bugger. As it turns out, the data-set had some bugs, sorry, features, like URLs with spaces in them, and I’ve had to rewrite my script. Once it works the details will appear here. (Andreas Harth agrees that this is a feature, by the way, and a new version of the BTC dataset will appear later.)
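To illustrate the kind of failure described above, here is a minimal sketch in Python (not the awk/sed scripts actually used, which are not shown here). It assumes each line holds three or four terms followed by a dot, and it tolerates spaces inside angle brackets instead of splitting on whitespace. The names `parse_quad` and `count_predicates` and the example.org URIs are made up for the illustration.

```python
import re
from collections import Counter

# One term: a URI in angle brackets (spaces tolerated, as in the buggy data),
# a blank node, or a quoted literal with an optional language tag or datatype.
TERM = re.compile(
    r'''\s*(
        <[^>]*>
      | _:\S+
      | "(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?
    )''',
    re.VERBOSE,
)

def parse_quad(line):
    """Return the terms on one line, or None if the line does not parse."""
    terms = []
    pos = 0
    while True:
        m = TERM.match(line, pos)
        if not m:
            break
        terms.append(m.group(1))
        pos = m.end()
    # A well-formed line ends with a lone dot after 3 or 4 terms.
    if line[pos:].strip() != "." or len(terms) not in (3, 4):
        return None
    return terms

def count_predicates(lines):
    """Count predicates over parseable lines; also count unparseable lines."""
    counts = Counter()
    bad = 0
    for line in lines:
        terms = parse_quad(line)
        if terms is None:
            bad += 1
        else:
            counts[terms[1]] += 1
    return counts, bad

# Hypothetical line with a space-containing subject URI.
line = ('<http://example.org/this and that> <http://example.org/p> '
        '"hello world" <http://example.org/ctx> .')
print(line.split()[1])      # naive whitespace split picks the wrong field
print(parse_quad(line)[1])  # term-aware parsing picks the predicate
```

Splitting on whitespace treats the second word of the broken URI as the predicate, which is exactly how a predicate like “and” can show up in the counts. Counting unparseable lines separately, rather than silently dropping them, makes it easier to spot such problems before a week of CPU time has gone by.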

Posted by gromgull at 3:37 pm on June 16th, 2009.
Categories: Uncategorized.
