This shows some strings positioned on a graph according to their Levenshtein string distance. (And let me note how appropriate it is that a man named *Levenshtein* should make a string distance algorithm)

The mapping from the distance matrix given by the string distance is then mapped to two dimensions using the FastMap algorithm by Christos Faloutsos and King-Ip Lin. This mapping can be also done with Multidimensional Scaling, (and I did so in my PhD work, I even claimed it was novel, but actually I was 40 years too late, oh well), but that algorithm is nasty and iterative. FastMap, as the name implies, is much faster, it doesn’t actually even need a full distance matrix (I think). It doesn’t always find the best solution, and it has a slight random element to it, so the solution might also vary each time it’s run, but it’s good enough for almost all cases. Again, I’ve implemented it in python – grab it here: fastmap.py

To get a feel for how it works, download the code, remove *George Leary* and see how it reverts to only one dimension for a sensible mapping.

The algorithm is straight-forward enough, for each dimension you want, repeat:

- use a heuristic to find the two most distant points
- the line between these two points becomes the first dimension, project all points to this line
- recurse :)

The distance measure used keeps the projection “in mind”, so the second iteration will be different to the first.The whole thing is a bit like Principal Component Analysis, but without requiring an original matrix of feature vectors.

This is already quite old, from 1995, and I am sure something better exists now, but it’s a nice little thing to have in the toolbox. I wonder if it can be used to estimate numerical feature values for nominal attributes, in cases where all possible values are known?

[…] based on what predicates is used within each one. Then once I had the distance-metric, I could use FastMap to visualise it. It would be a quick hack, it would look smooth and great and be fun. In the end, […]

Posted by (still) nothing clever — Visualising predicate usage on the Semantic Web on September 9th, 2009.

Working like a charm :D

i generated a few string signatures of my results to check if every result was different and it worked like a charm

Thanks !

Posted by Galadrim on April 21st, 2012.