Google Normalized Distance

Sometimes a statistical approach simply wins over science that tries to model the real world. Here is an example. Google Normalized Distance measures the relatedness between two words or concepts. It is based on the number of search results returned when the two terms are searched together, compared with the number returned when each is searched on its own. http://en.wikipedia.org/wiki/Semantic_relatedness
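To make the idea concrete, here is a minimal sketch of the published NGD formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) and f(y) are the hit counts for each term alone, f(x, y) is the hit count for both together, and N is the total number of indexed pages. The class name, method name and the hit counts in main are made up for illustration; they are not taken from any library, and N is an assumption you have to supply yourself.

```java
// Minimal sketch of the NGD formula. NgdSketch and ngd are hypothetical names.
final class NgdSketch {

    /**
     * fx, fy: hit counts for each term searched alone
     * fxy:    hit count when both terms are searched together
     * n:      assumed total number of pages in the index
     */
    static double ngd(double fx, double fy, double fxy, double n) {
        if (fx <= 0 || fy <= 0 || n <= 0) {
            throw new IllegalArgumentException("fx, fy and n must be positive");
        }
        if (fxy <= 0) {
            // The terms never occur together: treat them as unrelated.
            return Double.POSITIVE_INFINITY;
        }
        double logX = Math.log(fx);
        double logY = Math.log(fy);
        double logXY = Math.log(fxy);
        double logN = Math.log(n);
        return (Math.max(logX, logY) - logXY) / (logN - Math.min(logX, logY));
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only, not real search results.
        double n = 1e10;
        System.out.println("often together:  " + ngd(1e6, 2e6, 5e5, n));
        System.out.println("rarely together: " + ngd(1e6, 2e6, 1e3, n));
    }
}
```

With these made-up counts, the pair that shows up together more often gets the smaller distance. Two terms that always appear together get a distance of 0.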

If you are bound to thinking in Java, here is a good implementation of Google Normalized Distance. Use this jar (http://www2.informatik.hu-berlin.de/~hakenber/publ/suppl/smbm06/WBI-TM.jar) to let your app find the semantic relatedness between two concepts.

A few results:

result for Agra & Taj Mahal: 0.37951525964462646
result for Agra & Delhi: 0.43014626260551725

The lower the score, the greater the semantic relatedness between the two concepts.

Note:

If you are working behind a firewall, you might need to set the following proxy properties before making any calls.

// Proxy settings: only needed behind a firewall; replace the placeholder values.
System.setProperty("http.proxyHost", "yourproxy");
System.setProperty("http.proxyPort", "yourport");

// WordSenseGoogler comes from the WBI-TM jar linked above.
WordSenseGoogler wordSense = new WordSenseGoogler();

System.out.println("result for Agra & Taj Mahal: " + wordSense.getNormalizedGoogleDistance("Agra", "TajMahal"));
System.out.println("result for Agra & Delhi: " + wordSense.getNormalizedGoogleDistance("Agra", "Delhi"));


SPARQL Query Generation from NL

I had been searching for this utility on Google and Delicious. But sometimes crawlers and taggers fail to find information that a human being can find, given the patience to read a paper that others have written.

After a failed attempt at writing a partial query converter on my own, today I found a paper that mentions a few good tools that can do this for me.

If you have a similar requirement, please visit the following links. AquaLog is for people who think in Java.

http://technologies.kmi.open.ac.uk/aqualog/

http://alumni.media.mit.edu/~mueller/papers/tt.html