Google Normalized Distance

Sometimes statistical approach simply wins over science trying to model real world. Here is an example of that. Google Normalized distance finds the relatedness between two words/concepts. this is based on number of results that comes when two are searched together to what when they are searched individually. http://en.wikipedia.org/wiki/Semantic_relatedness

If you are bound by thinking in java. Here is a good implementation of the same. Use this jar (http://www2.informatik.hu-berlin.de/~hakenber/publ/suppl/smbm06/WBI-TM.jar) to make your app finding semantic relatedness between two concepts.

few results

result for Agra & Taj Mahal: 0.37951525964462646
result for Agra & Delhi: 0.43014626260551725

Lower the score more the semantic relatedness between those concepts.

Note:

If you are working behind a firewall. you might need following properties to be set before you go through.

System.setProperty(“http.proxyHost”, “yourproxy”);
System.setProperty(“http.proxyPort”, “yourport”);

WordSenseGoogler wordSense = new WordSenseGoogler();

System.out.println(“result for Agra & Taj Mahal: ” + wordSense.getNormalizedGoogleDistance(“Agra”, “TajMahal”));
System.out.println(“result for Agra & Delhi: ” + wordSense.getNormalizedGoogleDistance(“Agra”, “Delhi”));

Advertisements