Sometimes statistical approach simply wins over science trying to model real world. Here is an example of that. Google Normalized distance finds the relatedness between two words/concepts. this is based on number of results that comes when two are searched together to what when they are searched individually. http://en.wikipedia.org/wiki/Semantic_relatedness
If you are bound by thinking in java. Here is a good implementation of the same. Use this jar (http://www2.informatik.hu-berlin.de/~hakenber/publ/suppl/smbm06/WBI-TM.jar) to make your app finding semantic relatedness between two concepts.
few results
result for Agra & Taj Mahal: 0.37951525964462646
result for Agra & Delhi: 0.43014626260551725
Lower the score more the semantic relatedness between those concepts.
Note:
If you are working behind a firewall. you might need following properties to be set before you go through.
System.setProperty(“http.proxyHost”, “yourproxy”);
System.setProperty(“http.proxyPort”, “yourport”);WordSenseGoogler wordSense = new WordSenseGoogler();
System.out.println(“result for Agra & Taj Mahal: ” + wordSense.getNormalizedGoogleDistance(“Agra”, “TajMahal”));
System.out.println(“result for Agra & Delhi: ” + wordSense.getNormalizedGoogleDistance(“Agra”, “Delhi”));