July 2, 2009
CEN, Coimbatore Campus
Amrita’s Center for Computational Engineering and Networking (CEN) conducts research in Computational Linguistics and Natural Language Processing. The Center recently became part of an MHRD-funded consortium to develop translation tools for Indian languages. In June 2009, the Center conducted a workshop on WordNet Creation for participants from all over the country, many of whom are also members of the same MHRD consortium. Eleven language groups were represented, one each for Kashmiri, Nepali, Sanskrit, Hindi, Marathi, Gujarati, Manipuri, Telugu, Kannada, Tamil and Malayalam.
A WordNet is a lexical database with characteristics of both a dictionary and a thesaurus. The WordNet for the English language was created by the Cognitive Science Laboratory of Princeton University. Such a lexical database is an essential component of a Machine Translation System. The design of this online lexical reference system is inspired by current psycholinguistic and computational theories of human lexical memory. Nouns, verbs, adjectives and adverbs are organized into sets of synonyms, each representing one underlying lexicalized concept. Different semantic relations link the synonym sets.
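For readers who want a concrete picture of this organization, the following is a minimal sketch in Python, not the Princeton implementation or any project's actual format. The class names (`Synset`, `MiniWordNet`), the identifiers `n1` and `n2`, the sample glosses, and the relation label `hypernym` are all assumptions chosen for illustration. It shows word forms grouped into synonym sets and one semantic relation linking two sets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Synset:
    """One lexicalized concept: a set of synonymous word forms (hypothetical layout)."""
    synset_id: str
    pos: str                      # "noun", "verb", "adjective" or "adverb"
    words: Set[str]
    gloss: str = ""
    # relation name -> ids of the synsets it points to
    relations: Dict[str, List[str]] = field(default_factory=dict)


class MiniWordNet:
    """Toy lexical database indexed by meaning (synset) and by word form."""

    def __init__(self) -> None:
        self.synsets: Dict[str, Synset] = {}
        self.index: Dict[str, List[str]] = {}   # word form -> synset ids

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset
        for word in synset.words:
            self.index.setdefault(word, []).append(synset.synset_id)

    def link(self, source_id: str, relation: str, target_id: str) -> None:
        """Record a semantic relation from one synset to another."""
        self.synsets[source_id].relations.setdefault(relation, []).append(target_id)

    def senses(self, word: str) -> List[Synset]:
        """All concepts a word form can express (empty list if unknown)."""
        return [self.synsets[i] for i in self.index.get(word, [])]

    def related(self, synset_id: str, relation: str) -> List[Synset]:
        """Synsets reachable from synset_id through the given relation."""
        ids = self.synsets[synset_id].relations.get(relation, [])
        return [self.synsets[i] for i in ids]


# Illustrative entries only; the ids and glosses are made up for this sketch.
wn = MiniWordNet()
wn.add(Synset("n1", "noun", {"car", "automobile"}, "a four-wheeled motor vehicle"))
wn.add(Synset("n2", "noun", {"vehicle"}, "a conveyance for people or goods"))
wn.link("n1", "hypernym", "n2")   # "car" is a kind of "vehicle"
```

Looking up `wn.senses("automobile")` returns the same concept as `wn.senses("car")`, which reflects the organization by word meaning rather than word form. Linking WordNets across languages could, under the same assumptions, amount to sharing concept identifiers between per-language tables.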
“The most ambitious feature of a WordNet is the organization of lexical information in terms of word meanings rather than word forms,” stated Dr. Soman, CEN Director. “WordNets are the need of the day. There are so many web sites loaded with information; also daily new websites are added in Indian languages. Indian language WordNets can enable the availability of this web content in Indian languages also.”
Workshop participants representing the different Indian languages presented their research work in an attempt to develop a common framework for creation of WordNets for these languages. The ultimate aim is to link these different WordNets together and make a complete package; this workshop provided a starting point. The workshop was conducted as part of the MHRD-funded project on Creation of Machine Language Tools for Translation from English to Indian Languages.
Dr. Pushpak Bhattacharya, Professor of Computer Science and Engineering at IIT Bombay, led the workshop. An acknowledged expert in the areas of Natural Language Processing and Machine Learning, Dr. Bhattacharya is often featured as a keynote speaker at international conferences in these areas. He has served as Visiting Professor at Stanford University in the USA and Professeur Invité at Université Joseph Fourier in France, and currently serves as a resource person for the MHRD project.