Qualification: 
Ph.D., M.Sc., B.A.
g_deepa@blr.amrita.edu

Dr. Deepa Gupta received a bachelor's degree in Mathematics with honors from Delhi University, India, in 1997, a master's degree in Mathematics from the Indian Institute of Technology Delhi (IITD), India, in 1999, and a Ph.D. in Natural Language Processing from the Department of Mathematics and Computer Application, IIT Delhi, in 2005. She worked as a postdoctoral researcher at FBK-IRST (Center for Scientific and Technological Research), Trento, Italy, from February 2005 to March 2007, and as an Assistant Professor at IIIT Bangalore, India, from April 2007 to December 2007. From 2009 to July 2014, she was an Assistant Professor in the Department of Mathematics at Amrita Vishwa Vidyapeetham. Since July 2014, she has been working as an Associate Professor at the same institute. Her research interests include Natural Language Processing, Statistical Machine Translation, Data Mining, Text Analytics, and Machine Learning.

Qualification

Degree | University | Year | Subjects
B.A. (Hons) | Delhi University | 1997 | Mathematics
M.Sc. | Indian Institute of Technology, Delhi (IITD) | 1999 | Applied Mathematics
Ph.D. | Indian Institute of Technology, Delhi (IITD) | 2005 | Natural Language Processing

FUNDED RESEARCH PROJECTS

Publications

Publication Type: Conference Paper

Year of Publication Publication Type Title

2018

Conference Paper

M. Sundar Kar K., K. Jeeva Priya, and Dr. Deepa Gupta, “Analysis of Digit Recognition in Kannada Using Kaldi Toolkit”, in Third International Conference on Emerging Research in Electronics, Computer Science & Technology (ICERECT 2018), PES College of Engineering, Mandya, 2018.

2018

Conference Paper

M. P Krishna, R Reddy, P., Narayanan, V., Lalitha, S., and Dr. Deepa Gupta, “Affective state recognition using audio cues”, in International Symposium on Intelligent Systems Technologies and Applications (ISTA 2018), PES Institute of Technology, Bengaluru, South Campus, India, 2018.

2018

Conference Paper

Srilasya, Sahana T., Vinay S., K. Jeeva Priya, and Dr. Deepa Gupta, “Comparison of different acoustic Models for Kannada Language using Kaldi Toolkit”, in 7th IEEE International conference on Advances in Computing, Communications and Informatics (ICACCI), PES Institute of Technology, Bengaluru, South campus, India, 2018.

2017

Conference Paper

K. Ma Shivakumar, Shivaraju, Na, Sreekanta, Vb, and Dr. Deepa Gupta, “Comparative study of factored SMT with baseline SMT For English to Kannada”, in Proceedings of the International Conference on Inventive Computation Technologies, ICICT 2016, 2017, vol. 1.


Dravidian languages are highly agglutinative and morphologically rich. Language processing for these languages requires more annotated data than for European or Indo-European languages. In this paper we compare Statistical Machine Translation (SMT) models built with linguistic and non-linguistic data for English to Kannada. The experiments show an improvement in BLEU score for the factored MT system over the baseline MT system for English to Kannada SMT. Kannada fonts can take ten different forms in representing a word; any change of a font variant in a word leads to a change in the meaning of the word. We model these morphological variants of Kannada lemma words, their variants and PoS as factors in our MT system.


2017

Conference Paper

S. Suresh Shastri, Vivek, P., Dr. Deepa Gupta, Nayar, R. C., Rao, R., and Ram, A., “Breast Cancer Diagnosis and Prognosis using Machine Learning Techniques”, in International Symposium on Intelligent Systems Technologies and Applications (ISTA'17), Manipal University, Karnataka, 2017.

2017

Conference Paper

A. Nair, Dr. Deepa Gupta, Sangita Khare, Gopalkrishna, D. Manippady, and Dr. Amalendu Jyotishi, “Characteristics and causes of malnutrition across Indian states: A cluster analysis based on Indian demographic and health survey data”, in International Conference on Advances in Computing, Communications & Informatics (ICACCI’17), Manipal University, Karnataka, 2017.

2017

Conference Paper

P. Salunkhe, Bhaskaran, S., Amudha, J., and Dr. Deepa Gupta, “Recognition of Multilingual Text from Signage Boards”, in 6th International Conference on Advances in Computing, Communications & Informatics (ICACCI’17), Manipal University, Karnataka, 2017.

2017

Conference Paper

G. B. R., Dr. Deepa Gupta, and Sasikala T, “Grammar Error Detection Tool for Medical Transcription using Stop Words – POS Tags ngram Based Model”, in 2nd International Conference on Computational Intelligence and Informatics (ICCI’17), JNTU, Hyderabad, 2017.


Medical transcription is the process of conversion of audio files, dictated by medical experts, to electronic data files in a predetermined format. The doctor’s thoughts are documented, covering medical procedures carried out on a patient starting from the time the patient enters the clinic or hospital, up until the ailment is treated. A grammar checker is an important asset to hospitals to scrutinize medical transcripts. The transcripts are important to track a patient’s medical history and need to be error free. The available existing tools are specifically designed to detect faulty grammatical constructs in the generic English language. It is important to improve the intelligence of a grammar checker in a relatively unknown domain and to improve the level of accuracy set by the existing tools which mostly rely on a set of non-exhaustive rulesets. These are the driving factors to propose a new approach to an old problem. Stop words are most commonly occurring words in any language. By exploiting the fact that stop words form the backbone of a sentence and by figuring out the common parts-of-speech tags which surround them, a sentence’s grammatical structure can be better understood using statistical methods.


2017

Conference Paper

T. Babu, Dr. Tripty Singh, Dr. Deepa Gupta, and Hameed, S., “Colon Cancer Detection in Biopsy Images for Indian Population at Different Magnification Factors Using Texture Features”, in 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai, 2017.

2017

Conference Paper

S. Bhaskaran, Paul, G., Dr. Deepa Gupta, and Amudha, J., “Langtool: Identification of Indian Language for short Text”, in 9th International Conference on Advanced Computing (ICoAC 2017), MIT, Chennai, 2017.

2017

Conference Paper

Sangita Khare, Dr. Deepa Gupta, K, P., Dr. Deepika M. G., and Dr. Amalendu Jyotishi, “Health and Nutritional Status of Children: Survey, Challenges and Directions”, in 3rd International Conference on Cognitive Computing and Information Processing (CCIP 17), JSSATE-B Campus, Bengaluru, 2017.

2016

Conference Paper

D. Mishra, Manju Venugopalan, and Dr. Deepa Gupta, “Context Specific Lexicon for Hindi Reviews”, in Procedia Computer Science, 2016, vol. 93, pp. 554-563.


In the era of social networking, the immense number of posts, comments and tweets generated every second is increasing the size of social databases. The analysis of this voluminous data is necessary for exploring the orientation of people's opinions about a particular entity. Most online data is in English, but with advances in technology and improved awareness, the online data available in Indian languages is gradually increasing. Sentiment analysis of English alone is not sufficient to know the inclination of people towards an entity; sentiment analysis of Indian languages is also needed. The available sentiment classification lexicon resources such as Hindi SentiWordNet are generic in nature and hence result in average sentiment classification accuracy due to contextual dependency. To improve the sentiment classification accuracy, we present an improvised lexicon resource for the Hindi language for the Hotel and Movie domains. The improvised polarity lexicon has been built to reflect context sensitivity, and to increase coverage it has been expanded using a synonym-based approach. The resulting polarity lexicon shows an improvement in accuracy of 42% and 78% in the Movie and Hotel domains, respectively, compared to the existing Hindi SentiWordNet lexicon resource.

2016

Conference Paper

K. Vani and Dr. Deepa Gupta, “ASE@DPIL-FIRE2016: Hindi paraphrase detection using natural language processing techniques & semantic similarity computations”, in CEUR Workshop Proceedings, 2016, vol. 1737, pp. 244-249.


The paper reports the approaches utilized and results achieved for our system in the shared task (in FIRE-2016) for paraphrase identification in Indian languages (DPIL). Since Indian languages have a complex inherent nature, paraphrase identification in these languages becomes a challenging task. In the DPIL task, the challenge is to detect and identify whether a given sentence pair is paraphrased or not. In the proposed work, natural language processing with semantic concept extraction is explored for paraphrase detection in Hindi. Stop word removal, stemming and part of speech tagging are employed. Further, similarity computations between the sentence pairs are done by extracting semantic concepts using the WordNet lexical database. Initially, the proposed approach is evaluated over the given training sets using different machine learning classifiers. Then the testing phase is used to predict the classes using the proposed features. The results are found to be promising, which shows the potency of natural language processing techniques and semantic concept extraction in detecting paraphrases.


2016

Conference Paper

S. Khare and Dr. Deepa Gupta, “Association rule analysis in cardiovascular disease”, in 2016 Second International Conference on Cognitive Computing and Information Processing (CCIP), 2016.


Data mining in healthcare is a rising field due to the vast amount of patient-specific data that is freely available for analysis. While the majority of this data has been analyzed using data mining techniques such as classification, association rule mining in this field is still largely unexplored. Association rule mining is a simple yet powerful tool that brings to light hidden relationships among data attributes in addition to statistically validating those which are already known. These relationships can help in understanding diseases and their causes in a better way, which in turn will help to prevent them. This paper presents an exploration of this field and the conclusions drawn from analyzing the heart disease dataset from the UCI repository. Association rule mining is applied to cardiovascular diseases, that is, diseases of the heart and circulatory system, with a focus on heart disease.

2016

Conference Paper

Dr. Deepa Gupta, Khare, S., and Aggarwal, A., “A method to predict diagnostic codes for chronic diseases using machine learning techniques”, in 2016 International Conference on Computing, Communication and Automation (ICCCA), 2016.


Healthcare in its simplest form is all about the diagnosis and prevention of disease or the treatment of injury by a medical practitioner. It plays an important role in providing quality of life for society. The concern is how to provide better service with less expensive, therapeutically equivalent alternatives. Machine learning (ML) techniques help in achieving this goal. Healthcare has various categories of data such as clinical data, claims data, drugs data and hospital data. This paper focuses on clinical and claims data for studying 11 chronic diseases, such as kidney disease, osteoporosis and arthritis. The correlation between the chronic diseases and the corresponding diagnostic tests is analyzed using ML techniques. An effective conclusion on various diagnostics for each chronic disease is made, keeping in mind the clinical relevance.

2015

Conference Paper

Ka Vani, Dr. Deepa Gupta, Krishnaswamy D, Thampi S.M, Callegari C, Alcaraz Calero J.M, Takagi H, Mauri J.L, Meghanathan, N., Rodrigues, J., Bojkovic Z.S, Wozniak, M., Sahni, S., Vinod M., Prasad, N. R., Que X., and Au E., “Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system”, in 2015 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2015, 2015, pp. 1578-1584.


Plagiarism is an illicit act which has become a prime concern, mainly in educational and research domains. This deceitful act is usually referred to as intellectual theft, and it has swiftly increased with rapid technological developments and information accessibility. Thus there is an urgent need for a system or mechanism for efficient plagiarism detection. In this paper, different combined similarity metrics for extrinsic plagiarism detection are investigated, with a focus on the importance of combined similarity metrics over the commonly used single metric in the plagiarism detection task. Further, the impact of utilizing part of speech (POS) tagging in the plagiarism detection model is analyzed. Different combinations of the four single metrics, Cosine similarity, Dice coefficient, Match coefficient and Fuzzy-Semantic measure, are used with and without POS tag information. These systems are evaluated using the PAN-2014 training and test data sets, and results are analyzed and compared using standard PAN measures, viz. recall, precision, granularity and plagdet-score. © 2015 IEEE.


2015

Conference Paper

Manju Venugopalan and Dr. Deepa Gupta, “Exploring sentiment analysis on Twitter data”, in Eighth International Conference on Contemporary Computing (IC3), 2015.


The growing popularity of microblogging websites has transformed these into rich resources for sentiment mining. Even though opinion mining has more than a decade of research to boast of, it is mostly confined to the exploration of formal text patterns like online reviews, news articles etc. Exploration of the challenges offered by informal and crisp microblogging has taken root, but there is still a long way to go. The proposed work aims at developing a hybrid model for sentiment classification that explores tweet-specific features and uses domain-independent and domain-specific lexicons to offer a domain-oriented approach, and hence to analyze and extract the consumer sentiment towards popular smart phone brands over the past few years. The experiments show that the results improve by around 2 points on average over the unigram baseline.


2015

Conference Paper

Manju Venugopalan and Dr. Deepa Gupta, “Sentiment Classification for Hindi Tweets in a Constrained Environment Augmented Using Tweet Specific Features”, in Mining Intelligence and Knowledge Exploration, 2015, pp. 664–670.


India being a diverse country rich in spoken languages with around 23 official languages has always left open a wide arena for NLP researchers. The increase in the availability of voluminous data in Indian languages in the recent years has prompted researchers to explore the challenges in the Indian language domain. The proposed work explores Sentiment Analysis on Hindi tweets in a constrained environment and hence proposes a model for dealing with the challenges in extracting sentiment from Hindi tweets. The model has exhibited an average performance with cross validation accuracy for training data around 56 % and a test accuracy of 43 %.


2015

Conference Paper

Va Dominic, Dr. Deepa Gupta, Sangita Khare, and Aggarwal, Ab, “Investigation of chronic disease correlation using data mining techniques”, in 2015 2nd International Conference on Recent Advances in Engineering and Computational Sciences, RAECS 2015, 2015.


A disease is an abnormal condition that affects the structure and function of one or more parts of the body. It may be caused by various factors, external and internal dysfunctions. There is a trend of various chronic diseases in any society. The major concern is that these chronic diseases are leading to many other diseases in future. An attempt to explore the correlation of various chronic diseases has become a necessity. This can be achieved by using data mining techniques, which help to derive knowledge about the effects of a particular chronic disease on the other chronic diseases. Since there is a growing trend of diabetes and ischemic heart disease in society, in this paper the focus is to investigate the effect of these diseases on the other chronic diseases using the ICD9 diagnostic codes. To achieve this goal various types of data mining techniques are used. The conclusion is an optimal set of ICD9 diagnostic codes associated with individuals having diabetes or ischemic heart disease. These codes are then investigated based on the human anatomic systems, i.e. Circulatory system, Respiratory system, Nervous system, Musculoskeletal system, Renal system and Neoplasm, and their relevance is justified. © 2015 IEEE.


2015

Conference Paper

P. Vivek, G. Radhakrishnan, Dr. Deepa Gupta, and Dr. T.S.B. Sudarshan, “Clustering of robotic environment using image data stream”, in International Conference Communication, Control and Intelligent Systems, CCIS 2015, 2015, pp. 208-213.


Mobile robots are being used in various applications like space shuttles, intelligent home security, military applications or other service-oriented applications where human intervention is limited. A robot has to understand its environment by analyzing the data to take the appropriate actions in the given environment. Mostly the data collected from the sensors on the robots are huge and continuous, making it impossible to store the entire data in main memory and hence allowing only a single scan of the data. Traditional clustering algorithms like k-means cannot be used in such environments as they require multiple scans of the data. This paper presents an experimental study on the implementation of Stream KM++, a data stream clustering algorithm that effectively clusters this time series robotic image data within the memory restrictions under various conditions. Promising results are obtained from the various experiments carried out.

2015

Conference Paper

S. Sanagar and Dr. Deepa Gupta, “Adaptation of multi-domain corpus learned seeds and polarity lexicon for sentiment analysis”, in 2015 International Conference on Computing and Network Communications, CoCoNet 2015, 2015, pp. 50-58.


Sentiment analysis has emerged as an independent branch of research and attracted many researchers in recent years. Analysis of sentiment deals with expressed opinions. That makes it widely applicable in every part of life and in businesses where opinion counts. Opinions are expressed by the means of opinion oriented words which are part of sentiment analysis resource such as polarity lexicon. Polarity lexicon construction is widely explored by researchers using various supervised and semi-supervised approaches. Semi-supervised approaches are often combined with polarity seed information. A novel semi-supervised approach is proposed to construct polarity lexicon using iterative Latent Semantic Analysis technique from unlabeled multiple source domains corpus. This polarity lexicon is adaptable across multiple target domains. In the process seed words are learned from multiple domain corpus and subsequently adapted to new domains. Significant improvement in accuracy is observed over the baselines. © 2015 IEEE.


2015

Conference Paper

G. Radhakrishnan, Dr. Deepa Gupta, Sindhuula, S., Khokhawat, S., and Dr. T.S.B. Sudarshan, “Experimentation and analysis of time series data from multi-path robotic environment”, in 2015 IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2015, 2015.


Autonomous mobile robots are increasingly used in many application areas. In most applications, they have to explore and gather knowledge about the environment they are deployed in. These robots transfer real time data about the environment continuously. This paper discusses a set of experiments that have been carried out to simulate various robotic environments. A robot attached with four sensors is used to collect information about the environment as the robot moves in multiple straight line paths. Time series data collected from these experiments are clustered using data mining techniques. Experimental results show clustering accuracies vary depending on the number of clusters formed. © 2015 IEEE.


2014

Conference Paper

Dr. Deepa Gupta and Verma, R., “An enhanced cluster-head selection scheme for distributed heterogeneous wireless sensor network”, in Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on, 2014.


Cluster-Head selection is a critical and energy-constrained process in a wireless sensor network. This process requires a significant amount of energy, affecting the performance and operation of the wireless sensor network. A heterogeneous wireless sensor network has the advantage of providing different types of data from a variety of sensors in the same network, but because of complex network operations it shows poor performance. For enhanced performance of a wireless sensor network, improvements are needed in some critical parameters such as energy potency, network lifetime, node readying, fault lenience and dormancy. The proposed Cluster-Head selection scheme deals with two-level heterogeneous wireless sensor networks. The improved Cluster-Head selection process results in less energy consumption, which prolongs the network lifetime and stability.


2014

Conference Paper

R. Gopalapillai, Dr. Deepa Gupta, and Sudarshan, T. S. B., “Experimentation and analysis of time series data for rescue robotics”, in Second International Symposium on Intelligent Informatics (IST'13), 2014, pp. 443–453.


In today’s world, rescue robots are used in various life-threatening situations where human help or support is not possible. These robots transfer real time data about the environment continuously. Research is focussed on techniques to analyse real time data to enable Decision Support Systems (DSS) to take timely actions to save lives. This paper discusses preliminary experiments that have been carried out to simulate a set of simple robotic environments. A robot fitted with four sensors is used to collect information about the environments as the robot moves in a straight line path. Time series data collected from these experiments are clustered using data mining techniques. Experimental results show recall and precision between 73% and 98%.


2014

Conference Paper

K. S. Vishnu, Apoorva, T., and Dr. Deepa Gupta, “Learning Domain-Specific and Domain-independent Opinion Oriented Lexicons using Multi Domain Knowledge”, in Sixth International Conference on Contemporary Computing (IC3-2014), 2014.


Sentiment analysis systems are used to determine the opinions expressed in customer reviews. The basic resource for sentiment analysis systems is the polarity lexicon. Each term in a polarity lexicon indicates its affinity towards positive or negative opinion. However, this affinity of a word changes with the change in domain. In this work, we explore a polarity lexicon using SentiWordNet, a domain-independent lexicon, to adapt to a specific domain and update the domain-independent lexicon based on multiple domain knowledge. The proposed approach has been tested on five domains: Health, Books, Camera, Music and DVD. The improvement in accuracy ranges from 4.5 to 19 points across all the domains over the baseline.


2014

Conference Paper

S. Swetha, Dr. Deepa Gupta, Radhakrishnan, G., and Sudarshan, T. S. B., “Analysis of Robotic Environment using Low Resolution Image Sequence”, in International Conference on Contemporary Computing and Informatics, Mysore, 2014.


Mobile robots have been widely used in agricultural, industrial and military applications and areas where human intervention is not possible. These robots are equipped with sensors and image capturing devices to collect time series data from the environment. The data thus collected are analyzed to obtain vital information about the environment. This paper presents an experimental study using low resolution image data captured from a set of complex robotic scenarios. Features such as color and texture are extracted from the images. The scenarios are then clustered based on the extracted features using the k-medoids algorithm. Clustering accuracy has been analyzed with different image resolutions and feature extraction methods.


2014

Conference Paper

V. K and Dr. Deepa Gupta, “Using K-means Cluster based Techniques in External Plagiarism Detection”, in International Conference on Contemporary Computing and Informatics, Mysore, 2014.


Text document categorization is one of the rapidly emerging research fields, where documents are identified, differentiated and classified manually or algorithmically. The paper focuses on the application of automatic text document categorization in the plagiarism detection domain. In today's world plagiarism has become a prime concern, especially in research and educational fields. This paper aims at the study and comparison of different methods of document categorization in external plagiarism detection. Here the primary focus is to explore unsupervised document categorization/clustering methods using different variations of the K-means algorithm and compare them with the general N-gram based method and the Vector Space Model based method. Finally, the analysis and evaluation are done using a data set from PAN-2013, and performance is compared based on precision, recall and efficiency in terms of time taken for algorithm execution.


2014

Conference Paper

G. Radhakrishnan, Sudarshan, T. S. B., Murali, M., and Dr. Deepa Gupta, “Clustering of Robotic Environments Using Image Sequence Data”, in Eighth International Conference on Data Mining and Warehousing (ICDMW-2014), 2014.

2014

Conference Paper

G. Radhakrishnan, Sudarshan, T. S. B., Mishra, S., and Dr. Deepa Gupta, “Acquisition and Analysis of Robotic Data using Machine learning Techniques”, in International Conference on Computational Intelligence in Data Mining, 2014.

2013

Conference Paper

C. Priyanka and Dr. Deepa Gupta, “Identifying the Best Feature Combination for Sentiment Analysis of Customer Reviews”, in Second International Conference on Advances in Computing, Communications and Informatics (ICACCI-2013), 2013.


Opinions are increasingly available in the form of reviews and feedback on websites, blogs, and microblogs, which influence future customers. From a human perspective, it is difficult to read all the opinions and summarize them, which calls for automated and faster opinion mining to classify the reviews. In this paper different features, namely N-gram features, POS based features and features based on the lexicon SentiWordNet, have been investigated. The Support Vector Machines (SVM) classifier has been modeled with presence as the feature representation for classification of the reviews into positive and negative classes, thereby identifying the best feature combination. Results of experiments conducted on smart phone reviews for different feature combinations are presented. Accuracies of up to 92% and 95% have been obtained for small and large datasets, respectively.


2013

Conference Paper

R. Gopalapillai, Vidhya, J., Dr. Deepa Gupta, and Sudarshan, T. S. B., “Classification of Robotic Data using Artificial Neural Network”, in 2nd IEEE International Conference on Recent Advances in Intelligent Computational Systems (RAICS 2013), 2013.


As time series data are common in the field of science and commerce, time series data analysis has an important role in these areas for extracting information from available data. This paper presents the application of Artificial Neural Networks (ANN) for analyzing huge amount of time series data collected by sensors mounted on a robot navigating in a simulated environment. The Artificial Neural Network system employing back propagation learning algorithm classified different scenarios encountered by the robot using the data collected by sensors.


2012

Conference Paper

G. Radhakrishnan, Dr. Deepa Gupta, Abhishek, R., Ajith, A., and Dr. T.S.B. Sudarshan, “Analysis of multimodal time series data of robotic environment”, in International Conference on Intelligent Systems Design and Applications, ISDA, Kochi, 2012, pp. 734-739.


Autonomous mobile robots equipped with an array of sensors are being increasingly deployed in disaster environments to assist rescue teams. The sensors attached to the robots send multimodal time series data about the disaster environments which can be analyzed to extract useful information about the environment in which the robots are deployed. A set of data mining tasks that effectively cluster various robotic environments have been investigated. The effectiveness of these data mining techniques have been demonstrated using an available robotic dataset. The accuracy of the proposed technique has been measured using a manual reference cluster set. © 2012 IEEE.


2010

Conference Paper

R.K. Yadav and Dr. Deepa Gupta, “Annotation guidelines for Hindi-English word alignment”, in Proceedings - 2010 International Conference on Asian Language Processing, IALP 2010, Harbin, 2010, pp. 293-296.


A duo such as Hindi-English (Hin-Eng) does differ in terms of grammar, and thus finding correspondences is often quite obscure in word alignment. Hindi being rich in morphology makes the alignment with its counterpart a bit contingent and invites obscurities in annotation process. We present annotation guidelines for Hin-Eng word alignment through contrastive analysis of the two languages. We applied existing guidelines for Dutch-English [1], coupled with Blinker project guidelines [2] and augmented them to cover frequently occurring cases in our corpus. We discuss verbal system which causes most linking obscurities by analyzing verb morphology which allows us to define consistent and systematic instructions for manual word alignment. © 2010 IEEE.


2010

Conference Paper

E. Venkataramani and Dr. Deepa Gupta, “English-Hindi automatic word alignment with scarce resources”, in Asian Language Processing (IALP), 2010 International Conference on, 2010, pp. 253-256.


Many automatic word alignment techniques have been developed so far in Natural Language Processing (NLP). However, word alignment between English and Hindi has not progressed much, for two main reasons: the complex structure of the participating languages and the scarcity of Hindi-language resources. This paper provides a corpus-augmented method of word alignment in which these limitations have been overcome. We see this work as an improved approach to establishing a word alignment algorithm with scarce resources for Indian languages in general and for English-Hindi in particular.

2007

Conference Paper

N. Bertoldi, Brugnara F, Cattoni, R., Cettolo, M., Chen, B., Federico M, Giuliani D., Gretter, R., Dr. Deepa Gupta, Seppi, D., “The IRST English-Spanish Translation System for European Parliament Speeches”, in Interspeech, 2007.


This paper presents the spoken language translation system developed at FBK-irst during the TC-STAR project. The system integrates automatic speech recognition with machine translation through the use of confusion networks, which make it possible to represent a huge number of transcription hypotheses generated by the speech recognizer. Confusion networks are efficiently decoded by a statistical machine translation system which computes the most probable translation in the target language. This paper presents the whole architecture developed for the translation of political speeches held at the European Parliament, from English to Spanish and vice versa.

2006

Conference Paper

M. Popović, Ney, H., De Gispert, A., Mariño, J. B., Dr. Deepa Gupta, Federico, M., Lambert, P., and Banchs, R., “Morpho-syntactic information for automatic error analysis of statistical machine translation output”, in Proceedings of the workshop on statistical machine translation, 2006.

2006

Conference Paper

M. Federico and Dr. Deepa Gupta, “Exploiting Word Transformation in Statistical Machine Translation from Spanish to English”, in 11th Annual conference of the European Association for Machine Translation (EAMT), 2006, pp. 75–80.

2004

Conference Paper

S. Goyal, Chatterjee, N., and Dr. Deepa Gupta, “A Study of Hindi Translation Patterns for English Sentences with "Have" as the Main Verb”, in the International Symposium on MT, NLP and Translation Support Systems: iSTRANS-2004, 2004.

2003

Conference Paper

N. Chatterjee and Dr. Deepa Gupta, “A Morpho-Syntax based Adaptation and Retrieval Scheme for English to Hindi EBMT”, in Workshop on Computational Linguistic for the Languages of South Asia: Expanding Synergies with Europe, 2003, p. 23.[Abstract]


This paper focuses on Example Based Machine Translation (EBMT) between English and Hindi, the most popular language in South Asia. Given an input sentence, an EBMT system retrieves similar sentence(s) from its example base and adapts their translation(s) suitably to generate the translation of the given input. This paper proposes a systematic adaptation scheme that takes into account the morphology and syntax of the input and the retrieved source language sentences.

2002

Conference Paper

N. Chatterjee and Dr. Deepa Gupta, “Study of Similarity and its Measurement for English to Hindi EBMT”, in STRANS-2002, 2002.

2002

Conference Paper

N. Chatterjee and Dr. Deepa Gupta, “A Systematic Adaptation Scheme for English-Hindi Example-Based Machine Translation”, in STRANS-2002, 2002.

Publication Type: Journal Article

Year of Publication Publication Type Title

2017

Journal Article

V. V and Dr. Deepa Gupta, “Detection of idea plagiarism using syntax–Semantic concept extractions with genetic algorithm”, Expert Systems with Applications, vol. 73, pp. 11-26, 2017.[Abstract]


Plagiarism is increasingly becoming a major issue in the academic and educational domains. Automated and effective plagiarism detection systems are direly required to curtail this information breach, especially in tackling idea plagiarism. The proposed approach is aimed to detect such plagiarism cases, where the idea of a third party is adopted and presented intelligently so that at the surface level, plagiarism cannot be unmasked. The reported work aims to explore syntax-semantic concept extractions with genetic algorithm in detecting cases of idea plagiarism. The work mainly focuses on idea plagiarism where the source ideas are plagiarized and represented in a summarized form. Plagiarism detection is employed at both the document and passage levels by exploiting the document concepts at various structural levels. Initially, the idea embedded within the given source document is captured using sentence level concept extraction with genetic algorithm. Document level detection is facilitated with word-level concepts where syntactic information is extracted and the non-plagiarized documents are pruned. A combined similarity metric that utilizes the semantic level concept extraction is then employed for passage level detection. The proposed approach is tested on the PAN13-14 (http://pan.webis.de/) plagiarism corpus for summary obfuscation data, which represents a challenging case of idea plagiarism. The performance of the current approach and its variations are evaluated both at the document and passage levels, using information retrieval and PAN plagiarism measures respectively. The results are also compared against six top ranked plagiarism detection systems submitted as a part of PAN13-14 competition. The results obtained are found to exhibit significant improvement over the compared systems and hence reflect the potency of the proposed syntax-semantic based concept extractions in detecting idea plagiarism. © 2016 Elsevier Ltd


2016

Journal Article

N. Riya Ravi and Dr. Deepa Gupta, “A plagiarized source retrieval system developed using efficient download filtering and POS tagged query formulation with effective paragraph based chunking”, International Journal of Artificial Intelligence, vol. 14, pp. 145-160, 2016.[Abstract]


Source Retrieval is an important task of an External Plagiarism Detection system, which involves identifying a set of candidate source documents for a given suspicious document. It is crucial not to lose any actual source document while reducing the size of the candidate source document set. This paper describes the approach to the Source Retrieval task of an External Plagiarism Detection System. The approach includes chunking of documents based on paragraphs along with Part-of-Speech tagging and an efficient download filtering method. The proposed system is evaluated against the PAN 2011-12, PAN 2012-13 and PAN 2014-15 Test Data Sets, and results are analysed and compared using standard PAN measures: Recall, Precision, F Measure, average number of queries and downloads. The proposed approach exhibited improved efficiency in PAN 2015 conducted by PAN CLEF Evaluation lab, by acquiring highest values for F Measure and Precision along with least Downloads. The results are further improved by incorporating efficient query and download filtering mechanisms over the proposed system. The effect of the enhanced proposed system is also discussed and analysed in this paper. © 2016 CESER PUBLICATIONS.


2016

Journal Article

S. Vijayakumar, Dr. Deepika M. G., Dr. Amalendu Jyotishi, and Dr. Deepa Gupta, “Factors affecting infant mortality rate in India: An analysis of Indian states”, Advances in Intelligent Systems and Computing, vol. 530, pp. 707-719, 2016.[Abstract]


While there are enough efforts by the governments to reduce the infant mortality rate in developing countries, the results are not as desired. India is no exception to the case. Identifying the factors that affect the infant mortality rates would help in better targeting of the programs leading to enhanced efficiency of such programs. Earlier studies have shown the influence of socio economic factors on infant mortality rates at a global level and found that variables like fertility rate, national income, women in labour force, expenditure on health care and female literacy rates influence the infant mortality rates. The current study using the data from Indiastat.com from all states and Union Territories of India for the years 2001 and 2011 tries to establish the relationship between infant mortality rate and some of the above mentioned factors along with a few healthcare infrastructure related variables. Using a regression analysis method we not only identify the influence of the variables on infant mortality, we went a step further in identifying the performance of states and union territories in reducing IMR. The performance was measured using 'technical efficiency' analysis. We then compared the performance and growth rate of IMR to classify the states as good performers and laggards. Our results suggest that most of the major states are on track on their performance on IMR. However, a few small states and union territories like Andaman and Nicobar Island, Mizoram, Arunachal Pradesh as well as Jammu & Kashmir need special attention and targeting to reduce IMR. © Springer International Publishing AG 2016.


2016

Journal Article

K. Jaya and Dr. Deepa Gupta, “Exploration of Corpus Augmentation Approach for English-Hindi Bidirectional Statistical Machine Translation System”, International Journal of Electrical and Computer Engineering, vol. 6, p. 1059, 2016.[Abstract]


Even though lot of Statistical Machine Translation (SMT) research work is happening for English-Hindi language pair, there is no effort done to standardize the dataset. Each of the research work uses different number of sentences, datasets and parameters during various phases of translation resulting in varied translation output. So comparing these models, understand the result of these models, to get insight into corpus behavior for these models, regenerating the result of these research work becomes tedious. This necessitates the need for standardization of dataset and to identify the common parameter for the development of model. The main contribution of this paper is to introduce an approach to standardize the dataset and to identify the best parameter which in combination gives best performance. It also investigates a novel corpus augmentation approach to improve the translation quality of English-Hindi bidirectional statistical machine translation system. This model works well for the scarce resource without incorporating the external parallel data corpus of the underlying language. This experiment is carried out using Open Source phrase-based toolkit Moses. Indian Languages Corpora Initiative (ILCI) Hindi-English tourism corpus is used. With limited dataset, considerable improvement is achieved using the corpus augmentation approach for the English-Hindi bidirectional SMT system.

2015

Journal Article

Dr. Deepa Gupta, Aswathi, T., and Yadav, R. Kumar, “Investigating Bidirectional Divergence In Lexical-Semantic Class For English-Hindi-Dravidian Translations”, International Journal of Applied Engineering Research, vol. 10, pp. 8851-8884, 2015.[Abstract]


The fail factor of a Machine Translation system is largely governed by the degree of cross linguistic variations called divergence, which are encountered during translations. Divergence is a language dependent phenomenon. More the divergence more is the skewness in the results delivered by the MT system for the considered language pair. All work reported thus far study divergence in a unidirectional manner i.e. from Source Language (SL) to Target Language (TL). Through this paper, we aim to present a detailed exploration of bidirectional divergence found in English-Hindi-Dravidian language triplet, which has never been explored thus far. Moreover, if transparency in cross linguistic variations should be obtained, then two way divergences should not be overlooked. Toward this end, we propose the concept of balance and imbalance observed in to and fro divergence. Balanced being the case when the two way divergence is preserved on mirror while imbalance, when the same is violated. The Dravidian languages under consideration are Malayalam, Tamil, Telugu and Kannada. The divergence analysis has been dealt for the pairing of each of these languages with English based on children stories, translation books, advertisement material and official articles and documents. © Research India Publications.


2015

Journal Article

C. Priyanka and Dr. Deepa Gupta, “Fine grained sentiment classification of customer reviews using computational intelligent technique”, International Journal of Engineering and Technology, vol. 7, pp. 1453-1468, 2015.[Abstract]


Online reviews are now popularly used for judging quality of product or service and influence decision making of users while selecting a product or service. Due to innumerous number of customer reviews on the web, it is difficult to summarize them which require a faster opinion mining system to classify the reviews. Many researchers have explored various supervised and unsupervised machine learning techniques for binary classification of reviews. Compared to these techniques, fuzzy logic can provide a straightforward and comparatively faster way to model the fuzziness existing between the sentiment polarities classes due to the ambiguity present in most of the natural languages. But the fuzzy logic techniques are less explored in this domain. Hence in this paper, a fuzzy logic model based on the most popularly known sentiment based lexicon SentiWordNet has been proposed for fine grained classification of the reviews into weak positive, moderate positive, strong positive, weak negative, moderate negative and strong negative classes. Experiments have been conducted on datasets containing reviews of electronic products namely smart phones, LED TV and laptops and have shown to provide fine grained classification accuracy approximately in the range of 74% to 77%.


2015

Journal Article

V. Dominic, Dr. Deepa Gupta, and Khare, S., “An Effective Performance Analysis of Machine Learning Techniques for Cardiovascular Disease”, Applied Medical Informatics, vol. 36, p. 23, 2015.[Abstract]


Machine learning techniques will help in deriving hidden knowledge from clinical data which can be of great benefit for society, such as reduce the number of clinical trials required for precise diagnosis of a disease of a person etc. Various areas of study are available in healthcare domain like cancer, diabetes, drugs etc. This paper focuses on heart disease dataset and how machine learning techniques can help in understanding the level of risk associated with heart diseases. Initially, data is preprocessed then analysis is done


2015

Journal Article

D. Vinitha, Dr. Deepa Gupta, and Khare, S., “Exploration of Machine Learning Techniques for Cardiovascular Disease”, Applied Medical Informatics, vol. 36, pp. 23–32, 2015.[Abstract]


Machine learning techniques will help in deriving hidden knowledge from clinical data which can be of great benefit for society, such as reduce the number of clinical trials required for precise diagnosis of a disease of a person etc. Various areas of study are available in healthcare domain like cancer, diabetes, drugs etc. This paper focuses on heart disease dataset and how machine learning techniques can help in understanding the level of risk associated with heart diseases. Initially, data is preprocessed then analysis is done in two stages, in first stage feature selection techniques are applied on 13 commonly used attributes and in second stage feature selection techniques are applied on 75 attributes which are related to anatomic structure of the heart like blood vessels of the heart, arteries etc. Finally, validation of the reduced set of features using an exhaustive list of classifiers is done. In parallel study of the anatomy of the heart is done using the identified features and the characteristics of each class is understood. It is observed that these reduced set of features are anatomically relevant. Thus, it can be concluded that, applying machine learning techniques on clinical data is beneficial and necessary.


2015

Journal Article

Manju Venugopalan and Dr. Deepa Gupta, “An Enhanced Polarity Lexicon by Learning-based Method Using Related Domain Knowledge”, International Journal of Information Processing and Management, vol. 6, no. 2, pp. 61–72, 2015.[Abstract]


The inborn human instinct to know what others think has contributed to the growing popularity of Sentiment Analysis. Sentiment in a text is mostly derived from opinion oriented words or lexicons. But the challenge is put forward by opinion oriented words which are domain specific. Various researchers have proposed methods to improve the polarity scores of these domain specific lexicons. Existing works utilize mainly single domain knowledge which is not sufficient to update a domain-specific lexicon. The proposed work attempts a domain adaptation model by building a polarity lexicon using knowledge from multiple related domains. The polarity lexicon thus built when tested on new domains provides fairly good classification results thus implementing true domain adaptation. The proposed approach has been tested on Amazon product reviews from ten related domains which include Printer, Scanner, MP3 Player, iPod, LCD TV etc. A significant improvement in accuracy ranging from 1 to 14.5 points on learned domains and 0.5 to 8 points across new domains over the baseline has been observed.


2015

Journal Article

R. N Ravi and Dr. Deepa Gupta, “Efficient Paragraph based Chunking and Download Filtering for Plagiarism Source Retrieval”, 2015.[Abstract]


This paper describes the approach of the system that we built as part of the participation in ‘PAN 2015 Source Retrieval’ task. Chunking of documents based on paragraphs and efficient download filtering improved the overall performance of the system. Source Retrieval is an important task of a Plagiarism Detection system


2015

Journal Article

T. Ramesh, Nanjangud, N., Kothapalli, K., Dr. Deepa Gupta, Chaudhary, F. Sanjay, Univetsity, A., and Sushil K Prasad, “2015 Eighth International Conference on Contemporary Computing (IC3)”, 2015.[Abstract]


We welcome you to the 2015 Eighth International Conference on Contemporary Computing (IC3). This is the eighth conference in the series, held annually at the Jaypee Institute of Information Technology (JIIT), and organized jointly by the hosts and the University of Florida, Gainesville, USA. The conference focuses on issues of contemporary interest in computing, spanning systems, algorithms and applications. We do hope you will find the conference and these proceedings exciting and rewarding


2015

Journal Article

T. Aswathi and Dr. Deepa Gupta, “Unsupervised Shallow Morph-Analyzer for Malayalam Using Recursive Clustering Based Approach”, International Journal of Applied Engineering Research, vol. 9, no. 23, pp. 21197–21215, 2015.

2014

Journal Article

Dr. Deepa Gupta, Vani, K., and Singh, Charan Kamal, “Using Natural Language Processing Techniques and Fuzzy-Semantic Similarity for Automatic External Plagiarism Detection”, Third International Conference on Advances in Computing, Communications and Informatics (ICACCI-2014), pp. 2694–2699, 2014.[Abstract]


Plagiarism is one of the most serious crimes in academia and research fields. In this modern era, where access to information has become much easier, the act of plagiarism is rapidly increasing. This paper focuses on an external plagiarism detection method, where the source collection of documents is available against which the suspicious documents are compared. Primary focus is to detect intelligent plagiarism cases where semantics and linguistic variations play an important role. The paper explores the different preprocessing methods based on Natural Language Processing (NLP) techniques. It further explores fuzzy-semantic similarity measures for document comparisons. The system is finally evaluated using the PAN 2012 data set and performances of different methods are compared.


2014

Journal Article

Dr. Deepa Gupta and Aswathi, T., “Unsupervised Shallow Morph-Analyzer for Malayalam Using Recursive Based Approach.”, International Journal of Applied Engineering Research, vol. 9, 2014.

2013

Journal Article

Dr. Deepa Gupta and Nair, L. Madhu, “Improving OCR by effective pre-processing and segmentation for Devanagiri script: A quantified study”, Journal of Theoretical and Applied Information Technology, vol. 52, pp. 142-153, 2013.[Abstract]


Optical Character Recognition (OCR) system aims to convert optically scanned text image to a machine editable text form. Multiple approaches to preprocessing and segmentation exist for various scripts. However, only a restricted combination of the same has been experimented on Devanagari script. This paper proposes a study which aims to explore and bring out an alternative and efficient strategy of preprocessing and segmentation in handling OCR for Devanagari scripts. Efficiency evaluation of the proposed alternative has been undertaken by subjecting it to documents with varying degree of noise severity and border artifacts. The experimental results confirm our proposition to be superior approach over other conventional methodologies to OCR system implementation for Devanagari scripts. Also described is detailed approach to conventional pre-processing involved in initial stage of OCR, including noise removal techniques, along with the other conventional approaches to segmentation. The proposed alternative has been deployed to reach character and top character segmentation level. © 2005 - 2013 JATIT & LLS. All rights reserved.


2013

Journal Article

C. A. O. Yonghui, Liyun, X., Zhiyi, F., Hongyu, S., ZHONG, S. H. A. N. G. Q. I. N., Dr. Deepa Gupta, NAIR, L. E. E. M. A. M. A. D. H. U., Duraisamy, G., Atan, R., Naeimizaghiani, M., and , “Study of SPI framework for CMMI continuous model based on QFD”, Journal of Theoretical and Applied Information Technology, vol. 52, 2013.[Abstract]


Optical Character Recognition (OCR) system aims to convert optically scanned text image to a machine editable text form. Multiple approaches to preprocessing and segmentation exist for various scripts. However, only a restricted combination of the same has been experimented on Devanagari script. This paper proposes a study which aims to explore and bring out an alternative and efficient strategy of pre-processing and segmentation in handling OCR for Devanagari scripts. Efficiency evaluation of the proposed alternative has been undertaken by subjecting it to documents with varying degree of noise severity and border artifacts. The experimental results confirm our proposition to be superior approach over other conventional methodologies to OCR system implementation for Devanagari scripts. Also described is detailed approach to conventional pre-processing involved in initial stage of OCR, including noise removal techniques, along with the other conventional approaches to segmentation. The proposed alternative has been deployed to reach character and top character segmentation level.

2012

Journal Article

Dr. Deepa Gupta, Yadav, R. Kumar, and Sajan, N., “Improving unsupervised stemming by using partial lemmatization coupled with data-based heuristics for Hindi”, International Journal of Computer Applications, vol. 38, pp. 1–8, 2012.

2009

Journal Article

Dr. Deepa Gupta, “Will Sentences Have Divergence Upon Translation?: A Corpus-Evidence Based Solution for Example Based Approach”, Language in India, vol. 9, no. 10, pp. 1930–2940, 2009.

2007

Journal Article

Dr. Deepa Gupta, Cettolo, M., and Federico, M., “POS-based reordering models for statistical machine translation”, Proceedings of the Machine Translation Summit (MT-Summit), 2007.[Abstract]


We present a novel word reordering model for phrase-based statistical machine translation suited to cope with long-span word movements. In particular, reordering of nouns, verbs and adjectives is modeled by taking into account target-to-source word alignments and the distances between source as well as target words. The proposed model was applied as a set of additional feature functions to re-score N-best translation candidates generated by a statistical machine translation system featuring state-of-the-art lexicalized reordering models. Experiments showed relative BLEU score improvement up to 7.3% on the BTEC Japanese-to-English task, and up to 1.1% on the Europarl German-to-English task.

2005

Journal Article

Dr. Deepa Gupta, “Contributions to english to hindi machine translation using example-based approach”, Unpublished Ph. D. Thesis submitted at Indian Institute of Technology, Delhi, 2005.[Abstract]


This research focuses on development of Example Based Machine Translation (EBMT) system for English to Hindi. Development of a machine translation (MT) system typically demands a large volume of computational resources. For example, rule-based MT systems require extraction of syntactic and semantic knowledge in the form of rules, statistics-based MT systems require huge parallel corpus containing sentences in the source languages and their translations in target language. Requirement of such computational resources is much less in respect of EBMT. This makes development of EBMT systems for English to Hindi translation feasible, where availability of large-scale computational resources is still scarce. The primary motivation for this work comes because of the following

2003

Journal Article

Dr. Deepa Gupta and Chatterjee, N., “Identification of divergence for English to Hindi EBMT”, Proceeding of MT Summit-IX, pp. 141–148, 2003.[Abstract]


Divergence is a key aspect of translation between two languages. Divergence occurs when structurally similar sentences of the source language do not translate into sentences that are similar in structures in the target language. Divergence assumes special significance in the domain of Example-Based Machine Translation (EBMT). An EBMT system generates translation of a given sentence by retrieving similar past translation examples from its example base and then adapting them suitably to meet the current translation requirements. Divergence imposes a great challenge to the success of EBMT. The present work provides a technique for identification of divergence without going into the semantic details of the underlying sentences. This identification helps in partitioning the example database into divergence / non-divergence categories, which in turn should facilitate efficient retrieval and adaptation in an EBMT system.

2003

Journal Article

N. Chatterjee and Dr. Deepa Gupta, “Divergence in English to Hindi Translation: Some Studies”, International Journal of Translation, vol. 15, pp. 5–24, 2003.

2003

Journal Article

Dr. Deepa Gupta and Chatterjee, N., “A Morpho-Syntax Based Adaptation and Retrieval Scheme for English to Hindi Example Based Machine Translation”, EACL 2003, 2003.[Abstract]


This paper focuses on Example Based Machine Translation (EBMT) between English and Hindi, the most popular language in South Asia. Given an input sentence, an EBMT system retrieves similar sentence(s) from its example base and adapts their translation(s) suitably to generate the translation of the given input. This paper proposes a systematic adaptation scheme that takes into account the morphology and syntax of the input and the retrieved source language sentences.

2001

Journal Article

Dr. Deepa Gupta and Chatterjee, N., “Study of divergence for example based English-Hindi machine translation”, STRANS-2001, IIT Kanpur, pp. 43–51, 2001.

Publication Type: Book Chapter

Year of Publication Publication Type Title

2016

Book Chapter

R. N Ravi, Vani, K., and Dr. Deepa Gupta, “Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System”, in Intelligent Systems Technologies and Applications, vol. 1, Springer International Publishing, 2016, pp. 127–138.[Abstract]


With the advent of World Wide Web, plagiarism has become a prime issue in field of academia. A plagiarized document may contain content from a number of sources available on the web and it is beyond any individual to detect such plagiarism manually. This paper focuses on the exploration of soft clustering, via, Fuzzy C Means algorithm in the candidate retrieval stage of external plagiarism detection task. Partial data sets from PAN 2013 corpus is used for the evaluation of the system and the results are compared with existing approaches, via, N-gram and K Means Clustering. The performance of the systems is measured using the standard measures, precision and recall and comparison is done.


2016

Book Chapter

M. Reshma, Vivek, P., Gopalapillai, R., Dr. Deepa Gupta, and Sudarshan, T. S. B., “Multi-view Robotic Time Series Data Clustering and Analysis Using Data Mining Techniques”, in Advances in Signal Processing and Intelligent Recognition Systems, Springer, 2016, pp. 521–531.[Abstract]


In present world robots are used in various spheres of life. In all these areas, knowledge of the environment is required to perform appropriate actions. The information about the environment is collected with the help of onboard sensors and image capturing device mounted on the mobile robot. As the information collected is of huge volume, data mining offers the possibility of discovering the hidden knowledge from this large amount of data. Clustering is an important aspect of data mining which will be explored in detail for grouping the scenario from multiple views. © Springer International Publishing Switzerland 2016.


2015

Book Chapter

S. Mishra, Radhakrishnan, G., Dr. Deepa Gupta, and Sudarshan, T. S. B., “Acquisition and analysis of robotic data using machine learning techniques”, in Computational Intelligence in Data Mining-Volume 3, Springer India, 2015, pp. 489–498.[Abstract]


A robotic system has to understand its environment in order to perform the tasks assigned to it successfully. In such a case, a system capable of learning and decision making is necessary. In order to achieve this capability, a system must be able to observe its environment with the help of real time data received from its sensors. This paper discusses certain experiments to highlight methods and attributes that can be used for such a learning. These experiments consider attributes recorded in different virtual environments with the


2006

Book Chapter

A. De Gispert, Dr. Deepa Gupta, Popović, M., Lambert, P., Mariño, J. B., Federico, M., Ney, H., and Banchs, R., “Improving statistical word alignments with morpho-syntactic transformations”, in Advances in Natural Language Processing, Springer Berlin Heidelberg, 2006, pp. 368–379.[Abstract]


This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By means of parallel corpus transformations according to information of POS-tagging, lemmatization or stemming, we explore which linguistic information helps improve alignment error rates. For this, evaluation against a human word alignment reference is performed, aiming at an improved machine translation training scheme which eventually leads to improved SMT performance. Experiments are carried out in a Spanish–English European Parliament Proceedings parallel corpus, both in a large and a small data track. As expected, improvements due to introducing morphosyntactic information are bigger in case of data scarcity, but significant improvement is also achieved in a large data task, meaning that certain linguistic knowledge is relevant even in situations of large data availability.

Invited Talk/ Guest Lecture

  • “Support Vector Machine: Kernels and Kernel Trick”, TEQIP-II sponsored Faculty Development Programme on “Knowledge Mining using Machine Learning and its Applications”, from 23rd January 2014 to 25th January 2014 at Department of ISE, MSRIT, Bangalore, India.
  • “Introduction to Machine Learning & Its Applications”, three-day workshop on “Knowledge Management and Data Analytics for Real World Applications (KMDA-13)”, from 22nd July to 24th July 2013 at R. V. College of Engineering, Bangalore, India.
  • “Domain Biased Bilingual Parallel Data Extraction and an Unsupervised Hybrid Approach to Align Sentences and Words in Parallel Corpora”, Workshop on “Natural Language Processing and Speech Recognition”, from 15th-17th April 2013 at RV College, Bangalore, India.
  • “Robotic Data Mining”, 9th International Conference on Emerging Trends in Physics - 2013, from 21st-22nd February 2013 at St. Joseph's College of Arts and Science (Autonomous), Cuddalore, TN, India.
  • “Introduction to Probability Theory and Sampling”, on 26th April 2012 at PSE School of Engineering, Bangalore, Karnataka, India.
  • “Introduction to MATLAB”, on 9th August 2012 at Jain University, Bangalore, Karnataka, India.
  • “Introduction to Vector Calculus”, on 4th December 2012 at PSEIT, South Campus, Bangalore, Karnataka, India.
  • “The Role of Natural Language Processing in Electronic Health Domain/Records”, National Seminar on Data Mining and Knowledge Discovery of Medical Data, from 28th-29th July 2011, organized by Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bangalore, Karnataka, India.

Research Scholars

Name of the Scholar (Full Time (FT) / Part Time (PT)) | Year of Registration | Area of Research
G. RadhaKrishnana (PT) (Jt. supervision with Dr. TSB Sudarshan) | July 2011 | Robotic Data Mining
Swati Sanagar (FT) | July 2012 | Sense Based Polarity Lexicon for Sentiment Analysis
Vani K. (FT) | February 2013 | Extrinsic Plagiarism Detection
Chinmayee Ojha (FT) | August 2013 | Statistical English-Hindi Alignments and its Evaluation
Shiva kumar (PT) | August 2013 | Language Processing Tools for Kannada using Machine Learning Techniques by Utilizing Cross Language Linguistic Rich Resources
Tina Babu (FT) (Jt. supervision with Dr. Tripty Singh) | September 2015 | Colon Cancer Detection and Grading using Pathological Images
Manju Venugopalan (FT) | February 2016 | Aspect Level Sentiment Analysis
Priyanka Nair (FT) | February 2016 | Healthcare Data Mining
Veena G. (PT) | February 2016 | Document Similarity using Deep Learning Techniques