Dr. Deepa Gupta received her bachelor's degree in Mathematics, with honors, from Delhi University, India, in 1997; her master's degree in Mathematics from the Indian Institute of Technology Delhi (IITD), India, in 1999; and her Ph.D. in Natural Language Processing from the Department of Mathematics and Computer Application, IIT Delhi, in 2005. She worked as a postdoctoral researcher at FBK-IRST (Center for Scientific and Technological Research), Trento, Italy, from February 2005 to March 2007, and as an Assistant Professor at IIIT Bangalore, India, from April 2007 to December 2007. From 2009 to July 2014, she was an Assistant Professor in the Department of Mathematics at Amrita Vishwa Vidyapeetam. Since July 2014, she has been an Associate Professor at the same institute. Her research interests include Natural Language Processing, Statistical Machine Translation, Data Mining, Text Analytics, and Machine Learning.


Degree | University | Year | Subject
B.A. (Hons) | Delhi University | 1997 | Mathematics
M.Sc. | Indian Institute of Technology, Delhi (IITD) | 1999 | Applied Mathematics
Ph.D. | Indian Institute of Technology, Delhi (IITD) | 2005 | Natural Language Processing



Publication Type: Journal Article
Year of Publication Publication Type Title
2016 Journal Article N. Riya Ravi and Deepa Gupta, “A plagiarized source retrieval system developed using efficient download filtering and POS tagged query formulation with effective paragraph based chunking”, International Journal of Artificial Intelligence, vol. 14, pp. 145-160, 2016.

Source retrieval is an important task in an external plagiarism detection system: it identifies a set of candidate source documents for a given suspicious document. It is crucial to reduce the size of the candidate set without losing any actual source document. This paper describes an approach to the source retrieval task of an external plagiarism detection system. The approach includes paragraph-based chunking of documents along with Part-of-Speech tagging and an efficient download filtering method. The proposed system is evaluated against the PAN 2011-12, PAN 2012-13 and PAN 2014-15 test data sets, and the results are analysed and compared using standard PAN measures: Recall, Precision, F Measure, average number of queries and downloads. The proposed approach exhibited improved efficiency in PAN 2015, conducted by the PAN CLEF Evaluation lab, achieving the highest values for F Measure and Precision along with the fewest downloads. The results are further improved by incorporating efficient query and download filtering mechanisms into the proposed system. The effect of the enhanced system is also discussed and analysed in this paper. © 2016 CESER PUBLICATIONS.

2015 Journal Article Deepa Gupta, Aswathi, T., and Yadav, R. Kumar, “Investigating Bidirectional Divergence In Lexical-Semantic Class For English-Hindi-Dravidian Translations”, International Journal of Applied Engineering Research, vol. 10, pp. 8851-8884, 2015.

The fail factor of a Machine Translation (MT) system is largely governed by the degree of cross-linguistic variation, called divergence, encountered during translation. Divergence is a language-dependent phenomenon: the greater the divergence, the greater the skew in the results delivered by the MT system for the language pair considered. All work reported thus far studies divergence in a unidirectional manner, i.e. from Source Language (SL) to Target Language (TL). In this paper, we present a detailed exploration of bidirectional divergence in the English-Hindi-Dravidian language triplet, which has not been explored thus far. Moreover, if transparency in cross-linguistic variation is to be obtained, two-way divergence should not be overlooked. Toward this end, we propose the concepts of balance and imbalance observed in to-and-fro divergence: the divergence is balanced when the two-way divergence is preserved on mirroring, and imbalanced when it is violated. The Dravidian languages under consideration are Malayalam, Tamil, Telugu and Kannada. The divergence analysis has been carried out for the pairing of each of these languages with English, based on children's stories, translation books, advertisement material, and official articles and documents. © Research India Publications.

2015 Journal Article C. Priyanka and Deepa Gupta, “Fine grained sentiment classification of customer reviews using computational intelligent technique”, International Journal of Engineering and Technology, vol. 7, pp. 1453-1468, 2015.

Online reviews are now widely used for judging the quality of a product or service, and they influence users' decisions when selecting one. Because of the vast number of customer reviews on the web, summarizing them is difficult and requires a fast opinion mining system to classify the reviews. Many researchers have explored various supervised and unsupervised machine learning techniques for binary classification of reviews. Compared to these techniques, fuzzy logic can provide a straightforward and comparatively faster way to model the fuzziness that exists between sentiment polarity classes owing to the ambiguity present in most natural languages. However, fuzzy logic techniques are less explored in this domain. Hence, in this paper, a fuzzy logic model based on the widely known sentiment lexicon SentiWordNet is proposed for fine-grained classification of reviews into weak positive, moderate positive, strong positive, weak negative, moderate negative and strong negative classes. Experiments conducted on datasets containing reviews of electronic products, namely smart phones, LED TVs and laptops, show a fine-grained classification accuracy approximately in the range of 74% to 77%.

2015 Journal Article V. Dominic, Deepa Gupta, and Khare, S., “An Effective Performance Analysis of Machine Learning Techniques for Cardiovascular Disease”, Applied Medical Informatics, vol. 36, p. 23, 2015.

Machine learning techniques will help in deriving hidden knowledge from clinical data, which can be of great benefit to society, for example by reducing the number of clinical trials required for precise diagnosis of a person's disease. Various areas of study are available in the healthcare domain, such as cancer, diabetes and drugs. This paper focuses on a heart disease dataset and on how machine learning techniques can help in understanding the level of risk associated with heart diseases. Initially, data is preprocessed then analysis is done

2015 Journal Article D. Vinitha, Deepa Gupta, and Khare, S., “Exploration of Machine Learning Techniques for Cardiovascular Disease”, Applied Medical Informatics, vol. 36, pp. 23–32, 2015.

Machine learning techniques will help in deriving hidden knowledge from clinical data, which can be of great benefit to society, for example by reducing the number of clinical trials required for precise diagnosis of a person's disease. Various areas of study are available in the healthcare domain, such as cancer, diabetes and drugs. This paper focuses on a heart disease dataset and on how machine learning techniques can help in understanding the level of risk associated with heart diseases. Initially, the data is preprocessed; analysis is then done in two stages. In the first stage, feature selection techniques are applied to 13 commonly used attributes, and in the second stage they are applied to 75 attributes related to the anatomic structure of the heart, such as the blood vessels of the heart and the arteries. Finally, the reduced set of features is validated using an exhaustive list of classifiers. In parallel, the anatomy of the heart is studied using the identified features, and the characteristics of each class are understood. It is observed that the reduced set of features is anatomically relevant. Thus, it can be concluded that applying machine learning techniques to clinical data is beneficial and necessary.

2015 Journal Article M. Venugopalan and Deepa Gupta, “An Enhanced Polarity Lexicon by Learning-based Method Using Related Domain Knowledge”, International Journal of Information Processing and Management, vol. 6, no. 2, pp. 61–72, 2015.

The inborn human instinct to know what others think has contributed to the growing popularity of Sentiment Analysis. Sentiment in a text is mostly derived from opinion oriented words or lexicons. But the challenge is put forward by opinion oriented words which are domain specific. Various researchers have proposed methods to improve the polarity scores of these domain specific lexicons. Existing works utilize mainly single domain knowledge which is not sufficient to update a domain-specific lexicon. The proposed work attempts a domain adaptation model by building a polarity lexicon using knowledge from multiple related domains. The polarity lexicon thus built when tested on new domains provides fairly good classification results thus implementing true domain adaptation. The proposed approach has been tested on Amazon product reviews from ten related domains which include Printer, Scanner, MP3 Player, iPod, LCD TV etc. A significant improvement in accuracy ranging from 1 to 14.5 points on learned domains and 0.5 to 8 points across new domains over the baseline has been observed.

2015 Journal Article R. N Ravi and Deepa Gupta, “Efficient Paragraph based Chunking and Download Filtering for Plagiarism Source Retrieval”, 2015.

This paper describes the approach of the system that we built as part of the participation in ‘PAN 2015 Source Retrieval’ task. Chunking of documents based on paragraphs and efficient download filtering improved the overall performance of the system. Source Retrieval is an important task of a Plagiarism Detection system

2015 Journal Article T. Ramesh, Nanjangud, N., Kothapalli, K., Deepa Gupta, Chaudhary, F. Sanjay, Univetsity, A., and Sushil K Prasad, “2015 Eighth International Conference on Contemporary Computing (IC3)”, 2015.

We welcome you to the 2015 Eighth International Conference on Contemporary Computing (IC3). This is the eighth conference in the series, held annually at the Jaypee Institute of Information Technology (JIIT), and organized jointly by the hosts and the University of Florida, Gainesville, USA. The conference focuses on issues of contemporary interest in computing, spanning systems, algorithms and applications. We do hope you will find the conference and these proceedings exciting and rewarding.

2015 Journal Article T. Aswathi and Deepa Gupta, “Unsupervised Shallow Morph-Analyzer for Malayalam Using Recursive Clustering Based Approach”, International Journal of Applied Engineering Research, vol. 9, no. 23, pp. 21197–21215, 2015.
2014 Journal Article Deepa Gupta, Vani, K., and Singh, Charan Kamal, “Using Natural Language Processing Techniques and Fuzzy-Semantic Similarity for Automatic External Plagiarism Detection”, Third International Conference on Advances in Computing, Communications and Informatics (ICACCI-2014), pp. 2694–2699, 2014.

Plagiarism is one of the most serious crimes in academia and research. In this modern era, where access to information has become much easier, plagiarism is rapidly increasing. This paper focuses on external plagiarism detection, where a source collection of documents is available against which the suspicious documents are compared. The primary focus is to detect intelligent plagiarism cases, where semantics and linguistic variations play an important role. The paper explores different preprocessing methods based on Natural Language Processing (NLP) techniques. It further explores fuzzy-semantic similarity measures for document comparison. The system is finally evaluated using the PAN 2012 data set, and the performance of the different methods is compared.

2014 Journal Article Deepa Gupta and Aswathi, T., “Unsupervised Shallow Morph-Analyzer for Malayalam Using Recursive Based Approach.”, International Journal of Applied Engineering Research, vol. 9, 2014.
2013 Journal Article Deepa Gupta and Nair, L. Madhu, “Improving OCR by effective pre-processing and segmentation for Devanagiri script: A quantified study”, Journal of Theoretical and Applied Information Technology, vol. 52, pp. 142-153, 2013.

An Optical Character Recognition (OCR) system aims to convert an optically scanned text image into a machine-editable text form. Multiple approaches to preprocessing and segmentation exist for various scripts. However, only a restricted combination of these has been tried on Devanagari script. This paper proposes a study that aims to explore and bring out an alternative and efficient strategy of preprocessing and segmentation for handling OCR of Devanagari scripts. The efficiency of the proposed alternative has been evaluated by subjecting it to documents with varying degrees of noise severity and border artifacts. The experimental results confirm our proposition to be a superior approach over other conventional methodologies for OCR system implementation for Devanagari scripts. Also described is a detailed approach to the conventional pre-processing involved in the initial stage of OCR, including noise removal techniques, along with the other conventional approaches to segmentation. The proposed alternative has been deployed to reach the character and top character segmentation level. © 2005 - 2013 JATIT & LLS. All rights reserved.

2012 Journal Article Deepa Gupta, Yadav, R. Kumar, and Sajan, N., “Improving unsupervised stemming by using partial lemmatization coupled with data-based heuristics for Hindi”, International Journal of Computer Applications, vol. 38, pp. 1–8, 2012.
2009 Journal Article Deepa Gupta, “Will Sentences Have Divergence Upon Translation?: A Corpus-Evidence Based Solution for Example Based Approach”, Language in India, vol. 9, no. 10, pp. 1930–2940, 2009.
2007 Journal Article Deepa Gupta, Cettolo, M., and Federico, M., “POS-based reordering models for statistical machine translation”, Proceedings of the Machine Translation Summit (MT-Summit), 2007.

We present a novel word reordering model for phrase-based statistical machine translation suited to cope with long-span word movements. In particular, reordering of nouns, verbs and adjectives is modeled by taking into account target-to-source word alignments and the distances between source as well as target words. The proposed model was applied as a set of additional feature functions to re-score N-best translation candidates generated by a statistical machine translation system featuring state-of-the-art lexicalized reordering models. Experiments showed a relative BLEU score improvement of up to 7.3% on the BTEC Japanese-to-English task, and up to 1.1% on the Europarl German-to-English task.
2005 Journal Article Deepa Gupta, “Contributions to English to Hindi machine translation using example-based approach”, Unpublished Ph.D. Thesis submitted at Indian Institute of Technology, Delhi, 2005.

This research focuses on the development of an Example Based Machine Translation (EBMT) system for English to Hindi. Development of a machine translation (MT) system typically demands a large volume of computational resources. For example, rule-based MT systems require extraction of syntactic and semantic knowledge in the form of rules, while statistics-based MT systems require a huge parallel corpus containing sentences in the source language and their translations in the target language. The requirement for such computational resources is much lower for EBMT. This makes development of EBMT systems feasible for English to Hindi translation, where large-scale computational resources are still scarce. The primary motivation for this work comes because of the following
2003 Journal Article Deepa Gupta and Chatterjee, N., “Identification of divergence for English to Hindi EBMT”, Proceeding of MT Summit-IX, pp. 141–148, 2003.

Divergence is a key aspect of translation between two languages. Divergence occurs when structurally similar sentences of the source language do not translate into sentences that are similar in structures in the target language. Divergence assumes special significance in the domain of Example-Based Machine Translation (EBMT). An EBMT system generates translation of a given sentence by retrieving similar past translation examples from its example base and then adapting them suitably to meet the current translation requirements. Divergence imposes a great challenge to the success of EBMT. The present work provides a technique for identification of divergence without going into the semantic details of the underlying sentences. This identification helps in partitioning the example database into divergence / non-divergence categories, which in turn should facilitate efficient retrieval and adaptation in an EBMT system.
2003 Journal Article N. Chatterjee and Deepa Gupta, “Divergence in English to Hindi Translation: Some Studies”, International Journal of Translation, vol. 15, pp. 5–24, 2003.
2003 Journal Article Deepa Gupta and Chatterjee, N., “A Morpho-Syntax Based Adaptation and Retrieval Scheme for English to Hindi Example Based Machine Translation”, EACL 2003, 2003.

This paper focuses on Example Based Machine Translation (EBMT) between English and Hindi, the most popular language in South Asia. Given an input sentence, an EBMT system retrieves similar sentence(s) from its example base and adapts their translation(s) suitably to generate the translation of the given input. This paper proposes a systematic adaptation scheme that takes into account the morphology and syntax of the input and the retrieved source language sentences.
2001 Journal Article Deepa Gupta and Chatterjee, N., “Study of divergence for example based English-Hindi machine translation”, STRANS-2001, IIT Kanpur, pp. 43–51, 2001.
Publication Type: Book Chapter
Year of Publication Publication Type Title
2016 Book Chapter R. N Ravi, Vani, K., and Deepa Gupta, “Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System”, in Intelligent Systems Technologies and Applications, vol. 1, Springer International Publishing, 2016, pp. 127–138.

With the advent of the World Wide Web, plagiarism has become a prime issue in academia. A plagiarized document may contain content from a number of sources available on the web, and it is beyond any individual to detect such plagiarism manually. This paper focuses on the exploration of soft clustering, viz. the Fuzzy C Means algorithm, in the candidate retrieval stage of the external plagiarism detection task. Partial data sets from the PAN 2013 corpus are used for the evaluation of the system, and the results are compared with existing approaches, viz. N-gram and K Means Clustering. The performance of the systems is measured and compared using the standard measures, precision and recall.

2015 Book Chapter S. Mishra, Radhakrishnan, G., Deepa Gupta, and Sudarshan, T. S. B., “Acquisition and analysis of robotic data using machine learning techniques”, in Computational Intelligence in Data Mining-Volume 3, Springer India, 2015, pp. 489–498.

A robotic system has to understand its environment in order to perform the tasks assigned to it successfully. In such a case, a system capable of learning and decision making is necessary. In order to achieve this capability, a system must be able to observe its environment with the help of real time data received from its sensors. This paper discusses certain experiments to highlight methods and attributes that can be used for such a learning. These experiments consider attributes recorded in different virtual environments with the

2014 Book Chapter R. Gopalapillai, Deepa Gupta, and Sudarshan, T. S. B., “Experimentation and analysis of time series data for rescue robotics”, in Recent Advances in Intelligent Informatics, Springer International Publishing, 2014, pp. 443–453.

In today’s world, rescue robots are used in various life-threatening situations where human help or support is not possible. These robots transfer real time data about the environment continuously. Research is focussed on techniques to analyse real time data to enable Decision Support Systems (DSS) to take timely actions to save lives. This paper discusses preliminary experiments that have been carried out to simulate a set of simple robotic environments. A robot fitted with four sensors is used to collect information about the environments as the robot moves along a straight-line path. Time series data collected from these experiments are clustered using data mining techniques. Experimental results show recall and precision between 73% and 98%.

2006 Book Chapter A. De Gispert, Deepa Gupta, Popović, M., Lambert, P., Mariño, J. B., Federico, M., Ney, H., and Banchs, R., “Improving statistical word alignments with morpho-syntactic transformations”, in Advances in Natural Language Processing, Springer Berlin Heidelberg, 2006, pp. 368–379.

This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By means of parallel corpus transformations according to information of POS-tagging, lemmatization or stemming, we explore which linguistic information helps improve alignment error rates. For this, evaluation against a human word alignment reference is performed, aiming at an improved machine translation training scheme which eventually leads to improved SMT performance. Experiments are carried out in a Spanish–English European Parliament Proceedings parallel corpus, both in a large and a small data track. As expected, improvements due to introducing morphosyntactic information are bigger in case of data scarcity, but significant improvement is also achieved in a large data task, meaning that certain linguistic knowledge is relevant even in situations of large data availability.
Publication Type: Conference Paper
Year of Publication Publication Type Title
2016 Conference Paper D. Mishra, Venugopalan, M., and Deepa Gupta, “Context Specific Lexicon for Hindi Reviews”, in Procedia Computer Science, 2016, vol. 93, pp. 554–563.

In the era of social networking, the immense number of posts, comments and tweets generated every second is increasing the size of the social database. Analysis of this voluminous data is necessary for exploring the orientation of people's opinions about a particular entity. Most online data is in English, but with advances in technology and improved awareness, the online data available in Indian languages is gradually increasing. Sentiment analysis of English alone is not sufficient to gauge people's inclination towards an entity; sentiment analysis of Indian languages is also necessary, as their contribution is equally important. The available sentiment classification lexicon resources, such as Hindi SentiWordNet, are generic in nature and hence yield only average sentiment classification accuracy owing to contextual dependency. To improve the sentiment classification accuracy, we present an improved lexicon resource for the Hindi language for the Hotel and Movie domains. The improved polarity lexicon has been built to reflect context sensitivity and, to increase coverage, has been expanded using a synonym-based approach. The resulting polarity lexicon shows an improvement in accuracy of 42% and 78% in the Movie and Hotel domains, respectively, compared to the existing Hindi SentiWordNet lexicon resource.
2015 Conference Paper K. Vani, Deepa Gupta, Krishnaswamy D, Thampi S.M, Callegari C, Alcaraz Calero J.M, Takagi H, Mauri J.L, Meghanathan, N., Rodrigues, J., Bojkovic Z.S, Wozniak, M., Sahni, S., Vinod M., Prasad, N. R., Que X., and Au E., “Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system”, in 2015 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2015, 2015, pp. 1578-1584.

Plagiarism is an illicit act that has become a prime concern, mainly in the educational and research domains. This deceitful act, usually referred to as intellectual theft, has swiftly increased with rapid technological development and information accessibility. Thus there is an urgent need for a system or mechanism for efficient plagiarism detection. This paper investigates different combined similarity metrics for extrinsic plagiarism detection and focuses on the importance of combined similarity metrics over the commonly used single metric in the plagiarism detection task. Further, the impact of utilizing part-of-speech (POS) tagging in the plagiarism detection model is analyzed. Different combinations of four single metrics, namely Cosine similarity, Dice coefficient, Match coefficient and Fuzzy-Semantic measure, are used with and without POS tag information. These systems are evaluated using the PAN 2014 training and test data sets, and the results are analyzed and compared using standard PAN measures, viz. recall, precision, granularity and plagdet score. © 2015 IEEE.

2015 Conference Paper M. Venugopalan and Deepa Gupta, “Exploring sentiment analysis on Twitter data”, in Eighth International Conference on Contemporary Computing (IC3), 2015.

The growing popularity of microblogging websites has transformed them into rich resources for sentiment mining. Even though opinion mining has more than a decade of research to boast about, it is mostly confined to the exploration of formal text patterns such as online reviews and news articles. Exploration of the challenges offered by informal and crisp microblogging has taken root, but there is a long way to go. The proposed work aims at developing a hybrid model for sentiment classification that explores tweet-specific features and uses domain-independent and domain-specific lexicons to offer a domain-oriented approach, and hence to analyze and extract consumer sentiment towards popular smart phone brands over the past few years. The experiments show that the results improve by around 2 points on average over the unigram baseline.

2015 Conference Paper M. Venugopalan and Deepa Gupta, “Sentiment Classification for Hindi Tweets in a Constrained Environment Augmented Using Tweet Specific Features”, in Mining Intelligence and Knowledge Exploration, 2015, pp. 664–670.

India being a diverse country rich in spoken languages with around 23 official languages has always left open a wide arena for NLP researchers. The increase in the availability of voluminous data in Indian languages in the recent years has prompted researchers to explore the challenges in the Indian language domain. The proposed work explores Sentiment Analysis on Hindi tweets in a constrained environment and hence proposes a model for dealing with the challenges in extracting sentiment from Hindi tweets. The model has exhibited an average performance with cross validation accuracy for training data around 56 % and a test accuracy of 43 %.

2015 Conference Paper V. Dominic, Deepa Gupta, S. Khare, and Aggarwal, A., “Investigation of chronic disease correlation using data mining techniques”, in 2015 2nd International Conference on Recent Advances in Engineering and Computational Sciences, RAECS 2015, 2015.

A disease is an abnormal condition that affects the structure and function of one or more parts of the body. It may be caused by various factors, including external and internal dysfunctions. Various chronic diseases are trending in any society, and the major concern is that these chronic diseases lead to many other diseases in the future. Exploring the correlation between chronic diseases has therefore become a necessity. This can be achieved using data mining techniques, which help to derive knowledge about the effects of a particular chronic disease on other chronic diseases. Since there is a growing trend of diabetes and ischemic heart disease in society, the focus of this paper is to investigate the effect of these diseases on other chronic diseases using the ICD9 diagnostic codes. To achieve this goal, various types of data mining techniques are used. The result is an optimal set of ICD9 diagnostic codes associated with individuals having diabetes or ischemic heart disease. These codes are then investigated based on the human anatomic systems, i.e. Circulatory system, Respiratory system, Nervous system, Musculoskeletal system, Renal system and Neoplasm, and their relevance is justified. © 2015 IEEE.

2015 Conference Paper P. Vivek, G. Radhakrishnan, Deepa Gupta, and T.S.B. Sudarshan, “Clustering of robotic environment using image data stream”, in International Conference Communication, Control and Intelligent Systems, CCIS 2015, 2015, pp. 208-213.

Mobile robots are being used in various applications such as space shuttles, intelligent home security, military applications and other service-oriented applications where human intervention is limited. A robot has to understand its environment by analyzing data in order to take appropriate actions in that environment. The data collected from the sensors on the robots is mostly huge and continuous, making it impossible to store the entire data in main memory and hence allowing only a single scan of the data. Traditional clustering algorithms like k-means cannot be used in such an environment, as they require multiple scans of the data. This paper presents an experimental study on the implementation of Stream KM++, a data stream clustering algorithm that effectively clusters these time series robotic image data within the memory restrictions under various conditions. Promising results are obtained from the various experiments carried out.
2015 Conference Paper S. Sanagar and Deepa Gupta, “Adaptation of multi-domain corpus learned seeds and polarity lexicon for sentiment analysis”, in 2015 International Conference on Computing and Network Communications, CoCoNet 2015, 2015, pp. 50-58.

Sentiment analysis has emerged as an independent branch of research and attracted many researchers in recent years. Analysis of sentiment deals with expressed opinions. That makes it widely applicable in every part of life and in businesses where opinion counts. Opinions are expressed by the means of opinion oriented words which are part of sentiment analysis resource such as polarity lexicon. Polarity lexicon construction is widely explored by researchers using various supervised and semi-supervised approaches. Semi-supervised approaches are often combined with polarity seed information. A novel semi-supervised approach is proposed to construct polarity lexicon using iterative Latent Semantic Analysis technique from unlabeled multiple source domains corpus. This polarity lexicon is adaptable across multiple target domains. In the process seed words are learned from multiple domain corpus and subsequently adapted to new domains. Significant improvement in accuracy is observed over the baselines. © 2015 IEEE.

2015 Conference Paper G. Radhakrishnan, Deepa Gupta, Sindhuula, S., Khokhawat, S., and T.S.B. Sudarshan, “Experimentation and analysis of time series data from multi-path robotic environment”, in 2015 IEEE International Conference on Electronics, Computing and Communication Technologies, CONECCT 2015, 2015.

Autonomous mobile robots are increasingly used in many application areas. In most applications, they have to explore and gather knowledge about the environment they are deployed in. These robots transfer real time data about the environment continuously. This paper discusses a set of experiments that have been carried out to simulate various robotic environments. A robot attached with four sensors is used to collect information about the environment as the robot moves in multiple straight line paths. Time series data collected from these experiments are clustered using data mining techniques. Experimental results show clustering accuracies vary depending on the number of clusters formed. © 2015 IEEE.

2014 Conference Paper Deepa Gupta and Verma, R., “An enhanced cluster-head selection scheme for distributed heterogeneous wireless sensor network”, in Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on, 2014.[Abstract]

Cluster-Head selection is a critical and energy-constrained process in a wireless sensor network. This process requires a significant amount of energy, affecting the performance and operation of the network. A heterogeneous wireless sensor network has the advantage of providing different types of data from a variety of sensors in the same network, but because of complex network operations it shows poor performance. For enhanced performance of the wireless sensor network, improvements are needed in some critical parameters such as energy potency, network lifetime, node readying, fault lenience and dormancy. The proposed Cluster-Head selection scheme deals with two-level heterogeneous wireless sensor networks. The improved Cluster-Head selection process results in less energy consumption, which prolongs the network lifetime and stability.

2014 Conference Paper K. S. Vishnu, Apoorva, T., and Deepa Gupta, “Learning Domain-Specific and Domain-independent Opinion Oriented Lexicons using Multi Domain Knowledge”, in Sixth International Conference on Contemporary Computing (IC3-2014), 2014.[Abstract]

Sentiment analysis systems are used to identify the opinions expressed in customer reviews. The basic resource for sentiment analysis systems is the polarity lexicon. Each term in a polarity lexicon indicates its affinity towards a positive or negative opinion. However, this affinity of a word changes with the change in domain. In this work, we explore a polarity lexicon using SentiWordNet, a domain-independent lexicon, to adapt to a specific domain and update the domain-independent lexicon based on multiple domain knowledge. The proposed approach has been tested on five domains: Health, Books, Camera, Music and DVD. The improvement in accuracy ranges from 4.5 to 19 points across all the domains over the baseline.

2014 Conference Paper S. Swetha, Deepa Gupta, Radhakrishnan, G., and Sudarshan, T. S. B., “Analysis of Robotic Environment using Low Resolution Image Sequence”, in International Conference on Contemporary Computing and Informatics, Mysore, 2014.[Abstract]

Mobile robots have been widely used in agricultural, industrial and military applications and in areas where human intervention is not possible. These robots are equipped with sensors and image capturing devices to collect time series data from the environment. The data thus collected are analyzed to obtain vital information about the environment. This paper presents an experimental study using low resolution image data captured from a set of complex robotic scenarios. Features such as color and texture are extracted from the images. The scenarios are then clustered based on the extracted features using the k-medoids algorithm. Clustering accuracy has been analyzed with different image resolutions and feature extraction methods.

2014 Conference Paper V. K and Deepa Gupta, “Using K-means Cluster based Techniques in External Plagiarism Detection”, in International Conference on Contemporary Computing and Informatics, Mysore, 2014.[Abstract]

Text document categorization is one of the rapidly emerging research fields, where documents are identified, differentiated and classified manually or algorithmically. The paper focuses on the application of automatic text document categorization in the plagiarism detection domain. In today's world plagiarism has become a prime concern, especially in research and educational fields. This paper aims to study and compare different methods of document categorization in external plagiarism detection. Here the primary focus is to explore unsupervised document categorization/clustering methods using different variations of the K-means algorithm and compare them with the general N-gram based method and the Vector Space Model based method. Finally, the analysis and evaluation are done using the data set from PAN-2013, and performance is compared based on precision, recall and efficiency in terms of the time taken for algorithm execution.

2014 Conference Paper G. Radhakrishnan, Sudarshan, T. S. B., Murali, M., and Deepa Gupta, “Clustering of Robotic Environments Using Image Sequence Data”, in Eighth International Conference on Data Mining and Warehousing (ICDMW-2014), 2014.
2014 Conference Paper G. Radhakrishnan, Sudarshan, T. S. B., Mishra, S., and Deepa Gupta, “Acquisition and Analysis of Robotic Data using Machine learning Techniques”, in International Conference on Computational Intelligence in Data Mining, 2014.
2013 Conference Paper C. Priyanka and Deepa Gupta, “Identifying the Best Feature Combination for Sentiment Analysis of Customer Reviews”, in Second International Conference on Advances in Computing, Communications and Informatics (ICACCI-2013), 2013.[Abstract]

Opinions are increasingly available in the form of reviews and feedback on websites, blogs, and microblogs, which influence future customers. From a human perspective, it is difficult to read and summarize all the opinions, which calls for automated and faster opinion mining to classify the reviews. In this paper, different features, namely N-gram features, POS-based features and features based on the lexicon SentiWordNet, have been investigated. The Support Vector Machines (SVM) classifier has been modeled with presence as the feature representation for classification of the reviews into positive and negative classes, thereby identifying the best feature combination. Results of experiments conducted on smart phone reviews for different feature combinations have been presented. Highest accuracies of up to 92% and 95% have been obtained for small and large datasets, respectively.

2013 Conference Paper R. Gopalapillai, Vidhya, J., Deepa Gupta, and Sudarshan, T. S. B., “Classification of Robotic Data using Artificial Neural Network”, in 2nd IEEE International Conference on Recent Advances in Intelligent Computational Systems (RAICS 2013), 2013.[Abstract]

As time series data are common in the fields of science and commerce, time series data analysis has an important role in these areas for extracting information from available data. This paper presents the application of Artificial Neural Networks (ANN) for analyzing a huge amount of time series data collected by sensors mounted on a robot navigating in a simulated environment. The Artificial Neural Network system, employing the back-propagation learning algorithm, classified different scenarios encountered by the robot using the data collected by the sensors.

2012 Conference Paper G. Radhakrishnan, Deepa Gupta, Abhishek, R., Ajith, A., and T.S.B. Sudarshan, “Analysis of multimodal time series data of robotic environment”, in International Conference on Intelligent Systems Design and Applications, ISDA, Kochi, 2012, pp. 734-739.[Abstract]

Autonomous mobile robots equipped with an array of sensors are being increasingly deployed in disaster environments to assist rescue teams. The sensors attached to the robots send multimodal time series data about the disaster environments, which can be analyzed to extract useful information about the environment in which the robots are deployed. A set of data mining tasks that effectively cluster various robotic environments have been investigated. The effectiveness of these data mining techniques has been demonstrated using an available robotic dataset. The accuracy of the proposed technique has been measured using a manual reference cluster set. © 2012 IEEE.

2010 Conference Paper R.K. Yadav and Deepa Gupta, “Annotation guidelines for Hindi-English word alignment”, in Proceedings - 2010 International Conference on Asian Language Processing, IALP 2010, Harbin, 2010, pp. 293-296.[Abstract]

A language pair such as Hindi-English (Hin-Eng) differs in terms of grammar, and thus finding correspondences is often quite obscure in word alignment. Hindi, being rich in morphology, makes the alignment with its counterpart a bit contingent and invites obscurities in the annotation process. We present annotation guidelines for Hin-Eng word alignment through contrastive analysis of the two languages. We applied existing guidelines for Dutch-English [1], coupled with Blinker project guidelines [2], and augmented them to cover frequently occurring cases in our corpus. We discuss the verbal system, which causes most linking obscurities, by analyzing verb morphology; this allows us to define consistent and systematic instructions for manual word alignment. © 2010 IEEE.

2010 Conference Paper E. Venkataramani and Deepa Gupta, “English-Hindi automatic word alignment with scarce resources”, in Asian Language Processing (IALP), 2010 International Conference on, 2010, pp. 253-256.[Abstract]

Many automatic word alignment techniques have so far been developed in Natural Language Processing (NLP). However, word alignment between English and Hindi has not progressed much, for two main reasons: the complex structure of the participating languages and the scarcity of Hindi-language resources. This paper provides a corpus-augmented method of word alignment in which these limitations have been overcome. We see this work as an improved approach to establishing a word alignment algorithm with scarce resources for Indian languages in general and for English-Hindi in particular.
2007 Conference Paper N. Bertoldi, Brugnara, F., Cattoni, R., Cettolo, M., Chen, B., Federico, M., Giuliani, D., Gretter, R., Deepa Gupta, Seppi, D., and , “The IRST English-Spanish Translation System for European Parliament Speeches”, in Interspeech, 2007.[Abstract]

This paper presents the spoken language translation system developed at FBK-irst during the TC-STAR project. The system integrates automatic speech recognition with machine translation through the use of confusion networks, which make it possible to represent a huge number of transcription hypotheses generated by the speech recognizer. Confusion networks are efficiently decoded by a statistical machine translation system which computes the most probable translation in the target language. This paper presents the whole architecture developed for the translation of political speeches held at the European Parliament, from English to Spanish and vice versa.
2006 Conference Paper M. Popović, Ney, H., De Gispert, A., Mariño, J. B., Deepa Gupta, Federico, M., Lambert, P., and Banchs, R., “Morpho-syntactic information for automatic error analysis of statistical machine translation output”, in Proceedings of the workshop on statistical machine translation, 2006.
2006 Conference Paper M. Federico and Deepa Gupta, “Exploiting Word Transformation in Statistical Machine Translation from Spanish to English”, in 11th Annual conference of the European Association for Machine Translation (EAMT), 2006, pp. 75–80.
2004 Conference Paper S. Goyal, Chatterjee, N., and Deepa Gupta, “A Study of Hindi Translation Patterns for English Sentences with "Have" as the Main Verb”, in the International Symposium on MT, NLP and Translation Support Systems: iSTRANS-2004, 2004.
2003 Conference Paper N. Chatterjee and Deepa Gupta, “A Morpho-Syntax based Adaptation and Retrieval Scheme for English to Hindi EBMT”, in Workshop on Computational Linguistic for the Languages of South Asia: Expanding Synergies with Europe, 2003, p. 23.[Abstract]

This paper focuses on Example Based Machine Translation (EBMT) between English and Hindi, the most popular language in South Asia. Given an input sentence, an EBMT system retrieves similar sentence(s) from its example base and adapts their translation(s) suitably to generate the translation of the given input. This paper proposes a systematic adaptation scheme that takes into account the morphology and syntax of the input and the retrieved source language sentences.
2002 Conference Paper N. Chatterjee and Deepa Gupta, “Study of Similarity and its Measurement for English to Hindi EBMT”, in STRANS-2002, 2002.
2002 Conference Paper N. Chatterjee and Deepa Gupta, “A Systematic Adaptation Scheme for English-Hindi Example-Based Machine Translation”, in STRANS-2002, 2002.

Invited Talk/ Guest Lecture

  • “Support Vector Machine - Kernels and Kernel Trick”, TEQIP-II sponsored Faculty Development Programme on “Knowledge Mining using Machine Learning and its Applications”, from 23rd January 2014 to 25th January 2014 at Department of ISE, MSRIT, Bangalore, India.
  • “Introduction to Machine Learning & Its Applications”, a three-day workshop on “Knowledge Management and Data Analytics for Real World Applications (KMDA-13)”, from 22nd July to 24th July, 2013 at R. V. College of Engineering, Bangalore, India.
  • “Domain Biased Bilingual Parallel Data Extraction and an Unsupervised Hybrid Approach to Align Sentences and Words in Parallel Corpora”, Workshop on “Natural Language Processing and Speech Recognition”, from 15th-17th April 2013 at RV College, Bangalore, India.
  • “Robotic Data Mining”, 9th International Conference on Emerging Trends in Physics -2013, from 21st-22nd February 2013 at St. Joseph's College of Arts and Science (Autonomous), Cuddalore, TN, India.
  • “Introduction to Probability Theory and Sampling”, on 26th April, 2012 at PSE School of Engineering, Bangalore, Karnataka, India.
  • “Introduction to MATLAB”, on 9th August, 2012 at Jain University, Bangalore, Karnataka, India.
  • “Introduction to Vector Calculus”, on 4th December, 2012 at PSEIT, South Campus, Bangalore, Karnataka, India.
  • “The Role of Natural Language Processing in Electronic Health Domain/Records”, National Seminar on Data Mining and Knowledge Discovery of Medical Data, from 28th-29th July 2011, organized by Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bangalore, Karnataka, India.


Research Scholars

Name of the Scholar (Full Time (FT) / Part Time (PT)) Year of Registration Area of Research
G. Radhakrishnan (PT) (Jt. supervision with Dr. TSB Sudarshan) July 2011 Robotic Data Mining
Swati Sanagar (FT) July 2012 Sense Based Polarity Lexicon for Sentiment Analysis
Vani K. (FT) February 2013 Extrinsic Plagiarism Detection
Chinmayee Ojha (FT) August 2013 Statistical English-Hindi Alignments and their Evaluation
Shiva Kumar (PT) August 2013 Language Processing Tools for Kannada using Machine Learning Techniques by Utilizing Cross Language Linguistic Rich Resources
Tina Babu (FT) (Jt. supervision with Dr. Tripty Singh) September 2015 Colon Cancer Detection and Grading using Pathological Images
Manju Venugopalan (FT) February 2016 Aspect Level Sentiment Analysis
Priyanka Nair (FT) February 2016 Healthcare Data Mining
Veena G. (PT) February 2016 Document Similarity using Deep Learning Techniques


Faculty Details


Faculty Email: