Qualification: M. Tech, B. Tech
Email: b_premjith@cb.amrita.edu

Premjith B. currently serves as a Faculty Associate at the Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore. He is pursuing his Ph.D. in Natural Language Processing under the guidance of Dr. Soman K.P. He had 2.5 years of teaching experience before joining Amrita. His current research focuses on Natural Language Processing (NLP), Computational Linguistics and Deep Learning. He is interested in developing NLP tools for linguistically rich Indian languages using deep learning algorithms. These tools can be used in Neural Machine Translation, spoken dialogue systems for Indian languages, robotics, social media text analytics, and similar applications.

Qualification

  • 2020 (Pursuing) : Ph. D. (Natural Language Processing),
    Amrita Vishwa Vidyapeetham
  • 2012 : M. Tech (Computational Engineering and Networking),
    Amrita Vishwa Vidyapeetham
  • 2009 : B. Tech (Computer Science and Engineering),
    Kannur University

Publications

Publication Type: Conference Paper

Year of Publication Title

2019

B. Premjith, Chandni Chandran V., Shriganesh Bhat, Dr. Soman K. P., and Prabaharan P., “A Machine Learning Approach for Identifying Compound Words from a Sanskrit Text”, in Proceedings of the 6th International Sanskrit Computational Linguistics Symposium, IIT Kharagpur, India, 2019.


In this paper, we propose a classification framework for finding the compound words from a given Sanskrit text. The compound word identification plays a significant role in learning the elucidations of verses in Ayurveda text books which are written in Sanskrit. This process was modelled using several classification algorithms and we examined their efficacy with varying word embedding dimensions. Sanskrit words were vectorized using fastText word embedding method. The results show that the performance of K-Nearest Neighbor is better than other classifiers and the prediction accuracy is 90.38%.


2019

V. Prasad K., B. Premjith, Chandni Chandran V., Dr. Soman K. P., and Prabaharan Poornachandran, “Deep learning based Character-level approach for Morphological Inflection Generation”, in 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 2019.


In this paper, we present our work on morphological inflection generation of Sanskrit using a deep learning approach. Sanskrit is a morphologically rich language which came into use from the Vedic period. A basic understanding of the language formation is needed to study the abundant literature in it. Here a computational model for word formation in Sanskrit is proposed using deep learning based models. They are applied here to attain the morphological changes that a root word undergoes to result in the surface form. The approach is in character level so as to capture the character level transformations. The best performance was obtained from the Bidirectional Gated Recurrent Unit architecture with an accuracy of 98.42% and an F1-Score of 0.9838. This model is purely dependent on the dataset and does not require any external linguistic resources.


2019

A. Gopalakrishnan, Dr. Soman K. P., and B. Premjith, “Part-of-Speech Tagger for Biomedical Domain Using Deep Neural Network Architecture”, in 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 2019.


POS tagging is the process of classifying words into their parts of speech, such as noun, verb, and preposition. It is the most important and basic process in natural language processing (NLP). It acts as an essential preprocessing step for other NLP applications such as sentiment analysis, NER, and speech recognition. POS tagging is treated as a sequence labeling problem in which words are labeled with their appropriate part of speech. This work implements a POS tagger for the biomedical domain using deep neural network architectures. The experiments use RNN, LSTM, and GRU, which are expected to give better performance since they are able to access more context information, and the models were evaluated using the publicly accessible GENIA dataset. Most applications in NLP have been solved owing to advances in neural networks and deep learning.


2019

B. Premjith, “Amrita CEN CIQ: Classification of Insincere Questions”, in FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, 2019.


This paper describes the work carried out by the team Amrita CEN for the shared task CIQ: Classification of Insincere Questions, conducted at FIRE 2019. The main objective of the shared task is to classify questions into six fine-grained classes: Rhetorical questions, Hate speech/inflammatory questions, Hypothetical questions, Sexually explicit/objectionable content questions, Other, and Sincere/true information-seeking questions. The proposed system predicts the test data with an accuracy of 48.51%. The classification model used in this task is the Decision Tree classifier. The word embedding algorithm used for feature extraction is fastText. A balanced Decision Tree is used as the classifier and proved to give better results when compared to the Random Forest classifier with 0.52 F1-score.


2019

B. Premjith, Dr. Soman K. P., and Sreelakshmi K., “Amrita CEN at HASOC 2019: Hate Speech Detection in Roman and Devanagiri Scripted Text”, in FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, 2019.


Nowadays the usage of social media sites like Facebook and Twitter has increased rapidly, which has led to a huge flood of data on these sites. Though social media sites give people free opportunities to express and share their thoughts, they also end up spreading a huge amount of hate content. In this paper we present a domain specific word embedding model for classifying English tweets as Non Hate-Offensive or Hate-Offensive, and a fastText model for Hindi text classification. The classification is done using the dataset obtained from the HASOC 2019 shared task. A deep learning algorithm is used as the classifier.


2019

K. K. Shahina, P. V. Jyothsna, G. Prabha, B. Premjith, and Dr. Soman K. P., “A Sequential Labelling Approach for the Named Entity Recognition in Arabic Language Using Deep Learning Algorithms”, in 2019 International Conference on Data Science and Communication (IconDSC), Bangalore, India, 2019.


Named Entity Recognition (NER) involves finding and categorizing minute text components into predefined categories such as name of person, location etc. NER is a type of information extraction task which has a crucial role in improving the performance of various NLP applications. For a morphologically abundant Semitic language like Arabic, the NER task is highly challenging due to its unique morphological characteristics and peculiarities. This paper introduces a deep learning based approach for Arabic NER which makes use of well-known deep neural network (DNN) architectures like Recurrent neural network (RNN), Long short term memory (LSTM), Gated recurrent unit (GRU), and the stacked and bidirectional versions of these three architectures. The ANERcorp dataset is used for the evaluation of the Arabic NER model and accuracy is chosen as the performance metric. On model evaluation, it is observed that bidirectional variants of DNNs provide better accuracy measures compared to their unidirectional variants.


2019

B. Premjith, Dr. Soman K. P., and Prabaharan Poornachandran, “Amrita_CEN@FACT: Factuality Identification in Spanish Text”, in IberLEF@SEPLN, 2019.


This paper describes the system used by the team Amrita CEN for the shared task on FACT (Factuality Analysis and Classification Task) at the IberLEF 2019 (Iberian Languages Evaluation Forum) workshop. The goal of the task was to automatically annotate an event with its factuality status. Factuality status is categorized into three classes: Fact, Counter Fact and Undefined. Our proposed system predicts the factuality of an event with a prediction accuracy of 72.1%. The classification model for this task was trained using a Random Forest classifier which uses the word embedding of the events as input features. The word embedding of an event was generated using the Word2vec algorithm. Random Forest was implemented by giving higher weights to minority classes and lower weights to majority classes so that more elements in the minority class would be predicted precisely.


2018

B. Premjith, Dr. Soman K. P., and Prabaharan Poornachandran, “A deep learning based Part-of-Speech (POS) tagger for Sanskrit language by embedding character level features”, in FIRE'18, 2018.


Part-of-Speech (POS) tagging is an important task in Natural Language Processing and numerous taggers have been developed for POS tagging in several languages. In Sanskrit also, one of the oldest languages in the world, many POS taggers were developed. However, less attention was given to the machine learning based POS tagging. In this paper, various deep learning algorithms are used for implementing a POS tagger for Sanskrit. This problem is framed as a sequence labeling problem at the character level. Therefore, a word to be POS tagged is considered as a sequence of characters and the sequential relationship among the characters in a word is captured with the deep learning algorithms such as Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) and their bidirectional versions. The character level formulation of the problem reduces the memory requirement compared to the word level implementations and also increases the accuracy of labeling. The performance of the labeling task was analyzed with the different combinations of hyper-parameters. We obtained the accuracy score of 97.86% with Bidirectional GRU. The character level implementations of both uni and bidirectional forms of RNN, LSTM and GRU outperformed all word level implementations in terms of accuracy, number of trainable parameters and the storage requirement.


2018

G. Prabha, P. V. Jyothsna, K. K. Shahina, B. Premjith, and Dr. Soman K. P., “A Deep Learning Approach for Part-of-Speech Tagging in Nepali Language”, in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Bangalore, India, 2018.


Part of Speech (POS) tagging is the most fundamental task in various natural language processing (NLP) applications such as speech recognition, information extraction and retrieval and so on. POS tagging involves annotating each token in the corpus with an appropriate tag based on its context and the syntax of the language. In computational linguistics, an optimal POS tagger is of paramount importance since tagging errors can critically affect the performance of complex NLP systems. Developing an efficient POS tagger for morphologically rich languages like Nepali is a challenging task. In this paper, a deep learning based POS tagger for Nepali text is proposed which is built using Recurrent Neural Network (RNN), Long Short-Term Memory Networks (LSTM), Gated Recurrent Unit (GRU) and their bidirectional variants. Performance metrics such as accuracy, precision, recall and F1-score were chosen for the model evaluation. It is observed from the results that our model shows significant improvement and outperforms the state-of-the-art POS taggers with more than 99% accuracy.


2018

B. Premjith, Dr. Soman K. P., and Anand Kumar M., “A deep learning approach for Malayalam morphological analysis at character level”, in Procedia Computer Science, 2018, vol. 132, pp. 47-54.


Morphological analysis is one of the fundamental tasks in computational processing of natural languages. It is the study of the rules of word construction by analysing the syntactic properties and morphological information. In order to perform this task, morphemes have to be separated from the original word. This process is termed as sandhi splitting. Sandhi splitting is important in the morphological analysis of agglutinative languages like Malayalam, because of the richness in morphology, inflections and sandhi. Due to sandhi, many morphological changes occur at the conjoining position of morphemes. Therefore, determining the morpheme boundaries becomes a tough task, especially in languages like Malayalam. In this paper, we propose a deep learning approach for learning the rules for identifying the morphemes automatically and segmenting them from the original word. Then, individual morphemes can be further analysed to identify the grammatical structure of the word. Three different systems were developed for this analysis using Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) and obtained accuracies 98.08%, 97.88% and 98.16% respectively. © 2018 The Authors. Published by Elsevier Ltd.


2017

Vinayakumar R, Sachin Kumar S, B. Premjith, Prabaharan Poornachandran, and Dr. Soman K. P., “DEFT 2017 - Texts Search @ TALN / RECITAL 2017: Deep Analysis of Opinion and Figurative language on Tweets in French”, in DEFT 2017 Shared task "Défi Fouille de Textes"@TALN/RECITAL 2017, France, 2017.


This working note describes our language independent system submitted to the three DEFT 2017 shared tasks on opinion analysis and figurative language in French tweets. We use an embedding of the bag-of-words method with a family of recurrent neural networks to analyze tweets for opinion and figurative language. We developed three systems for each shared task, and each system addresses opinion analysis and figurative language at the tweet level only. A family of recurrent neural networks extracts features from each tweet, which are then classified using logistic regression. On task 1, our system achieved macro F-scores of 0.276, 0.228, and 0.21 with long short-term memory (LSTM) for extracting features from tweets and logistic regression for classification. On task 2, our system achieved macro F-scores of 0.475, 0.470, and 0.476 with recurrent neural network (RNN) for extracting features from tweets and logistic regression for classification. On task 3, our system achieved macro F-scores of 0.22, 0.232, and 0.231 with gated recurrent unit (GRU) for extracting features from tweets and logistic regression for classification. Apart from the results, this working note gives valuable insights into the applicability of deep learning mechanisms for sentiment analysis (SA) or opinion mining (OM). Moreover, the proposed method serves as a language independent method.


2017

Vinayakumar R, B. Premjith, Sachin Kumar S., Dr. Soman K. P., and Prabaharan Poornachandran, “deepCybErNet at EmoInt-2017: Deep Emotion Intensities in Tweets”, in Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Copenhagen, Denmark, 2017.


This working note presents the methodology used in the deepCybErNet submission to the shared task on Emotion Intensities in Tweets (EmoInt) at WASSA-2017. The goal of the task is to predict a real valued score in the range [0-1] for a particular tweet with an emotion type. To do this, we used Bag-of-Words and embedding based on recurrent network architecture. We developed two systems, and experiments were conducted on the Emotion Intensity shared Task 1 database at WASSA-2017. The system which uses word embedding based on recurrent network architecture achieved the highest 5-fold cross-validation accuracy. It uses embedding with a recurrent network to extract optimal features at tweet level and logistic regression for prediction. These methods are highly language independent and experimental results show that the proposed methods are apt for predicting a real valued score in the range [0-1] for a given tweet with its emotion type.


2017

Vinayakumar R, Sachin Kumar S, B. Premjith, Prabaharan Poornachandran, and Dr. Soman K. P., “Deep Stance and Gender Detection in Tweets on Catalan Independence@Ibereval 2017”, in IberEval 2017 Evaluation of Human Language Technologies for Iberian Languages Workshop 2017, Murcia, Spain, 2017.


This paper discusses the deepCybErNet submission methodology for the task on Stance and Gender Detection in Tweets on Catalan Independence@Ibereval 2017. The goal of the task is to detect the stance and gender of the user in tweets on the subject “independence of Catalonia”. Tweets are available in two languages: Spanish and Catalan. In tasks 1 and 2, the system has to determine whether the tweet is in favor of, against or neutral to the subject pertaining to the task, in Spanish and Catalan respectively. In tasks 3 and 4, the system has to decide whether the person who tweets is male or female. We submitted three systems for this task: a Bag-of-Words (BOW) representation for tweets with logistic regression classifier, Recurrent Neural Network (RNN) based approach, Long Short Term Memory (LSTM) based approach and gated recurrent based approach. These methods are highly language independent and can be used for determining the stance of tweets and identifying the gender of Twitter users in any language. These methods performed better in detecting stance and gender in Catalan tweets than in Spanish tweets.


2011

B. Premjith, “A Level Set Methodology for Sanskrit Document Binarization and Character Segmentation”, in Second International Conference on Control, Communication and Computer Technology, 2011.

Publication Type: Journal Article

Year of Publication Title

2019

B. Premjith, Anand Kumar M., and Dr. Soman K. P., “Neural Machine Translation System for English to Indian Language Translation Using MTIL Parallel Corpus: Special Issue on Natural Language Processing”, Journal of Intelligent Systems, 2019.


The introduction of deep neural networks to machine translation research improved conventional machine translation systems in multiple ways, specifically in terms of translation quality. The ability of deep neural networks to learn a sensible representation of words is one of the major reasons for this improvement. Although machine translation using deep neural architectures shows state-of-the-art results in translating European languages, we cannot directly apply these algorithms to Indian languages, mainly for two reasons: the unavailability of a good corpus and the morphological richness of Indian languages. In this paper, we propose a neural machine translation (NMT) system for four language pairs: English-Malayalam, English-Hindi, English-Tamil, and English-Punjabi. We also collected sentences from different sources and cleaned them to build a parallel corpus for each of the four language pairs, and then used them to model the translation system. The encoder network in the NMT architecture was designed with long short-term memory (LSTM) networks and bi-directional recurrent neural networks (Bi-RNN). Evaluation of the obtained models was performed both automatically and manually. For automatic evaluation, the bilingual evaluation understudy (BLEU) score was used, and for manual evaluation, three metrics, namely adequacy, fluency, and overall ranking, were used. Analysis of the results showed that the presence of lengthy sentences in the English-Malayalam and English-Hindi corpora affected the translation. An attention mechanism was employed to address the problem of translating lengthy sentences (sentences containing more than 50 words), and the system was able to perceive long-term contexts in the sentences. © 2019 Walter de Gruyter GmbH, Berlin/Boston.


2019

B. Premjith, Dr. Soman K. P., Anand Kumar M., and Jyothi Ratnam D., “Embedding linguistic features in word embedding for preposition sense disambiguation in English-Malayalam machine translation context”, Studies in Computational Intelligence, vol. 823, pp. 341-370, 2019.


Preposition sense disambiguation has huge significance in natural language processing tasks such as machine translation. Transferring the various senses of a simple preposition in the source language to a set of senses in the target language is highly complex because of the many-to-many relationships involved, particularly in English-Malayalam machine translation. To reduce this complexity in the transfer of senses, in this paper we used linguistic information such as the noun class features and verb class features of the noun and verb correlated with the target simple preposition. The effect of these linguistic features on the proper classification of the senses (postpositions in Malayalam) is studied with the help of several machine learning algorithms. The study showed that the classification accuracy is higher when both verb and noun class features are taken into consideration. In linguistics, the major factor that decides the sense of the preposition is the noun in the prepositional phrase. The same trend was observed in the study when the training data contained only noun class features, i.e., noun class features dominate the verb class features. © Springer Nature Switzerland AG 2019.


2019

Athira Gopalakrishnan, Dr. Soman K. P., and B. Premjith, “A Deep Learning-Based Named Entity Recognition in Biomedical Domain”, Lecture Notes in Electrical Engineering, vol. 545, pp. 517-526, 2019.


In the biomedical field, huge amounts of data are produced day by day. These data drive the development of biomedical research in many ways. This paper mainly focuses on biomedical named entity recognition (NER) with the aim of enhancing performance through deep learning. Impressive results in natural language processing are made possible by deep learning techniques, which produce large differences in accuracy on NLP tasks compared to traditional methods. NER is a crucial initial step in information extraction in the biomedical domain. Here we use RNN, LSTM, and GRU on the GENIA version 3.02 corpus and achieve an F score of 90%, which is better than most state-of-the-art systems. © 2019, Springer Nature Singapore Pte Ltd.


2018

J. D. Ratnam, Dr. Soman K. P., B. Premjith, and Dr. M. G. Priya, “Transfer of Simple English Prepositions ‘to’ and ‘with’ Into Hindi Utilizing Linguistic Features of the Predicative Part of a Sentence with Machine Learning Approach in an English to Hindi MT Context”, Journal of Advanced Research in Dynamical & Control Systems, vol. 10, 2018.

2018

Dr. M. Anand Kumar, B. Premjith, Shivkaran Singh, R. Sankaravelayuthan, and Dr. Soman K. P., “An Overview of the Shared Task on Machine Translation in Indian Languages (MTIL) - 2017”, Journal of Intelligent Systems, 2018.


In recent years, multilingual content on the internet has grown exponentially together with the evolution of the internet. Regional language users are excluded from using this multilingual content because of the language barrier. So, machine translation between languages is the only possible solution to make this content available to regional language users. Machine translation is the process of translating a text from one language to another. Machine translation has already been investigated well for English and other European languages. However, it is still at a nascent stage for Indian languages. This paper presents an overview of the Machine Translation in Indian Languages shared task conducted on September 7-8, 2017, at Amrita Vishwa Vidyapeetham, Coimbatore, India. This machine translation shared task in Indian languages is mainly focused on the English-Tamil, English-Hindi, English-Malayalam and English-Punjabi language pairs. The shared task has the following objectives: (a) to examine the state-of-the-art machine translation systems when translating from English to Indian languages; (b) to investigate the challenges faced in translating from English to Indian languages; (c) to create an open-source parallel corpus for Indian languages, which is lacking. Evaluating machine translation output is another challenging task, especially for Indian languages. In this shared task, we evaluated the participants' outputs with the help of human annotators. As far as we know, this is the first shared task which depends completely on human evaluation. © 2018 Walter de Gruyter GmbH, Berlin/Boston.


2018

Aravind J. Prakash, Dhanya Sathyan, Dr. Anand K. B., and B. Premjith, “Modeling the Fresh and Hardened Stage Properties of Self-Compacting Concrete using Random Kitchen Sink Algorithm”, International Journal of Concrete Structures and Materials, vol. 12, 2018.


High performance concrete, especially self compacting concrete (SCC), has gained wide popularity in the construction industry because of its ability to flow through congested reinforcement without segregation and bleeding. Even though European Federation of National Associations Representing for Concrete (EFNARC) guidelines are available for the mix design of SCC, a large number of trials are required to obtain an SCC mix with the desired engineering properties. Conducting such a large number of trials requires considerable material and time. The main objective of the study presented in this paper is to demonstrate the use of the regularized least square algorithm (RLS) along with the random kitchen sink algorithm (RKS) to effectively predict the fresh and hardened stage properties of SCC. The database for testing and training the algorithm was prepared by conducting tests on 40 SCC mixes. The parameters varied across the SCC mixes were the quantities of fine and coarse aggregates, superplasticizer dosage, its family and water content. Out of 40 test results, 32 were used for training and 8 were used for testing the algorithm. Modelling of both fresh state properties, viz. flowing ability (Slump Flow), passing ability (J Ring), segregation resistance (V funnel at 5 min), as well as the hardened stage property (compressive strength) of the SCC mix was carried out using the RLS and RKS algorithms. Accuracy of the model was checked by comparing the predicted and measured values. The model could accurately predict the properties of the SCC within the experimental domain. © 2018, The Author(s).


2017

Aravind J. Prakash, Dhanya Sathyan, K. B. Anand, and B. Premjith, “Prediction of rheological properties of self compacting concrete: Regularized least square approach”, International Journal of Earth Sciences and Engineering, 2017.

2016

B. Premjith, S. Sachin Kumar, R. Shyam, Dr. M. Anand Kumar, and Dr. Soman K. P., “A Fast and Efficient Framework for Creating Parallel Corpus”, Indian Journal of Science and Technology, vol. 9, 2016.


A framework involving the Scansnap SV600 scanner and Google Optical Character Recognition (OCR) for creating a parallel corpus, which is an essential component of Statistical Machine Translation (SMT). Methods and Analysis: Training a language model for an SMT system depends heavily on the availability of a parallel corpus. An efficacious approach for collecting parallel sentences is the predominant step in an MT system. However, the creation of a parallel corpus requires extensive knowledge of both languages and is a time consuming process. Due to these limitations, digitizing the documents becomes very difficult, which in turn affects the quality of machine translation systems. In this paper, we propose a faster and more efficient way of generating English to Indian language parallel corpora with less human involvement. With the help of a special type of scanner called Scansnap SV600, Google OCR and a little linguistic knowledge, we can create a parallel corpus for any language pair, provided that paper documents with parallel sentences are available. Findings: It was possible to generate 40 parallel sentences in 1 hour with this approach. Sophisticated morphological tools were used for changing the morphology of the generated text and thereby increasing the size of the corpus. An additional benefit is that ancient scriptures or other manuscripts can be made available in digital format, which can then be referred to by coming generations to keep up the traditions of a nation or a society. Novelty: The time required for creating a parallel corpus is reduced by incorporating Google OCR and a book scanner.


2016

P. Poornachandran, B. Premjith, and Dr. Soman K. P., “A distributed approach for predicting malicious activities in a network from a streaming data with support vector machine and explicit random feature mapping”, IIOAB Journal, vol. 7, pp. 24-29, 2016.


Technology reduces human effort. However, technological advancements always bring threats to personal as well as organizational security, mainly because we are all connected to the internet. Therefore, ensuring cyber security has become a major topic of discussion. As the magnitude of activities over the internet is unimaginable, envisioning the characteristics of network activities, whether malicious or benign, coming from a stream of data in real time is a really tough task. To tackle this problem, in this paper, we propose a distributed approach based on Support Vector Machine (SVM) with explicit random feature mapping, where the feature mapping is obtained using the Compact random feature maps (CRAFTMaps) algorithm. Distributing the job achieves notable improvement in the total prediction time. © 2016, Institute of Integrative Omics and Applied Biotechnology. All rights reserved.


2015

B. Premjith, Neethu Mohan, Prabaharan Poornachandran, and Dr. Soman K. P., “Audio Data Authentication with PMU Data and EWT”, Procedia Technology, vol. 21, pp. 596-603, 2015.


Digital forensics has become a flourishing research area. Electrical Network Frequency (ENF) plays an important role in assessing the authenticity of a digital recording such as audio. ENF criterion is a tool for extracting the embedded power line frequency from the recording. A cross correlation between a reference PMU data and extracted ENF signal can be done in order to determine the authenticity of an audio signal. In this paper, Empirical Wavelet Transform (EWT) is used for extracting the ENF from an audio signal. EWT decomposes signal into N modes. Hilbert Transform is used to compute the instantaneous frequency and amplitude of the extracted mode corresponding to ENF. EWT method is not able to capture the weak harmonics in a signal. This problem is resolved by fixing the frequency domain boundaries of each mode.


2015

Sachin Kumar S., B. Premjith, Anand Kumar M., and Dr. Soman K. P., “AMRITA_CEN-NLP@SAIL2015: Sentiment analysis in indian language using regularized least square approach with randomized feature learning”, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9468, pp. 671-683, 2015.


The present work was done as part of the shared task on Sentiment Analysis in Indian Languages (SAIL 2015), under the constrained category. The task is to classify Twitter data into three polarity categories: positive, negative and neutral. For training, Twitter datasets in three languages were provided: Hindi, Bengali and Tamil. In this shared task, ours was the only team that participated in all three languages. Each dataset contained three separate categories of Twitter data, namely positive, negative and neutral. The proposed method used binary features, statistical features generated from SentiWordNet, and word presence (binary feature). Due to the sparse nature of the generated features, the input features were mapped to a random Fourier feature space to get a separation, and a linear classification was performed using the regularized least square method. The proposed method identified more negative tweets in the test data provided for Hindi and Bengali. In the test tweets for Tamil, positive tweets were identified more often than the other two polarity categories. Due to the lack of language specific features and sentiment oriented features, neutral tweets were identified less often, which also caused misclassifications in all three polarity categories. This motivates us to take our research in this area forward with the proposed method. © Springer International Publishing Switzerland 2015.

2015

K. R. Rithu Vadhana, G. Swamynadhan, P. V. Neethu, B. Premjith, and Dr. Soman K. P., “Computational experiment of one class SVM in excel”, International Journal of Applied Engineering Research, vol. 10, no. 20, pp. 19356-19360, 2015.


Computational thinking is a strategic thought process that has led to spectacular achievements which ignited a technological boom across domains. Classification is one of the major tasks in machine learning. Support Vector Machine is one of the classification methods in machine learning. One class SVM is a method for identifying outliers in a data set. In one class SVM, only target class information is taken into consideration and outlier information is not used. Using this we minimize the chance of accepting outliers by optimizing the radius of the hypersphere. This is a robust method against outliers. In this paper, the experiment of one class SVM on simulated data points is implemented in Excel. Excel is a powerful and easy tool that gives an opportunity for better understanding and ease of learning. It is a platform which has very low system requirements and demands few programming skills. © Research India Publications.

2013

B. Premjith, S. Sachin Kumar, Akhil Manikkoth, T. V. Bijeesh, and Dr. Soman K. P., “Insight into Primal Augmented Lagrangian Multilplier Method”, Numerical Analysis, 2013.


We provide a simplified form of Primal Augmented Lagrange Multiplier algorithm. We intend to fill the gap in the steps involved in the mathematical derivations of the algorithm so that an insight into the algorithm is made. The experiment is focused to show the reconstruction done using this algorithm.


Publication Type: Book

Year of Publication Title

2018

D. Jyothi Ratnam, Dr. M. Anand Kumar, B. Premjith, Dr. Soman K. P., and S. Rajendran, Sense disambiguation of English simple prepositions in the context of English-Hindi machine translation system. Springer Singapore, 2018, pp. 245-268.


In the context of developing a Machine Translation System, the identification of the correct sense of each and every word in the document to be translated is extremely important. Adpositions play a vital role in the determination of the sense of a particular word in a sentence as they link NPs with the VPs. In the context of developing an English to Hindi Machine Translation system, the transfer of the senses of each preposition into the target language needs to be done with much attention. The linguistic and grammatical role of a preposition is to express a variety of syntactic and semantic relationships between nouns, verbs, adjectives, and adverbs. Here we have selected the most important and most frequently used English simple prepositions such as ‘at’, ‘by’, ‘from’, ‘for’, ‘in’, ‘of’, ‘on’, ‘to’ and ‘with’ for the sake of contrast. A supervised machine learning approach called Support Vector Machine (SVM) is used for disambiguating the senses of the simple preposition ‘at’ in contrast with Hindi postpositions. © Springer Nature Singapore Pte Ltd. 2018.
