Qualification: M.Tech
sandhyaharikumar@am.amrita.edu

Sandhya Harikumar currently serves as Assistant Professor in the Department of Computer Science and Engineering, Amrita School of Engineering, Amritapuri. She holds an M.Tech. in Computer Science.

Publications

J. Isaac and S. Harikumar, “Logistic regression within DBMS”, in Proceedings of the 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I 2016), 2017, pp. 661-666.


This paper develops an analytical query model for data categorization within a DBMS. Since the DBMS is a core asset for most organizations, classification can provide better insight into, and control over, the data. Conventionally, classification algorithms such as logistic regression and KNN are applied after exporting the data out of the DBMS, using non-DBMS tools such as R, matrix packages, generic data mining programs, or large-scale systems like Hadoop and Spark. However, this leads to I/O overhead, since the data within the DBMS is updated quite frequently and usually cannot be accommodated in main memory. This paper proposes an alternative strategy, based on SQL and user-defined functions (UDFs), to integrate logistic regression for data categorization as well as prediction query processing within the DBMS. A comparison of SQL with UDFs as well as with statistical packages like R is presented through experiments on real datasets. The empirical results show the viability and validity of this approach for predicting the class of a given query. © 2016 IEEE.
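
The in-database idea is easy to picture with a toy example. The following is a minimal sketch, not the paper's implementation: it assumes SQLite, an illustrative points table, and pre-trained coefficients, and registers the sigmoid as a UDF so that class probabilities come back from an ordinary SQL query instead of an export to an external tool.

# Minimal sketch (assumptions: SQLite, toy table, pre-trained weights) of
# scoring a logistic model inside the database via SQL plus a UDF.
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE points (x1 REAL, x2 REAL, label INTEGER)")
con.executemany("INSERT INTO points VALUES (?, ?, ?)",
                [(0.5, 1.2, 1), (-1.0, 0.3, 0), (2.1, -0.7, 1)])

# Register the sigmoid as a scalar UDF so prediction runs where the data lives.
con.create_function("sigmoid", 1, lambda z: 1.0 / (1.0 + math.exp(-z)))

w0, w1, w2 = -0.2, 1.5, 0.8  # assumed pre-trained coefficients

# Class probability and predicted label computed entirely in SQL,
# avoiding any export of the table to an external tool.
rows = con.execute(
    "SELECT x1, x2, sigmoid(? + ? * x1 + ? * x2) AS p, "
    "       sigmoid(? + ? * x1 + ? * x2) >= 0.5 AS predicted "
    "FROM points", (w0, w1, w2, w0, w1, w2)).fetchall()
for r in rows:
    print(r)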

S. Harikumar and S. S. Thaha, “MapReduce model for k-medoid clustering”, in Proceedings of the 2016 International Conference on Data Science and Engineering (ICDSE 2016), 2016.


Distributed and parallel computing are the best alternatives for scalable clustering of huge amounts of data with moderate to high dimensionality, together with improved speedup. In this paper, we address the problem of k-medoid clustering using the MapReduce framework for distributed computing on commodity machines and evaluate its efficacy. There are two main issues to be tackled: first, how to distribute the data for efficient clustering, and second, how to minimize the I/O and network cost among the machines. The main contributions of this paper are: (a) a MapReduce methodology for distributed k-medoid clustering; (b) a reduction in the overall execution time and in the overhead of moving data from one site to another, leading to sublinear scaleup and speedup. The approach proves to be efficient, as the local clustering on each machine can be carried out independently of the others. Experimental analysis on millions of records using just 10 cores in parallel shows that clustering a dataset of size 1M × 17 requires only 4 minutes. Such low transmission cost and low bandwidth requirements lead to improved speedup and scaleup on the distributed data. © 2016 IEEE.
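
The map/reduce division of labour described here can be sketched compactly. The snippet below is a single-process illustration under assumed toy data and a Manhattan distance: the map phase assigns points to their nearest medoid independently per split, and the reduce phase re-elects each cluster's medoid.

# Single-process sketch of the map/reduce split for k-medoid clustering;
# the distance, data, and iteration count are illustrative assumptions.
from collections import defaultdict

def distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))  # Manhattan, for example

def map_phase(points, medoids):
    """Assign each point to its nearest medoid (runs independently per split)."""
    groups = defaultdict(list)
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: distance(p, medoids[i]))
        groups[nearest].append(p)
    return groups

def reduce_phase(cluster):
    """Elect the member minimizing total distance to the cluster as new medoid."""
    return min(cluster, key=lambda c: sum(distance(c, p) for p in cluster))

points = [(1, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
medoids = [(1, 1), (8, 8)]
for _ in range(3):  # a real run iterates until the medoids stop changing
    groups = map_phase(points, medoids)
    medoids = [reduce_phase(groups[i]) for i in sorted(groups)]
print(medoids)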

S. Harikumar and D. U. Dilipkumar, “Apriori algorithm for association rule mining in high dimensional data”, in Proceedings of the 2016 International Conference on Data Science and Engineering (ICDSE 2016), 2016.


Apriori is one of the best algorithms for learning association rules. Due to the explosion of data, storage and retrieval mechanisms across database paradigms have revolutionized the technologies and methodologies used in their architecture. As a result, the database is utilized not only for mere information retrieval but also to support analytics over the data. It is therefore essential to find association rules in high-dimensional data, because the correlations among attributes can give deeper insight into the data and help in decision making, recommendation, and reorganizing the data for effective retrieval. The traditional Apriori algorithm is computationally expensive and infeasible on high-dimensional datasets. Hence, we propose a variant of the Apriori algorithm that uses QR decomposition to reduce the dimensionality, thereby reducing the complexity of the traditional algorithm. © 2016 IEEE.
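
The abstract does not spell out the exact QR-based reduction, so the sketch below shows one plausible reading: rank attributes by a column-pivoted QR of a toy transaction matrix, keep the top-k columns, and run the first Apriori pass on the reduced data. The selection criterion, support threshold, and data are assumptions for illustration, not the paper's exact variant.

# Hedged sketch: column-pivoted QR ranks attributes of a (binary) transaction
# matrix; Apriori then runs only on the k most significant columns.
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
X = (rng.random((100, 12)) > 0.7).astype(float)  # toy transaction matrix

_, _, piv = qr(X, pivoting=True)  # pivot order ranks columns by significance
k = 5
keep = np.sort(piv[:k])           # indices of the retained attributes
X_reduced = X[:, keep]

# Frequent 1-itemsets on the reduced matrix (the first step of Apriori).
min_support = 0.2
support = X_reduced.mean(axis=0)
frequent = [(int(keep[j]), float(support[j]))
            for j in range(k) if support[j] >= min_support]
print(frequent)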

S. Harikumar and R. Ramachandran, “Hybridized fragmentation of very large databases using clustering”, in 2015 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES 2015), 2015.


Due to the ever-growing need to manage huge volumes of data, together with the desire for consistent, scalable, reliable, and efficient retrieval of information, an intelligent mechanism for designing the storage structure of distributed databases has become inevitable. The two critical facets of distributed databases are data fragmentation and allocation. Existing fragmentation techniques are based on the frequency and type of queries as well as statistics of the empirical data. However, very limited work fragments the data based on the patterns among the tuples and the attributes responsible for those patterns. This paper presents a unique approach to hybridized fragmentation that applies a subspace clustering algorithm to produce a set of fragments partitioning the data with respect to tuples as well as attributes. Projected clustering determines clusters in subspaces of high-dimensional data; it finds the closely correlated attributes for different sets of instances, thereby yielding good hybridized fragments for distributed databases. Experimental results show that fragmenting the database based on clustering reduces database access time compared to fragments chosen at design time using certain statistics. © 2015 IEEE.
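
To make the hybrid fragments concrete: the sketch below assumes the output of a projected clustering step (groups of tuples plus the attributes relevant to each group) and derives the fragments by projecting each group onto its attribute subset. The relation and the cluster output are invented for illustration; this is the shape of the idea, not the paper's algorithm.

# Turning an assumed projected-clustering result into hybrid fragments:
# each cluster gives a horizontal fragment (its tuples) restricted
# vertically to the attributes relevant to that cluster.
rows = {
    "t1": {"a": 1, "b": 10, "c": 0.1, "d": "x"},
    "t2": {"a": 2, "b": 11, "c": 0.2, "d": "y"},
    "t3": {"a": 9, "b": 50, "c": 0.9, "d": "z"},
}
# Assumed output of a projected clustering step.
clusters = [
    {"tuples": ["t1", "t2"], "attributes": ["a", "b"]},
    {"tuples": ["t3"], "attributes": ["c", "d"]},
]

def fragments(rows, clusters):
    """Project each cluster's tuples onto its relevant attribute subset."""
    return [
        {tid: {a: rows[tid][a] for a in cl["attributes"]} for tid in cl["tuples"]}
        for cl in clusters
    ]

for frag in fragments(rows, clusters):
    print(frag)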

S. Harikumar, R. Reethima, and M. R. Kaimal, “Semantic integration of heterogeneous relational schemas using multiple L1 linear regression and SVD”, in International Conference on Data Science and Engineering (ICDSE 2014), 2014, pp. 105-111.


Semantic integration of heterogeneous databases is a critical area of interest due to the scalability of data and the need to share existing data as technology advances. Schema-level heterogeneity of the relations is the major obstacle to such integration. Though various approaches to schema analysis, transformation, and integration have been explored, they sometimes become too general to solve the problem, especially when the data is very high-dimensional and the schema information is unavailable or inadequate. In this paper, a method to integrate heterogeneous relational schemas at the instance level, rather than the schema level, is proposed. A global schema is designed, consisting of the most relevant attributes of the different relational schemas of a particular domain. To find the significant attributes, multiple linear regression based on the L1 norm and Singular Value Decomposition (SVD) are applied to the data iteratively. This is a variant of L1-PCA, an efficient, effective, and meaningful method of linear subspace estimation. The most prominent instance-level similarity is found by identifying the most significant attributes of each relational data source and then computing the similarity among those attributes using the L1 norm. Thus, an integrated schema is created that maps the relevant attributes of each local schema to the global schema. © 2014 IEEE.
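
A hedged sketch of the matching step, under assumed toy data: rank each source's attributes by their loading on the top singular vector, then pair the significant attributes across sources by minimal L1 distance between columns. This is only one reading of the iterative L1/SVD procedure, not the paper's exact algorithm.

# Assumed pipeline: SVD-based attribute significance, then L1-norm matching
# of significant attributes across two relational sources.
import numpy as np

def significant_attributes(X, k=2):
    """Indices of the k attributes with the largest top-singular-vector loading."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.argsort(-np.abs(Vt[0]))[:k]

def l1_match(X, Y, cols_x, cols_y):
    """Pair each significant column of X with the closest column of Y in L1 norm."""
    pairs = []
    for i in cols_x:
        j = min(cols_y, key=lambda c: np.abs(X[:, i] - Y[:, c]).sum())
        pairs.append((int(i), int(j)))
    return pairs

rng = np.random.default_rng(1)
A = rng.random((50, 4))
B = A[:, [2, 0, 3, 1]] + 0.01 * rng.random((50, 4))  # same data, shuffled schema

print(l1_match(A, B, significant_attributes(A), significant_attributes(B)))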

S. Harikumar, M. Shyju, and M. R. Kaimal, “SQL-MapReduce hybrid approach towards distributed projected clustering”, in International Conference on Data Science and Engineering (ICDSE 2014), 2014, pp. 18-23.


Clustering high-dimensional data is a major challenge in data mining due to the inherent complexity and sparsity of the data. Projected clustering is a clustering approach that determines clusters in the subspaces of such high-dimensional data. However, projected clustering within a DBMS is computationally expensive in time and space when the volume of records reaches terabytes, petabytes, and beyond. This expense becomes a hurdle especially when clustering of transactional data is used as a preprocessing step for other tasks such as frequent decision making, efficient indexing, and compression. Hence, parallelizing and distributing expensive data clustering tasks becomes attractive in terms of the speedup of computation and the increased amount of memory available in a computing cluster. In order to achieve this, we propose a SQL-MapReduce hybrid approach for scalable projected clustering. © 2014 IEEE.
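
The hybrid pattern can be illustrated in miniature: each partition does the cheap, data-local aggregation in SQL, and a reduce step merges the partial results globally. The sketch below uses SQLite and cluster-wise means as a stand-in for the projected-clustering statistics; the table, columns, and data are assumptions, not the paper's implementation.

# Assumed hybrid: per-partition SQL aggregation (the "map"), then a Python
# merge of the partial sums/counts into global cluster means (the "reduce").
import sqlite3
from collections import defaultdict

def map_partition(rows):
    """SQL does the local aggregation for one data partition."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE part (cluster INTEGER, x REAL)")
    con.executemany("INSERT INTO part VALUES (?, ?)", rows)
    return con.execute(
        "SELECT cluster, SUM(x), COUNT(*) FROM part GROUP BY cluster").fetchall()

def reduce_partials(partials):
    """Merge per-partition sums/counts into global cluster means."""
    acc = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for cluster, s, n in part:
            acc[cluster][0] += s
            acc[cluster][1] += n
    return {c: s / n for c, (s, n) in acc.items()}

partitions = [[(0, 1.0), (0, 2.0), (1, 10.0)], [(1, 12.0), (0, 3.0)]]
print(reduce_partials([map_partition(p) for p in partitions]))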
