SQL-MapReduce hybrid approach towards distributed projected clustering

Publication Type : Conference Paper

Publisher : International Conference on Data Science and Engineering, ICDSE 2014

Source : International Conference on Data Science and Engineering, ICDSE 2014, Institute of Electrical and Electronics Engineers Inc., p.18-23 (2014)

ISBN : 9781479968701

Keywords : Cluster analysis, Cluster computing, Clustering algorithms, Clustering approach, Data mining, Decision making, Distributed computer systems, Hadoop, High dimensional data, Hive, Map-reduce, Multiprocessing systems, Parallel processing systems, Pre-processing step, Projected clustering, Time and space complexity

Campus : Amritapuri

School : Department of Computer Science and Engineering, School of Engineering

Center : AI (Artificial Intelligence) and Distributed Systems

Department : Computer Science

Verified : Yes

Year : 2014

Abstract : Clustering high-dimensional data is a major challenge in data mining because of the inherent complexity and sparsity of the data. Projected clustering is one of the clustering approaches that determine clusters in subspaces of such high-dimensional data. However, projected clustering within a DBMS is computationally expensive in both time and space when the volume of records reaches terabytes, petabytes, or more. This cost becomes a hurdle especially when clustering of transactional data is used as a preprocessing step for other tasks such as frequent decision making, efficient indexing, and compression. Hence, parallelizing and distributing expensive data clustering tasks becomes attractive in terms of both computational speed-up and the larger amount of memory available in a computing cluster. To achieve this, we propose a SQL-MapReduce hybrid approach for scalable projected clustering. © 2014 IEEE.

Cite this Research Publication : Sandhya Harikumar, Shyju, M., and Dr. Kaimal, M. R., “SQL-MapReduce hybrid approach towards distributed projected clustering”, in International Conference on Data Science and Engineering, ICDSE 2014, 2014, pp. 18-23
