Publication Type:

Conference Paper


International Conference on Data Science and Engineering, ICDSE 2014, Institute of Electrical and Electronics Engineers Inc., p.18-23 (2014)





Cluster analysis, Cluster computing, Clustering algorithms, Clustering approach, Data mining, Decision making, Distributed computer systems, Hadoop, High dimensional data, Hive, Map-reduce, Multiprocessing systems, Parallel processing systems, Pre-processing step, Projected clustering, Time and space complexity


Clustering high-dimensional data is a major challenge in data mining because of the inherent complexity and sparsity of the data. Projected clustering is a clustering approach that finds clusters in subspaces of such high-dimensional data. However, projected clustering within a DBMS is computationally expensive in both time and space when the volume of records reaches terabytes, petabytes, or more. This cost becomes a hurdle especially when clustering of transactional data is used as a preprocessing step for other tasks such as frequent decision making, efficient indexing, and compression. Hence, parallelizing and distributing expensive data clustering tasks becomes attractive, both for the speed-up in computation and for the larger amount of memory available in a computing cluster. To achieve this, we propose a SQL-MapReduce hybrid approach for scalable projected clustering. © 2014 IEEE.


Cited by 0; Conference: 2014 International Conference on Data Science and Engineering, ICDSE 2014; Conference date: 26 August 2014 through 28 August 2014; Conference code: 112595

Cite this Research Publication

Sandhya Harikumar, Shyju, M., and Dr. M. R. Kaimal, “SQL-MapReduce hybrid approach towards distributed projected clustering”, in International Conference on Data Science and Engineering, ICDSE 2014, 2014, pp. 18-23.