Clustering high-dimensional data has always been a major challenge because of the inherent sparsity of the data points. Our model uses attribute selection and handles the sparse structure of the data effectively. Subset selection is performed by two different methods. In the first method, we use LASSO (Least Absolute Shrinkage and Selection Operator) to select the subset of most informative attributes that preserve the cluster structure. Although other attribute selection methods exist, LASSO has the distinctive property that it selects the most correlated set of attributes in the data. In the second method, we select a subset of linearly independent attributes using QR factorization. The model also identifies the dominant attributes of each cluster, which retain their predictive power. The use of LASSO also assures the quality of the projected clusters formed.
Anoop S. Babu and Dr. M. R. Kaimal, “Projected Clustering with Subset Selection”, in IEEE Proceedings of the 3rd International Conference on Advances in Computing, Communications and Informatics (ICACCI-2014), Delhi, India, 2014, pp. 1452-1457.
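As an illustration of the second selection method mentioned in the abstract (choosing linearly independent attributes via QR factorization), the following is a minimal sketch only, not the authors' implementation. It uses a greedy column-pivoted Gram-Schmidt process, which is one standard way to obtain the column ordering of a pivoted QR factorization. The function name `select_independent_columns` and the tolerance `tol` are hypothetical choices made for this example; the abstract does not specify the exact procedure or any thresholds.

```python
from typing import List, Sequence


def select_independent_columns(
    X: Sequence[Sequence[float]], tol: float = 1e-10
) -> List[int]:
    """Return indices of a maximal set of linearly independent columns of X.

    Sketch of column-pivoted Gram-Schmidt (assumed stand-in for pivoted QR):
    repeatedly pick the column with the largest residual norm, then remove
    its direction from every remaining column.
    """
    n_rows = len(X)
    n_cols = len(X[0]) if n_rows else 0
    # Column-major working copy: one list per attribute.
    cols = [[float(X[i][j]) for i in range(n_rows)] for j in range(n_cols)]
    # Largest squared column norm, used to make the tolerance relative.
    scale = max((sum(v * v for v in c) for c in cols), default=0.0)

    selected: List[int] = []
    remaining = list(range(n_cols))
    while remaining:
        sq_norms = {j: sum(v * v for v in cols[j]) for j in remaining}
        pivot = max(remaining, key=lambda j: sq_norms[j])
        # Stop when every remaining column is (numerically) in the span
        # of the columns already selected.
        if sq_norms[pivot] <= tol * scale:
            break
        selected.append(pivot)
        remaining.remove(pivot)
        norm = sq_norms[pivot] ** 0.5
        q = [v / norm for v in cols[pivot]]
        # Subtract the projection onto q from each remaining column.
        for j in remaining:
            proj = sum(a * b for a, b in zip(q, cols[j]))
            cols[j] = [b - proj * a for a, b in zip(q, cols[j])]
    return selected
```

In this sketch, rows are data points and columns are attributes. When a column is an exact linear combination of others (for example, the sum of two other attributes), only as many columns as the rank of the matrix are returned. In practice a library routine for QR with column pivoting would likely be preferable to this hand-written loop.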