Publication Type:

Conference Paper

Source:

2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS), IEEE, Trivandrum, India (2013)

ISBN:

9781479921782

URL:

https://ieeexplore.ieee.org/document/6745450

Keywords:

Algorithm design and analysis, batch based algorithm, Big data, Big Data analysis, categorical dataset, Cholesky based algorithm, cholesky decomposition, Classification algorithm, clustering algorithm, compact data representation, Data structures, Dictionaries, Dictionary learning, dictionary learning algorithm, dictionary update stage, Encoding, generated sparse coded coefficient matrix, learning (artificial intelligence), Machine learning, Matching pursuit algorithms, Matrix algebra, Matrix decomposition, OMP, OMP algorithms, orthogonal matching pursuit algorithms, Pattern classification, pattern clustering, Singular value decomposition, Sparse coding, sparse coding stage, sparse data representation, Sparse matrices, Sparse representation, sparsity-based representation, SVD, TF-IDF weighting scheme, vector space model, Vectors

Abstract:

Over the past few decades, many algorithms have been continuously evolving in the area of machine learning. This is an era of big data which is generated by different applications related to various fields like medicine, the World Wide Web, E-learning networking etc. So, we are still in need for more efficient algorithms which are computationally cost effective and thereby producing faster results. Sparse representation of data is one giant leap toward the search for a solution for big data analysis. The focus of our paper is on algorithms for sparsity-based representation of categorical data. For this, we adopt a concept from the image and signal processing domain called dictionary learning. We have successfully implemented its sparse coding stage which gives the sparse representation of data using Orthogonal Matching Pursuit (OMP) algorithms (both Batch and Cholesky based) and its dictionary update stage using the Singular Value Decomposition (SVD). We have also used a preprocessing stage where we represent the categorical dataset using a vector space model based on the TF-IDF weighting scheme. Our paper demonstrates how input data can be decomposed and approximated as a linear combination of minimum number of elementary columns of a dictionary which so formed will be a compact representation of data. Classification or clustering algorithms can now be easily performed based on the generated sparse coded coefficient matrix or based on the dictionary. We also give a comparison of the dictionary learning algorithm when applying different OMP algorithms. The algorithms are analysed and results are demonstrated by synthetic tests and on real data.

Cite this Research Publication

Remya Rajesh, Nair, S. S., Srindhya, K., and Kaimal, M. R., “Sparsity-based Representation for Categorical Data”, in 2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS), Trivandrum, India, 2013.