The field of machine learning has seen tremendous progress. New algorithms are being developed for particular domains, existing algorithms are being improved to reduce computational cost, and concepts from algorithms in one domain are being applied to others. This is an era of big data generated from diverse sources such as the web, medicine, e-learning and networking. Sparse representation is a promising approach for handling big data. At the same time, reducing computational cost while handling such colossal volumes of data requires data distribution, parallel algorithms, or both. This paper focuses on applying sparse coding to obtain sparse representations of categorical data in a parallel environment using a GPGPU. We compare the results with a sequential iterative implementation of sparse coding and with an implementation in the Message Passing Interface (MPI) environment. The categorical data is first transformed into a vector space model. Sparse coding algorithms such as matching pursuit, basis pursuit and FOCUSS are applied mostly in the signal and image domains. From among these, we parallelize the computational steps of the Batch Orthogonal Matching Pursuit (Batch OMP) algorithm, which generates a separate sparse code for each instance of a large dataset over the same dictionary. Our analysis shows that the GPU implementation fares 90% better than the sequential implementation and 80% better than the MPI implementation. The results are demonstrated on synthetic and real data.
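The abstract does not give the implementation, so the following is only a minimal pure-Python sketch of the Batch OMP idea as it is commonly formulated. The assumptions are that the dictionary is stored as a list of unit-norm atoms, that the Gram matrix G = DᵀD is computed once and shared by every instance, and that each instance x contributes only its projections Dᵀx. The names `batch_omp`, `omp_gram`, `solve` and `dot` are hypothetical. For simplicity, the sketch re-solves the small normal equations at each step instead of using the progressive Cholesky update of a full Batch OMP implementation.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c


def omp_gram(G, alpha0, k, tol=1e-10):
    """OMP for one instance using only G = D^T D and alpha0 = D^T x.

    The correlations D^T r with the residual r = x - D_I c equal
    alpha0 - G[:, I] c, so the residual itself is never formed.
    """
    n = len(alpha0)
    support, coef = [], []
    alpha = alpha0[:]
    for _ in range(min(k, n)):
        # Pick the unused atom most correlated with the current residual.
        j = max((i for i in range(n) if i not in support), key=lambda i: abs(alpha[i]))
        if abs(alpha[j]) < tol:  # residual already (numerically) orthogonal to D
            break
        support.append(j)
        # Least-squares coefficients on the support: G_II c = alpha0_I.
        G_II = [[G[a][b] for b in support] for a in support]
        coef = solve(G_II, [alpha0[i] for i in support])
        alpha = [alpha0[i] - sum(G[i][s] * c for s, c in zip(support, coef))
                 for i in range(n)]
    gamma = [0.0] * n
    for s, c in zip(support, coef):
        gamma[s] = c
    return gamma


def batch_omp(atoms, X, k):
    """Sparse-code every instance in X over the same dictionary (list of atoms)."""
    G = [[dot(a, b) for b in atoms] for a in atoms]  # shared by all instances
    # Each iteration below is independent of the others.
    return [omp_gram(G, [dot(a, x) for a in atoms], k) for x in X]
```

Because the per-instance loop in `batch_omp` shares only the read-only matrix G, each instance can be coded independently. This independence is one natural place to distribute work across GPU threads or MPI processes.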
Remya Rajesh, K., A., and Kaimal, M. R., “Parallel Sparse coding for Categorical Data”, in Second International Conference on Emerging Research in Computing, Information, Communication and Applications (ERCICA), NMIT, Yelahanka, Bangalore, 2014.