This paper addresses the problem of semantically representing documents for information retrieval in a data integration system. Search queries over documents often seek relevant information rather than exact term matches. Conventional feature extraction methods do not capture relevance; instead, they rely on term matching for query processing. The main challenge in semantic representation of documents lies in identifying important features. Most techniques for identifying important features transform the original data into a different space. This yields a sparse matrix that is computationally expensive to process. We therefore propose an alternative approach based on CUR matrix decomposition. This technique identifies important documents and important terms in order to improve query processing. Experimental results on five data sets demonstrate the efficacy of this approach. © 2018 IEEE.
C. Baladevi and Sandhya Harikumar, “Semantic Representation of Documents Based on Matrix Decomposition”, in 2018 International Conference on Data Science and Engineering, ICDSE 2018, 2018.
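The abstract does not say how the CUR decomposition selects its "important documents and important terms", so the sketch below is only an illustration and not the authors' method. It assumes a term-document matrix `A`, with terms as rows and documents as columns. Columns (documents) and rows (terms) are ranked by rank-`k` leverage scores computed from the SVD, the top `c` columns and top `r` rows are kept, and the middle factor is set to `U = pinv(C) @ A @ pinv(R)`. The use of deterministic top-score selection (rather than random sampling), the function names `leverage_scores` and `cur_decompose`, and the toy matrix are all hypothetical choices made for this example.

```python
import numpy as np


def leverage_scores(vectors: np.ndarray) -> np.ndarray:
    """Squared row norms of a matrix whose columns are top-k singular vectors."""
    return np.sum(vectors ** 2, axis=1)


def cur_decompose(A: np.ndarray, k: int, c: int, r: int):
    """Hypothetical deterministic CUR sketch.

    Keeps the c columns (documents) and r rows (terms) of A with the highest
    rank-k leverage scores, then fits U so that C @ U @ R approximates A.
    """
    U_svd, _, Vt = np.linalg.svd(A, full_matrices=False)
    col_scores = leverage_scores(Vt[:k].T)      # one score per document (column)
    row_scores = leverage_scores(U_svd[:, :k])  # one score per term (row)
    # Indices of the highest-scoring columns/rows, kept in original order.
    cols = np.sort(np.argsort(col_scores)[::-1][:c])
    rows = np.sort(np.argsort(row_scores)[::-1][:r])
    C = A[:, cols]  # actual document columns of A
    R = A[rows, :]  # actual term rows of A
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return cols, rows, C, U, R


# Toy term-document count matrix: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [2, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 3, 0, 1],
    [0, 1, 0, 2],
    [1, 1, 1, 1],
], dtype=float)

cols, rows, C, U, R = cur_decompose(A, k=2, c=2, r=3)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print("selected documents:", cols, "selected terms:", rows)
print("relative reconstruction error:", round(float(rel_err), 3))
```

Unlike an SVD-based representation, `C` and `R` here are actual columns and rows of `A`. The selected factors therefore stay interpretable as specific documents and terms, and they keep whatever sparsity `A` had.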