Publication Type:

Conference Paper

Source:

2018 International Conference on Data Science and Engineering, ICDSE 2018, Institute of Electrical and Electronics Engineers Inc. (2018)

ISBN:

9781538648551

URL:

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85058325045&doi=10.1109%2fICDSE.2018.8527824&partnerID=40&md5=c9ce294f7b05db9d757d4a3103afecc0

Keywords:

Approximation theory, Conventional methods, Data integration, Data integration system, Document analysis, Important features, Information Retrieval, Low rank approximations, Matrix algebra, Matrix decomposition, Query processing, Research laboratories, Search engines, Semantic representation of documents, Semantics, Sparse matrices

Abstract:

This paper addresses the problem of semantic representation of documents for information retrieval in a data integration system. Search queries on documents often seek relevant information. Conventional methods of feature extraction do not capture relevance; they focus instead on term matching for query processing. The challenge in semantic representation of documents lies in identifying important features. Most techniques for identifying important features transform the original data to a different space. This gives a sparse matrix, which is computationally expensive. We therefore propose an alternative approach based on CUR matrix decomposition. This technique finds important documents and important terms in order to improve query processing. Experimental results demonstrate the efficacy of this approach on five data sets. © 2018 IEEE.
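
The abstract does not specify how the important documents and terms are chosen, so the following is only a minimal sketch of a generic CUR decomposition, not the authors' implementation. It assumes a small term-document matrix `A` (rows are terms, columns are documents) and assumes that columns and rows are selected deterministically by rank-k leverage scores. The function name `cur_decompose`, the parameters `k`, `c`, `r`, and the toy data are all hypothetical.

```python
import numpy as np

def cur_decompose(A, k, c, r):
    """Sketch of CUR: keep the c columns and r rows of A with the highest
    rank-k leverage scores (an assumed selection rule), then fit the
    linking matrix U so that C @ U @ R approximates A."""
    # Thin SVD supplies the singular vectors used for the leverage scores.
    left, _, right_t = np.linalg.svd(A, full_matrices=False)
    # Leverage score of a column (document) or row (term): its squared
    # weight in the top-k right or left singular vectors, divided by k.
    col_scores = np.sum(right_t[:k, :] ** 2, axis=0) / k
    row_scores = np.sum(left[:, :k] ** 2, axis=1) / k
    cols = np.sort(np.argsort(-col_scores)[:c])  # "important" documents
    rows = np.sort(np.argsort(-row_scores)[:r])  # "important" terms
    C = A[:, cols]  # actual columns of A
    R = A[rows, :]  # actual rows of A
    # Least-squares choice of U given C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, cols, rows

# Hypothetical toy term-document matrix: 4 terms x 4 documents, rank 2.
A = np.array([
    [2.0, 0.0, 4.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 6.0],
    [0.0, 1.0, 0.0, 2.0],
])

C, U, R, cols, rows = cur_decompose(A, k=2, c=2, r=2)
reconstruction_error = np.linalg.norm(A - C @ U @ R)
```

Because `C` and `R` consist of actual columns and rows of `A`, the selected indices in `cols` and `rows` can be read directly as documents and terms, whereas the factors of a singular value decomposition are mixtures of all of them. In this toy case the matrix has rank 2 and the selected columns and rows span its column and row spaces, so the reconstruction error is zero up to floating-point rounding; on real data the product is only an approximation.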

Cite this Research Publication

C. Baladevi and Sandhya Harikumar, “Semantic Representation of Documents Based on Matrix Decomposition”, in 2018 International Conference on Data Science and Engineering, ICDSE 2018, 2018.