Publication Type : Conference Paper
Publisher : CEUR Workshop Proceedings, CEUR-WS.
Source : CEUR Workshop Proceedings, CEUR-WS, Volume 1737, p.126-130 (2016)
Keywords : Classification (of information), Distributional semantics, Document matrices, Factorization, Information Retrieval, Information retrieval systems, Matrix algebra, Nonnegative matrix factorization, Question classification, Semantic representation, Semantics, Test corpus, Test sets, Text classification, Text processing
Campus : Coimbatore
School : School of Engineering
Center : Computational Engineering and Networking
Department : Electronics and Communication
Year : 2016
Abstract : The objective of this experiment is to validate the performance of a distributional semantic representation of text on a classification task (Question Classification) and an Information Retrieval task. Following the distributional representation step, first-level classification of the questions is performed, and tweets relevant to the given queries are retrieved. The distributional representation of text is obtained by performing Non-Negative Matrix Factorization on the Document-Term Matrix of the training and test corpus. To improve the semantic representation of the text, phrases are considered along with words. The proposed approach achieved an F-1 measure of 80% and a mean average precision of 0.0377 on the Mixed Script Information Retrieval Task 1 and Task 2 test sets, respectively.
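The representation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the use of word bigrams as "phrases", raw counts as matrix entries, the rank `k=2`, the multiplicative-update solver, and the toy questions are all assumptions made here for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Return unigrams plus word n-grams (assumed stand-in for 'phrases')."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def document_term_matrix(docs, n_max=2):
    """Build a count-based document-term matrix (rows: docs, columns: terms)."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[float(c[t]) for t in vocab] for c in counts], vocab

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) by W (docs x k) times H (k x terms),
    with all entries non-negative, using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

# Hypothetical toy questions, not taken from the task data.
docs = [
    "what is the capital of india",
    "who is the prime minister",
    "how far is the moon",
]
V, vocab = document_term_matrix(docs)
W, H = nmf(V, k=2)  # each row of W is a 2-dimensional document vector
```

In a sketch like this, the rows of `W` would serve as the dense document vectors passed to a classifier or compared against query vectors for retrieval; how the original work used the factors beyond what the abstract states is not known from this record.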
Cite this Research Publication : H. B. Barathi Ganesh, Dr. M. Anand Kumar, and Dr. Soman K. P., “Distributional semantic representation for text classification and information retrieval”, in CEUR Workshop Proceedings, 2016, vol. 1737, pp. 126-130.