<p>Maintaining a large collection of documents is an important problem in many areas of science and industry. Different analyses can be performed on a large document collection with ease only if a short or reduced description can be obtained. Topic modeling offers a promising solution for this. Topic modeling is a method that learns hidden themes from a large set of unorganized documents. Different approaches and alternatives are available for finding topics, such as Latent Dirichlet Allocation (LDA), neural networks, Latent Semantic Analysis (LSA), probabilistic LSA (pLSA), and probabilistic LDA (pLDA). In topic models, the inferred topics are based only on observed term occurrences. However, the terms may not be semantically related in a manner that is relevant to the topic. Understanding the semantics can yield improved topics for representing the documents. The objective of this paper is to develop a semantically oriented, probabilistic-model-based approach for generating topic representations from a document collection. From the modified topic model, we generate two matrices: a document-topic matrix and a term-topic matrix. The reduced document-term matrix derived from these two matrices has 85% similarity with the original document-term matrix, i.e., the documents reconstructed from these two matrices have 85% similarity with the original document collection. Also, a classifier applied to the document-topic matrix appended with the class label shows an 80% improvement in F-measure score. The paper also uses the perplexity metric to determine the number of topics for a test set. © 2017-IOS Press and the authors. All rights reserved.</p>
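<p>The reconstruction described in the abstract can be sketched as a matrix factorization: the document-term matrix is approximated by the product of the document-topic and topic-term matrices, and the quality of the approximation is measured by row-wise similarity. The sketch below is a minimal illustration with toy numbers, not the paper's model; the matrices <code>theta</code> and <code>phi</code> and the 4-document, 6-term, 2-topic corpus are invented for demonstration, and cosine similarity is assumed as the comparison measure.</p>

```python
import numpy as np

# Hypothetical toy corpus: 4 documents, 6 terms, 2 topics.
# theta: document-topic matrix (each row sums to 1).
# phi:   topic-term matrix (each row sums to 1).
theta = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.2, 0.8],
                  [0.1, 0.9]])
phi = np.array([[0.40, 0.30, 0.20, 0.05, 0.03, 0.02],
                [0.02, 0.03, 0.05, 0.20, 0.30, 0.40]])

# Reconstruct a (normalized) document-term matrix from the two factors.
reconstructed = theta @ phi

# Original document-term matrix (raw term counts), row-normalized so it
# is comparable with the probabilistic reconstruction.
original = np.array([[5, 4, 3, 1, 0, 0],
                     [4, 3, 2, 1, 1, 0],
                     [1, 1, 1, 3, 4, 5],
                     [0, 1, 1, 3, 4, 4]], dtype=float)
original /= original.sum(axis=1, keepdims=True)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Average row-wise similarity between the original collection and the
# documents reconstructed from the two matrices.
sims = [cosine(original[i], reconstructed[i]) for i in range(len(original))]
mean_similarity = float(np.mean(sims))
print(f"mean document similarity: {mean_similarity:.3f}")
```

<p>A high mean similarity indicates that the two low-rank factors retain most of the information in the full document-term matrix, which is the sense in which the abstract's 85% figure is reported.</p>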
Rajesh, R., Joseph, D., and Kaimal, M. R., “Semantics-based topic inter-relationship extraction”, Journal of Intelligent and Fuzzy Systems, vol. 32, pp. 2941-2951, 2017.