The rapid growth in the number of web users searching for topics of interest poses challenges to web systems, in particular to search engines. Web content summarization is one crucial application that helps improve the performance of search engines. However, summarizing the totality of web content is a laborious task because of the sheer volume of web data. Segmenting the web into communities and extracting only the relevant pages from those communities for summarization could be a viable solution. This paper presents a novel technique for web summarization that extracts pages according to their degree of authenticity. For this, a large collection of pages is crawled from the web, and communities are identified in linear time using edge streaming over the web graph. Then, through link analysis, the more authentic pages are identified for summarization. The proposed method is validated through experiments on real and synthetic data. The results indicate that the proposed model is useful for building an optimized search engine.
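The abstract does not state which streaming rule or which link-analysis score the authors use, so the Python sketch below is only an illustration of the two-stage pipeline under stated assumptions. Stage one makes a single pass over the edge stream with a hypothetical degree-threshold rule: when both endpoints of an edge still have degree at most `v_max`, the endpoint whose community has the smaller volume (sum of node degrees) moves into the other endpoint's community. Each edge is handled in constant time, so the pass is linear in the number of edges. Stage two uses Kleinberg's HITS authority score, computed inside each community, as a stand-in for "degree of authenticity". The names `stream_communities`, `hits_authorities`, and `select_pages` and the parameters `v_max` and `top_k` are hypothetical.

```python
import math
from collections import defaultdict


def stream_communities(edges, v_max):
    """One pass over (u, v) edges; hypothetical degree-threshold rule, not the paper's."""
    degree = defaultdict(int)
    community = {}               # node -> community label (initially the node itself)
    volume = defaultdict(int)    # community label -> sum of member degrees
    for i, j in edges:
        for node in (i, j):
            community.setdefault(node, node)
            degree[node] += 1
            volume[community[node]] += 1
        ci, cj = community[i], community[j]
        if ci == cj or degree[i] > v_max or degree[j] > v_max:
            continue             # high-degree nodes (e.g. hubs) stay where they are
        if volume[ci] <= volume[cj]:
            volume[cj] += degree[i]; volume[ci] -= degree[i]; community[i] = cj
        else:
            volume[ci] += degree[j]; volume[cj] -= degree[j]; community[j] = ci
    return community


def hits_authorities(nodes, local_edges, iterations=50):
    """HITS authority scores on one community's internal (u links to v) edges."""
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = dict.fromkeys(nodes, 0.0)
        for u, v in local_edges:
            new_auth[v] += hub[u]            # good hubs point to good authorities
        norm = math.sqrt(sum(x * x for x in new_auth.values())) or 1.0
        auth = {n: x / norm for n, x in new_auth.items()}
        new_hub = dict.fromkeys(nodes, 0.0)
        for u, v in local_edges:
            new_hub[u] += auth[v]
        norm = math.sqrt(sum(x * x for x in new_hub.values())) or 1.0
        hub = {n: x / norm for n, x in new_hub.items()}
    return auth


def select_pages(edges, v_max=10, top_k=1):
    """Return the top_k highest-authority pages of each streamed community."""
    edges = list(edges)          # materialized here because it is read twice
    community = stream_communities(edges, v_max)
    members = defaultdict(list)
    for node, c in community.items():
        members[c].append(node)
    internal = defaultdict(list)  # group intra-community links once
    for u, v in edges:
        if community[u] == community[v]:
            internal[community[u]].append((u, v))
    selected = {}
    for c, nodes in members.items():
        auth = hits_authorities(nodes, internal[c])
        ranked = sorted(nodes, key=lambda n: (-auth[n], str(n)))
        selected[c] = ranked[:top_k]
    return selected


if __name__ == "__main__":
    # Two separate link structures: pages 1-3 link to 0, pages 11-12 link to 10.
    links = [(1, 0), (2, 0), (3, 0), (11, 10), (12, 10)]
    print(select_pages(links))   # → {0: [0], 10: [10]}
```

HITS is used here only because it is a standard link-analysis measure of authority; PageRank restricted to each community would be an equally plausible stand-in, and the paper may use neither.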
K. D. Raj and Dr. Sajeev G. P., “A Community Based Web Summarization in Near Linear Time”, 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, Bangalore, India, pp. 962-968, 2018.