Classifier based duplicate record elimination for query results from web databases

Publication Type : Conference Paper

Publisher : IEEE

Source : Trendz in Information Sciences & Computing (TISC2010), IEEE, Chennai, India (2010)

Url : https://ieeexplore.ieee.org/abstract/document/5714607

Campus : Chennai

School : Department of Computer Science and Engineering, School of Engineering

Department : Computer Science

Year : 2010

Abstract : Record matching is an essential step in duplicate detection, as it identifies records that represent the same real-world entity. Supervised record matching methods require users to provide training data and therefore cannot be applied to web databases, where query results are generated on the fly. To overcome this problem, a new record matching method named Unsupervised Duplicate Elimination (UDE) is proposed for identifying and eliminating duplicates among records in dynamic query results. The key idea of this paper is to adjust the weights of record fields when calculating similarities among records. Three classifiers, namely a weighted component similarity summing classifier, a support vector machine classifier and a one-class support vector machine classifier, are employed iteratively with UDE. The first classifier uses the set of field weights to match records from different data sources. With the matched records as the positive set and the non-duplicate records as the negative set, the second classifier identifies new duplicates. The one-class support vector machine classifier is then employed to detect further duplicates. The iteration stops when no new duplicates can be identified. In this way, the paper takes advantage of the dissimilarity among records from web databases to solve the online duplicate detection problem.
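To make the weighting-and-iteration idea in the abstract more concrete, here is a minimal Python sketch of the first stage only: a weighted component similarity summing (WCSS) classifier whose field weights are re-estimated each round until no new duplicates appear. It is not the authors' implementation. The support vector machine and one-class support vector machine stages are omitted, and the following are all illustrative assumptions: the per-field similarity (`difflib` ratio), the weighting formula, the mixing factor `alpha`, the `threshold` value, the assumption that records within one source's result are non-duplicates, and every function name (`field_sim`, `field_weights`, `wcss_match`, `iterative_match`, all hypothetical).

```python
from difflib import SequenceMatcher
from itertools import combinations, product

Record = tuple[str, ...]  # one string per field, same field order in every source


def field_sim(x: str, y: str) -> float:
    """Hypothetical per-field similarity in [0, 1] (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def sim_vector(r1: Record, r2: Record) -> list[float]:
    """Component similarity vector: one similarity score per field."""
    return [field_sim(x, y) for x, y in zip(r1, r2)]


def normalise(v: list[float]) -> list[float]:
    """Scale v to sum to 1; fall back to uniform weights if v is all zeros."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else [1.0 / len(v)] * len(v)


def field_weights(dup_vecs, nondup_vecs, n_fields, alpha=0.5):
    """Assumed weighting: fields that agree on duplicates and disagree on
    non-duplicates get larger weights; alpha mixes the two views."""
    w_non = normalise([sum(1.0 - v[i] for v in nondup_vecs) for i in range(n_fields)])
    if not dup_vecs:
        return w_non  # no duplicates found yet: use the negative set only
    w_dup = normalise([sum(v[i] for v in dup_vecs) for i in range(n_fields)])
    return [alpha * d + (1 - alpha) * n for d, n in zip(w_dup, w_non)]


def wcss_match(vec, weights, threshold):
    """Weighted component similarity summing: weighted sum compared to a threshold."""
    return sum(w * s for w, s in zip(weights, vec)) >= threshold


def iterative_match(source_a, source_b, threshold=0.8, alpha=0.5):
    """Return (i, j) index pairs judged duplicates across the two sources."""
    n_fields = len(source_a[0])
    # Assumption: records inside one source's query result are distinct entities,
    # so every within-source pair serves as a negative (non-duplicate) example.
    nondup = [sim_vector(x, y)
              for src in (source_a, source_b)
              for x, y in combinations(src, 2)]
    # Every cross-source pair is a candidate duplicate.
    candidates = {(i, j): sim_vector(ra, rb)
                  for (i, ra), (j, rb) in product(enumerate(source_a), enumerate(source_b))}
    matched: set[tuple[int, int]] = set()
    while True:
        # Re-estimate field weights from the duplicates found so far.
        weights = field_weights([candidates[p] for p in matched], nondup, n_fields, alpha)
        new = {p for p, v in candidates.items()
               if p not in matched and wcss_match(v, weights, threshold)}
        if not new:  # stop when an iteration finds no new duplicates
            return matched
        matched |= new
```

For example, with two hypothetical result lists `[("Database Systems", "Ullman"), ("Data Mining", "Han")]` and `[("Database Systems", "J. Ullman"), ("Machine Learning", "Mitchell")]`, `iterative_match` pairs only the two "Database Systems" records. The loop terminates because `matched` only grows and the candidate set is finite. In the full method described in the abstract, the matched and non-duplicate sets would then feed the SVM and one-class SVM classifiers, which this sketch does not attempt.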

Cite this Research Publication : G. Kalpana, R. Prasanna Kumar, and T. Ravi, “Classifier based duplicate record elimination for query results from web databases”, in Trendz in Information Sciences & Computing (TISC2010), Chennai, India, 2010.
