The semantic integration of heterogeneous databases is a critical area of interest, driven by the growing scale of data and the need to share existing data as technology advances. Schema-level heterogeneity of the relations is the major obstacle to such integration. Although various approaches to schema analysis, transformation, and integration have been explored, they are sometimes too general to solve the problem, especially when the data is very high-dimensional and the schema information is unavailable or inadequate. In this paper, a method is proposed to integrate heterogeneous relational schemas at the instance level rather than the schema level. A global schema is designed that integrates the most relevant attributes of the different relational schemas of a particular domain. To find the significant attributes, multiple linear regression based on the L1 norm and Singular Value Decomposition (SVD) are applied to the data iteratively. This is a variant of L1-PCA, which is an efficient, effective, and meaningful method of linear subspace estimation. The most prominent instance-level similarity is found by identifying the most significant attributes of each relational data source and then measuring the similarity among those attributes using the L1 norm. Thus an integrated schema is created that maps the relevant attributes of each local schema to a global schema. © 2014 IEEE.
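The two steps named in the abstract (select significant attributes, then compare them with the L1 norm) can be illustrated with a rough sketch. This is not the authors' algorithm: it assumes a plain SVD on median-centered data in place of the iterative L1 regression, and it assumes that attributes from sources with different row counts can be compared through fixed-size quantile profiles. The function names `significant_attributes`, `quantile_profile`, and `match_attributes` are hypothetical.

```python
import numpy as np

def significant_attributes(X, k):
    """Return indices of the k columns with the largest absolute weight
    in the leading right singular vector (assumption: a plain SVD stands
    in for the paper's iterative L1 regression + SVD)."""
    Xc = X - np.median(X, axis=0)  # median centering, a common robust choice
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = np.abs(vt[0])
    return np.argsort(-loadings)[:k]

def quantile_profile(col, n=21):
    """Summarize a column by n evenly spaced quantiles (assumption: this
    makes columns of different lengths comparable)."""
    return np.quantile(col, np.linspace(0.0, 1.0, n))

def match_attributes(A, B, k):
    """Map each significant column of A to the significant column of B
    whose quantile profile is closest in L1 distance."""
    idx_a = significant_attributes(A, k)
    idx_b = significant_attributes(B, k)
    mapping = {}
    for i in idx_a:
        pa = quantile_profile(A[:, i])
        dists = [np.sum(np.abs(pa - quantile_profile(B[:, j]))) for j in idx_b]
        mapping[int(i)] = int(idx_b[int(np.argmin(dists))])
    return mapping
```

Calling `match_attributes(A, B, k)` on two numeric instance matrices returns a dictionary from column indices of `A` to column indices of `B`, which plays the role of a local-to-global attribute mapping in this simplified setting.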
Cited by 0; Conference: 2014 International Conference on Data Science and Engineering, ICDSE 2014; Conference date: 26 August 2014 through 28 August 2014; Conference code: 112595
Sandhya Harikumar, Reethima, R., and Dr. Kaimal, M. R., “Semantic integration of heterogeneous relational schemas using multiple L1 linear regression and SVD”, in International Conference on Data Science and Engineering, ICDSE 2014, 2014, pp. 105-111.