Data Integration Research Paper

What is Data Integration?
Data integration means combining data from multiple sources into a coherent data store, as in data warehousing. These sources may include multiple databases, data cubes, or flat files.
Data integration also refers to the processes and technologies for moving data from source data systems to target data systems. Along the way, the data are usually transformed to fit business requirements.

[Figure: data producers and data consumers]

When we talk about data flows, there are two sides of the equation: on one …
Use of probabilistic key equivalence. Use of probabilistic attribute equivalence. Use of heuristic rules.
Data value conflict detection and resolution: This is another problem in data integration. A data value conflict appears when the data values in two data sources that model the same property of a real-world entity do not match.

Take two data sources, Data_Source1 and Data_Source2, and two attributes S1 and S2 that refer to the same property of a real-world entity in Data_Source1 and Data_Source2 respectively. If k1 is an element of Data_Source1 (k1∈Data_Source1) and k2 is an element of Data_Source2 (k2∈Data_Source2), both correspond to the same real-world instance of the object, and k1.S1 is not equal to k2.S2 (k1.S1≠k2.S2), then we say that a data value conflict exists between Data_Source1 and Data_Source2.
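The definition above can be sketched in code. This is only a minimal illustration under stated assumptions: the `Record` type, the `entity_id` matching key, the route data, and the tolerance are all hypothetical, and the unit mismatch (kilometers in one source, miles in the other) is just one possible cause of a conflict.

```python
from dataclasses import dataclass

MILES_TO_KM = 1.609344  # length of the international mile in kilometers


@dataclass(frozen=True)
class Record:
    entity_id: str  # hypothetical key identifying the real-world entity
    value: float    # value of the shared property (S1 or S2 in the text)


def find_value_conflicts(source1, source2, tolerance=0.0):
    """Return (entity_id, v1, v2) for matched entities whose values differ."""
    values2 = {r.entity_id: r.value for r in source2}
    conflicts = []
    for k1 in source1:
        if k1.entity_id in values2:
            v2 = values2[k1.entity_id]
            if abs(k1.value - v2) > tolerance:
                conflicts.append((k1.entity_id, k1.value, v2))
    return conflicts


# Hypothetical data: source 1 stores distances in km, source 2 in miles.
data_source1 = [Record("route-1", 100.0), Record("route-2", 42.0), Record("route-3", 5.0)]
data_source2 = [Record("route-1", 62.1), Record("route-2", 26.1), Record("route-3", 2.0)]

# Detection: compared as-is, every matched pair disagrees.
raw_conflicts = find_value_conflicts(data_source1, data_source2)

# Resolution attempt: convert source 2 to km, then allow a small rounding tolerance.
converted = [Record(r.entity_id, r.value * MILES_TO_KM) for r in data_source2]
remaining = find_value_conflicts(data_source1, converted, tolerance=0.1)

print(len(raw_conflicts), [c[0] for c in remaining])
```

After the conversion only the pair that genuinely disagrees is still reported; the other two differences were caused by the unit mismatch alone.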

For example, a distance attribute may be stored in kilometers in one system and in miles in another. A weight attribute may be saved in British imperial units in one system and in metric units in another. For a hotel chain, the price of rooms and food may be given in a different currency in different cities; not only the currency but also the services (e.g., free dinner) and tax may be …
Redundancy is another major issue in data integration. An attribute may be redundant if it can be “derived” from other attributes. Incompatibilities in attributes can also cause redundancies in the resulting data set.
We can use correlation analysis to find some redundancies during data integration. Given two attributes, correlation analysis measures how strongly one attribute implies the other, based on the available data. We can use the χ² (chi-square) test for nominal data, and the correlation coefficient and covariance for numeric data, to find and reduce redundancy during data integration.
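For numeric attributes, the covariance and the correlation coefficient can be computed as in the sketch below. It is a minimal illustration, assuming the population (divide-by-n) form of both measures; the attribute names and prices are hypothetical, with the second column derived from the first by a fixed factor so that it is fully redundant.

```python
from math import sqrt


def covariance(a, b):
    """Population covariance of two equally long numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def correlation(a, b):
    """Pearson correlation coefficient, in the range [-1, 1]."""
    std_a = sqrt(covariance(a, a))
    std_b = sqrt(covariance(b, b))
    if std_a == 0 or std_b == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return covariance(a, b) / (std_a * std_b)


# Hypothetical attributes: price_eur is price_usd times a fixed factor of 0.9.
price_usd = [100.0, 150.0, 200.0, 250.0]
price_eur = [90.0, 135.0, 180.0, 225.0]

cov = covariance(price_usd, price_eur)
r = correlation(price_usd, price_eur)
print(cov, r)
```

A coefficient close to 1 or -1 suggests that one of the two attributes may be redundant and is a candidate for removal; a value near 0 indicates no linear relationship.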
For nominal data, a correlation relationship between two attributes can be discovered with the chi-square test. The chi-square statistic is $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, which means that the larger the chi-square ($\chi^2$) value, the more likely the variables are related.
Example of a chi-square calculation:

             Male      Female     Sum(row)
like_Math    240(80)   210(370)
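The chi-square statistic for a contingency table can be sketched as follows. This is an illustration only: the counts are hypothetical and are not taken from the table above, and the expected count of each cell is assumed to be (row total × column total) / grand total, the usual choice when testing independence.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows are two preference groups, columns two genders.
related = [[30, 20],
           [20, 30]]
independent = [[10, 20],
               [20, 40]]

print(chi_square(related))      # 4.0
print(chi_square(independent))  # 0.0
```

To decide whether a value is large enough to reject independence, it is compared against the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom at the chosen significance level.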
