Semantic Integration in Heterogeneous Databases Using Neural Networks
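[Figure: processing overview. DBMS-specific parsers extract database information; attributes are classified to generate training patterns; networks are trained to recognize data.]
[Figure 5 diagram: input-layer nodes 1 to N, one per discriminator (labels include Length, Key, Value Constraint, Data Type, Average); output-layer nodes 1 to M, one per category (labels include Address, Name, Employee.id / Dept.employee / Payroll.ssn, Telephones).]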
The network has N nodes in the input layer (on the left), each of which represents a
discriminator. The hidden layer consists of (N+M)/2 nodes in the middle. The output layer
(on the right side) is composed of M nodes (M categories). The tagged data generated by the
classifier (Figure 4) is used as training data. During training, the network changes the
weights of the connections between nodes so that each node in the output layer generates
its target result (the corresponding category number). Forward propagation (generating
output), error calculation (computing the difference between the actual output and the
target output), and backward propagation (changing weights based on the calculated errors)
continue until the errors in the output layer are less than the threshold. For the AS/400
field reference file training data shown in Figure 4, we train the network to do the
following: when we present “1 0.133 0.0 0 0 0.5” (cluster center weights of category 1),
the network outputs “1 0 0 0 0 0 0 0 0”, the target result, which indicates category 1.
When we present “1 0.750 0.0 0 0 0.5” (cluster center weights of category 2), the network
outputs “0 1 0 0 0 0 0 0 0”, which indicates category 2. After training, the network
encodes data by matching each input pattern to the closest output node and giving the
similarity between the input pattern (of another database) and each category (that we
used to train the network).
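The training loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses only the two training patterns quoted in the text (N = 6 inputs, M = 9 outputs), and the class name `ThreeLayerNet`, the sigmoid activation, the learning rate, the random seed, the error threshold of 0.1, the epoch cap, and the rounding of (N+M)/2 down to an integer are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerNet:
    """N input nodes, (N+M)//2 hidden nodes, M output nodes."""

    def __init__(self, n_inputs, n_outputs, seed=0):
        rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
        n_hidden = (n_inputs + n_outputs) // 2  # rounding down is an assumption
        # Each row holds one node's incoming weights plus a trailing bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                         for _ in range(n_outputs)]

    def forward(self, x):
        """Forward propagation: return (hidden activations, output activations)."""
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, hb)))
                  for row in self.w_output]
        return hidden, output

    def train_step(self, x, target, rate):
        """Forward pass, error calculation, and backward pass for one pattern."""
        hidden, output = self.forward(x)
        errors = [t - o for t, o in zip(target, output)]
        # Output deltas use the sigmoid derivative o * (1 - o).
        d_out = [e * o * (1.0 - o) for e, o in zip(errors, output)]
        # Hidden deltas: propagate output deltas back through the current weights.
        d_hid = [h * (1.0 - h) * sum(d_out[k] * self.w_output[k][j]
                                     for k in range(len(d_out)))
                 for j, h in enumerate(hidden)]
        hb = hidden + [1.0]
        xb = list(x) + [1.0]
        for k, row in enumerate(self.w_output):
            for j in range(len(row)):
                row[j] += rate * d_out[k] * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] += rate * d_hid[j] * xb[i]
        # Largest output error for this pattern, measured before the update.
        return max(abs(e) for e in errors)


def train(net, patterns, rate=0.5, threshold=0.1, max_epochs=5000):
    """Repeat passes until the worst output error falls below the threshold."""
    worst = float("inf")
    for epoch in range(1, max_epochs + 1):
        worst = max(net.train_step(x, t, rate) for x, t in patterns)
        if worst < threshold:
            break
    return epoch, worst


def one_hot(category, n_categories):
    """Target vector with a 1 at the (1-based) category position."""
    return [1.0 if i == category - 1 else 0.0 for i in range(n_categories)]


# Cluster center weights of categories 1 and 2, as quoted in the text.
PATTERNS = [
    ([1, 0.133, 0.0, 0, 0, 0.5], one_hot(1, 9)),
    ([1, 0.750, 0.0, 0, 0, 0.5], one_hot(2, 9)),
]

net = ThreeLayerNet(n_inputs=6, n_outputs=9)
epochs, worst = train(net, PATTERNS)
```

After training, `net.forward(pattern)[1]` returns the M output activations, which play the role of the per-category similarities discussed in the text.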
As an example, take the result of the classifier in Figure 2, which clustered
“Employee.id#”, “Dept.employee”, and “Payroll.SSN” into one category. The
weights of these cluster centers are then tagged to train the network in Figure 5. After
the back-propagation network is trained, we present it with a new pattern of N
characteristics, the attribute “healthPlan.Insured#”. The network determines the similarity
between the input pattern and each of the M categories. In Figure 5, the network shows
that the input pattern “Insured#” is closest to category 3 (id numbers)
(similarity = 0.92), and then to category M (telephone#) (similarity = 0.72). It also shows
that the input pattern is not similar to either category 1 (Address) or category 2 (Name),
since the similarities are low (0.05 and 0.12).

Figure 5: Back-Propagation Network Architecture
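The way the output activations are read in the “Insured#” example can be sketched as a simple ranking step. The function name `rank_categories` and the cut-off value `min_similarity=0.5` are hypothetical; the text only reports the similarity values and does not state a cut-off. Only the four categories named in the example are listed here.

```python
def rank_categories(similarities, labels, min_similarity=0.5):
    """Return (label, similarity) pairs at or above the cut-off, best match first.

    min_similarity is an assumed cut-off for illustration only.
    """
    ranked = sorted(zip(labels, similarities), key=lambda pair: pair[1], reverse=True)
    return [(label, s) for label, s in ranked if s >= min_similarity]


# Similarities reported in the text for the input pattern "Insured#".
labels = ["Address", "Name", "id numbers", "telephone#"]
similarities = [0.05, 0.12, 0.92, 0.72]

candidates = rank_categories(similarities, labels)
print(candidates)  # [('id numbers', 0.92), ('telephone#', 0.72)]
```

With these values, the two low-similarity categories (Address and Name) drop out and the id-number category is ranked first, matching the reading given in the text.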
The back-propagation learning algorithm is a supervised learning algorithm, in which
target results are provided. It has been used for various tasks such as pattern
recognition, control, and classification. Here we use it as the training algorithm to
train a network to recognize input patterns and give degrees of similarity. Figure 5
shows a three-layer neural network for recognizing M categories of patterns. The structure
of the network is designed as follows: there are N input nodes, (N+M)/2 hidden nodes, and
M output nodes.

6 The computing time will increase as more layers are added.

4.3 Semantic Integration Procedure