Integration_Techniques_to_Build_a_Data_Warehouse_u
2 authors, including:
Faouzi Boufarès
Université Paris 13 Nord
All content following this page was uploaded by Faouzi Boufarès on 13 May 2014.
Abstract: This work describes the construction of a data warehouse by integrating heterogeneous
relational, object-relational and XML data (complex data). Developing intelligent tools to integrate
information extracted from multiple heterogeneous sources is a challenging issue, and it must be solved
if the numerous sources available in global information systems are to be exploited effectively.
Because the sources are heterogeneous, different query languages and data models are involved in
building the warehouse, so the warehouse can be constructed in several ways.
Our work is based on extracting the inter-schema relationships between the sources. From these
relationships, a global schema was generated and the views of the data warehouse were constructed.
All the stages proposed in this work were implemented in a functional prototype.
Key words: Database, data warehouse, heterogeneous structures and data, integration
Corresponding Author: F. Boufares, LIPN-UMR 7030 of C.N.R.S - Paris 13 University, Galilee Institute 99, avenue J. B.
Clément, F-93430 Villetaneuse France. Tel: +33 1 49 40 40 71, Fax: +33 1 48 26 07 12
J. Computer Sci., (Special Issue): 48-55, 2005
DB2 is made of m’ relations DB2={R’p}, 1 ≤ p ≤ m’; each relation R’p is made of n’p attributes
R’p={A’q}, 1 ≤ q ≤ n’p.
Each attribute A is defined by a domain Dom(A), which is the set of valid instances of A. Dom(A) can be
either a predefined domain (varchar, number, date) or a user-defined domain (type).
We note DB.R.A the attribute A of the relation R belonging to the database DB.

Definition 1: The Synonymy
An attribute A is a synonym of an attribute A’ if they have the same domain. We note
DB1.Ri.Aj SYN DB2.R’p.A’q if Dom(Aj)=Dom(A’q). We can validate this relationship by verifying that
the sets of constraints defined on Aj and A’q are either identical or, at least, do not contradict
each other.

Definition 2: The Inclusion
DB1.Ri.Aj INC DB2.R’p.A’q if Dom(Aj) is a subset of Dom(A’q). We can validate this relationship by
verifying that the constraints defined on A’q are a subset of, or the same as, the constraints
defined on Aj.

Definition 3: The Disjunction
DB1.Ri.Aj DISJ DB2.R’p.A’q if DB1.Ri.Aj Not SYN DB2.R’p.A’q and DB1.Ri.Aj Not INC DB2.R’p.A’q
and DB2.R’p.A’q Not INC DB1.Ri.Aj.

Note that in the case of object-relational databases, attributes can be composed ones. Relationships
between composed attributes can be validated using the two following rules.
Let’s consider two composed attributes: DB1.Ri.Aj={c1,…,cr} and DB2.R’p.A’q={c’1,…,c’r’}.

Rule 1: DB1.Ri.Aj SYN DB2.R’p.A’q if r=r’ and ∀ s / 1 ≤ s ≤ r, cs SYN c’s.
Rule 2: DB1.Ri.Aj INC DB2.R’p.A’q if r ≥ r’ and ∀ s / 1 ≤ s ≤ r’, cs SYN c’s.

Example 1: Two databases
(Figure: schemas of DB1 and DB2.)
DB1 represents the information system of a production company and DB2 represents the information
system of a service company.

Example 2: A set of synonyms
Let’s take into account the relationships given below to integrate DB1 and DB2.
DB1.client.id_client SYN DB2.company.id_co
DB1.client.address SYN DB2.company.address
DB1.sale.amount SYN DB2.operation.amount
DB1.product.id_product SYN DB2.service.id_service
DB1.product.label SYN DB2.service.label
DB1.sale.id_sale SYN DB2.operation.id_operation
DB1.sale.id_client SYN DB2.operation.id_co
DB1.sale.id_product SYN DB2.operation.id_service
DB1.sale.date_of_sale SYN DB2.operation.start_date

Case of XML sources: To illustrate these relationships, we consider two XML files: F1 and F2. We
adopt the DTD to describe the structure of XML files. A broader description of XML files (with XML
schemas) is not treated in this paper.
F1 is described by the DTD D1 and F2 is described by the DTD D2. D1 and D2 are composed of a set of
elements and attributes.
An example of a DTD is the following:
<?xml version="1.0"?>
<!ELEMENT el1 (el11,el12,el13,el14)>
<!ELEMENT el11 (#PCDATA)>
<!ELEMENT el12 (#PCDATA)>
<!ELEMENT el13 (#PCDATA)>
<!ELEMENT el14 (#PCDATA)>
This example describes XML files containing composed elements el1, each having 4 sub-elements: el11,
el12, el13 and el14.
D1 is made of m elements D1={Ei}, 1 ≤ i ≤ m; each element Ei is possibly composed of ni elements
Ei={Eij}, 1 ≤ j ≤ ni, and so on for all the elements.
D2 is made of m’ elements D2={E’p}, 1 ≤ p ≤ m’; each element E’p is possibly composed of n’p elements
E’p={E’pq}, 1 ≤ q ≤ n’p.
We note D.E the element E of the DTD D.
Definition 4: The Synonymy (SYN)
Two composed elements D1.E1 and D2.E2 are synonyms if they have the same number of sub-elements and
their sub-elements are synonyms.
Definition 5: The Inclusion (INC)
A composed element D1.E1 is included in a composed element D2.E2 if:
* the number of sub-elements of D2.E2 is greater than the number of sub-elements of D1.E1;
* each sub-element of D1.E1 is a synonym of a sub-element of D2.E2.
Definition 6: The Disjunction (DISJ)
D1.E1 DISJ D2.E2 if D1.E1 Not SYN D2.E2 and D1.E1 Not INC D2.E2 and D2.E2 Not INC D1.E1.
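The relationship tests of Definitions 1 to 3 and 4 to 6 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' prototype. A domain is modelled as a finite set of values, so Definitions 1 to 3 reduce to set equality and strict-subset tests. A composed XML element is modelled as the list of its leaf sub-element names, and synonymy between leaves is supplied as an explicit list of pairs. Pairing sub-elements in declaration order for Definition 4 is an assumption, and all function and element names are hypothetical.

```python
from typing import FrozenSet, Sequence, Set, Tuple

SynPairs = Set[Tuple[str, str]]

def relate_domains(dom_a: FrozenSet[object], dom_b: FrozenSet[object]) -> str:
    """Definitions 1-3, assuming each domain is a finite set of valid values."""
    if dom_a == dom_b:
        return "A SYN B"   # Definition 1: Dom(A) = Dom(B)
    if dom_a < dom_b:
        return "A INC B"   # Definition 2: Dom(A) is a strict subset of Dom(B)
    if dom_b < dom_a:
        return "B INC A"
    return "A DISJ B"      # Definition 3: neither SYN nor INC in either direction

def is_syn(x: str, y: str, syn: SynPairs) -> bool:
    # Declared synonymy between leaf sub-elements, treated as symmetric.
    return (x, y) in syn or (y, x) in syn

def elem_syn(e1: Sequence[str], e2: Sequence[str], syn: SynPairs) -> bool:
    # Definition 4: same number of sub-elements, pairwise synonyms
    # (assumption: pairs are taken in declaration order).
    return len(e1) == len(e2) and all(is_syn(a, b, syn) for a, b in zip(e1, e2))

def elem_inc(e1: Sequence[str], e2: Sequence[str], syn: SynPairs) -> bool:
    # Definition 5: e2 has more sub-elements than e1, and every sub-element
    # of e1 is a synonym of some sub-element of e2.
    return len(e2) > len(e1) and all(any(is_syn(a, b, syn) for b in e2) for a in e1)

def relate_elements(e1: Sequence[str], e2: Sequence[str], syn: SynPairs) -> str:
    if elem_syn(e1, e2, syn):
        return "E1 SYN E2"
    if elem_inc(e1, e2, syn):
        return "E1 INC E2"
    if elem_inc(e2, e1, syn):
        return "E2 INC E1"
    return "E1 DISJ E2"    # Definition 6

# Hypothetical elements; the two synonym pairs are borrowed from Example 2.
SYN_PAIRS: SynPairs = {
    ("D1.client.id_client", "D2.company.id_co"),
    ("D1.client.address", "D2.company.address"),
}
client = ["D1.client.id_client", "D1.client.address"]
company = ["D2.company.id_co", "D2.company.name", "D2.company.address"]

if __name__ == "__main__":
    print(relate_domains(frozenset({1, 2}), frozenset({1, 2, 3})))  # A INC B
    print(relate_elements(client, company, SYN_PAIRS))              # E1 INC E2
```

In this example, the hypothetical element client has two sub-elements, each with a synonym in the three-element company. Definition 5 therefore classifies client as included in company.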
The global schema generation is done as follows:
1- The first step is:
L = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label);
DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale);
DB1.employee.(id_emp, first_name, last_name, address); DB2.company.(id_co, name, address);
DB2.service.(id_service, label); DB2.operation.(id_op, id_co, id_service, duration, amount, start_date)}
L’ without synonyms is then:
L’ = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label);
DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale);
DB1.employee.(id_emp, first_name, last_name, address); DB2.operation.duration}
2- The second step is:
L’1 = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label);
DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale);
DB1.employee.(id_emp, first_name, last_name, address)}
L’2 = {DB2.operation.duration}
3- Several graphs are constructed.
The graph G1:
(Figure: graph G1; visible nodes include client and employee.)

Steps 5 and 6- The views are then calculated:
T = the number of views.
t1=2 and t2=1, so T=t1*t2=2.
Two views V1 and V2 are then obtained:
V1 = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label);
DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale); DB2.operation.duration}
V2 = {DB1.employee.(id_emp, first_name, last_name, address); DB2.operation.duration}

Case of XML sources: In our work, we propose to generate the global schema by considering the set L of
source elements that must appear in the DW: L = {Dk.Ei} / k ∈ [1..d] and i ∈ [1..rk], where d is the
number of DTDs Dk and rk is the number of elements of Dk. L is proposed by the DW administrator
according to decision needs.
The global schema is generated in 2 steps. The first step filters the set L to form the set L’ by
eliminating from L one element of each couple of synonymous elements. L’ does not contain synonyms
and L’ is included in L:
L’ = ∪k ∈ [1..d] (L’k), such that each L’k contains only the elements of Dk: L’k = {Dk.Ei}, i ∈ [1..rk].
A view Vu is associated with L’. Vu contains all the elements belonging to L’, so only one view Vu is
obtained. The DTD of the obtained view Vu forms the global schema of the warehouse.
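The relational steps above can be sketched in Python as a hypothetical illustration, not the authors' prototype. The text does not spell out how the graphs yield the views. The sketch therefore assumes that a graph is a group of tables connected by join conditions and that each view combines one graph from each source. Both rules are inferred from the worked values t1=2, t2=1, T=t1*t2=2 and from the views V1 and V2. Synonym pairs are assumed to be written (DB1 element, DB2 element), as in Example 2, and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

def remove_synonyms(L: List[str], syn: Set[Tuple[str, str]]) -> List[str]:
    # Step 1: drop one member of each synonym couple (assumption: the second,
    # i.e. the DB2 element, since pairs are written DB1 first as in Example 2).
    dropped = {b for a, b in syn if a in L and b in L}
    return [x for x in L if x not in dropped]

def split_by_source(L: List[str]) -> Dict[str, List[str]]:
    # Step 2: L' is the union of the L'k, each holding the elements of one source.
    parts: Dict[str, List[str]] = {}
    for item in L:
        parts.setdefault(item.split(".", 1)[0], []).append(item)
    return parts

def connected_graphs(tables: List[str], joins: List[Tuple[str, str]]) -> List[Set[str]]:
    # Step 3 (assumption): one graph per group of tables linked by join
    # conditions, found with a small union-find structure.
    parent = {t: t for t in tables}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in joins:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for t in tables:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())

def enumerate_views(graphs_per_source: List[List[Set[str]]]) -> List[Set[str]]:
    # Steps 5-6 (assumption inferred from the example): one view per choice of
    # one graph in each source, hence T = t1 * t2 * ... * td views.
    return [set().union(*combo) for combo in product(*graphs_per_source)]

if __name__ == "__main__":
    # DB1 tables with the join conditions C1 of Example 7; DB2 keeps only
    # the operation table, since L'2 = {DB2.operation.duration}.
    g1 = connected_graphs(["client", "product", "sale", "employee"],
                          [("sale", "client"), ("sale", "product")])
    g2 = connected_graphs(["operation"], [])
    views = enumerate_views([g1, g2])
    print(len(g1), len(g2), len(views))  # 2 1 2
```

Using the Cartesian product reproduces T=t1*t2. With these inputs it yields one view built from client, product, sale and operation, and one built from employee and operation, which matches the table sets of V1 and V2.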
Union statement of all the queries in Q={Q(S(Aj)) ∪ Q(I(Aj)) ∪ Q(S(A’q)) ∪ Q(I(A’q))}.

Example 7: Views building
Let’s now build the view V=(DB1.client.id_client, DB2.operation.amount) using Example 1.
1- Join conditions
C1 = {sale.id_client = client.id_client, sale.id_product = product.id_product}
C2 = {operation.id_co = company.id_co, operation.id_service = service.id_service}
2- Synonyms
S(DB1.client.id_client)={DB2.company.id_co}
S(DB2.operation.amount)={DB1.sale.amount}
3- Inclusions
I(DB1.client.id_client)={}
I(DB2.operation.amount)={}
4- Generation of queries
DB1 interrogations:
Q(S(DB2.operation.amount))={Q(DB1.sale.amount)}
With SQL definitions, Q(S(DB2.operation.amount)) =
{Select client.id_client, sale.amount from client, sale where sale.id_client = client.id_client;}
Q(I(DB2.operation.amount))={}
DB2 interrogations:
Q(S(DB1.client.id_client))={Q(DB2.company.id_co)}
With SQL definitions, Q(S(DB1.client.id_client)) =
{Select company.id_co, operation.amount from company, operation where operation.id_co = company.id_co;}
Q(I(DB1.client.id_client)) = {}
The construction of the view V is realized by executing the SQL Union statement of the two generated
queries.

Case of XML sources: For each view V of the GS, V={Dk.Ei} / k ∈ [1..d] and i ∈ [1..rk].
The construction of the DW consists of the construction of all the views of the global schema.
To illustrate our approach, let’s build a view V where V={D1.Ei, D2.Ej}. We proceed in the same way in
the case of several attributes.
Four steps are needed for the construction of the DW. First, the join conditions in the databases are
established. Second, the sets of synonyms are found. Then the sets of inclusions are made up. Finally,
the queries are generated.
The construction is done in the following way:
2- Synonyms
We note: NS(E) = the number of synonyms (SYN) of E.
S(D1.Ei), the set of synonyms of D1.Ei which belong to D2:
S(D1.Ei) = {D2.Sx(Ei)} / i ∈ [1..r2], x ∈ [1..NS(Ei)]
S(D2.Ej), the set of synonyms of D2.Ej which belong to D1:
S(D2.Ej) = {D1.Sz(Ej)} / j ∈ [1..r1], z ∈ [1..NS(Ej)]
3- Inclusions
We note: NI(E) = the number of elements which are included (INC) in E.
I(D1.Ei), the set of elements which are included in D1.Ei and belong to D2:
I(D1.Ei) = {D2.Iy(Ei)} / i ∈ [1..r2], y ∈ [1..NI(Ei)]
I(D2.Ej), the set of elements which are included in D2.Ej and belong to D1:
I(D2.Ej) = {D1.Iw(Ej)} / j ∈ [1..r1], w ∈ [1..NI(Ej)]
4- Generation of queries:
F1 interrogations:
FOR each D1.Sz(Ej) IN S(D2.Ej) generate the query Q(Sz(Ej)) END;
Q(Sz(Ej)) can be defined in the XML-QL language:
Where
<Ei> $a </Ei>
<Sz(Ej)> $b </Sz(Ej)>
in "F1.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(S(Ej)) the set of all the generated queries Q(Sz(Ej)), z ∈ [1..NS(Ej)].
FOR each D1.Iw(Ej) IN I(D2.Ej) generate the query Q(Iw(Ej)) END;
Q(Iw(Ej)) can be defined in the XML-QL language:
Where
<Ei> $a </Ei>
<Iw(Ej)> $b </Iw(Ej)>
in "F1.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(I(Ej)) the set of all the generated queries Q(Iw(Ej)), w ∈ [1..NI(Ej)].
F2 interrogations:
FOR each D2.Sx(Ei) IN S(D1.Ei) generate the query Q(Sx(Ei)) END;
Q(Sx(Ei)) can be defined in the XML-QL language:
Where
<Sx(Ei)> $a </Sx(Ei)>
<Ej> $b </Ej>
in "F2.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(S(Ei)) the set of all the generated queries Q(Sx(Ei)), x ∈ [1..NS(Ei)].
FOR each D2.Iy(Ei) IN I(D1.Ei) generate the query Q(Iy(Ei)) END;
Q(Iy(Ei)) can be defined in the XML-QL language:
Where
<Iy(Ei)> $a </Iy(Ei)>
<Ej> $b </Ej>
in "F2.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(I(Ei)) the set of all the generated queries Q(Iy(Ei)), y ∈ [1..NI(Ei)].
The construction of the view V={D1.Ei, D2.Ej} is realized by executing the XML-QL Union statement of
all the queries in Q={Q(S(Ei)) ∪ Q(I(Ei)) ∪ Q(S(Ej)) ∪ Q(I(Ej))}.

Example 8: Views building
Let’s now build the view V=(D1.sale.client.address, D2.operation.amount) using Example 1.
2- Synonyms
S(D1.sale.client.address)={D2.operation.company.address_co}
S(D2.operation.amount)={D1.sale.amount}
3- Inclusions
I(D1.sale.client.address)={}
I(D2.operation.amount)={}
4- Generation of queries
F1 interrogations:
Q(S(D2.operation.amount))={Q(D1.sale.amount)}
With XML-QL definitions, Q(S(D2.operation.amount)) =
Where
<sale><client><address> $a </address></client></sale>
<sale><amount> $b </amount></sale>
in "F1.xml"
Construct
<Result>
<sale><client><address> $a </address></client></sale>
<operation><amount> $b </amount></operation>
</Result>
Q(I(D2.operation.amount))={}
F2 interrogations:
Q(S(D1.sale.client.address))={Q(D2.operation.company.address_co)}
With XML-QL definitions, Q(S(D1.sale.client.address)) =
Where
<operation><company><address> $a </address></company></operation>
<operation><amount> $b </amount></operation>