
Journal of Computer Science (Special Issue): 48-55, 2005
ISSN 1549-3636
© 2005 Science Publications

Integration Techniques to Build a Data Warehouse using Heterogeneous Data Sources

F. Boufares and S. Hamdoun


LIPN-UMR 7030 of C.N.R.S- Paris 13 University, Galilee Institute
99, Avenue J. B. Clément, F-93430 Villetaneuse France

Abstract: This work describes the construction of a data warehouse by integrating heterogeneous relational, object-relational and XML data (complex data). Developing intelligent tools to integrate information extracted from multiple heterogeneous sources is a challenging issue if the numerous sources available in global information systems are to be exploited effectively. Because the sources are heterogeneous, various query languages and different data models are used for the warehouses, so a warehouse can be constructed in several ways. Our work is based on the extraction of the inter-schema relationships between the sources. From these relationships, a global schema is generated and the views of the data warehouse are constructed. All the stages proposed in this work were implemented in a functional prototype.

Key words: Database, data warehouse, heterogeneous structures and data, integration

INTRODUCTION

The goal of automatic information structuring and data integration techniques is to construct synthesized information coming from multiple sources. In recent years, many research projects have focused on heterogeneous information integration[1]. Contributions in the recent literature include methods, techniques and tools for integrating and querying heterogeneous databases[2-7]. The heterogeneity considered there is mainly semantic; few works address the heterogeneity of data structures. In our work, we consider both aspects of heterogeneity, the semantic and the structural ones. We therefore consider that data sources can be structured (relational, object), semi-structured (XML) and non-structured (multimedia data) (Fig. 1). The integration process has to take all these data categories into account.

Our aim is to build a data warehouse using a two-phase process: on the one hand, conceptual representations must be assembled and, on the other hand, heterogeneous data must be merged. We propose a framework for the integration of heterogeneous databases (relational and/or object-relational and XML). This work is based on the extraction of different relationships between sources. Using these relationships, we generate the global schema of the integration result (the data warehouse). The construction step is then carried out by generating different queries to the data sources.

Our contribution treats not only the heterogeneity of data but also that of structures. Moreover, we do not simply compare element names to extract the inter-schema relationships; we also compare, whenever possible, domains and constraints.

Fig. 1: Heterogeneous sources of the data warehouse

Data sources relationships extraction: The first step of our work is the extraction of relationships between different components of data sources (schemas, DTD, ...). Currently, these relationships are proposed by the data warehouse administrator; our future goal is to automate or partially automate this step.

We define three types of relationships between elementary information (attributes, properties, XML elements, ...): the synonymy (SYN), the inclusion (INC) and the disjunction (DISJ).

Case of relational and object-relational data sources: To illustrate these relationships we consider two relational and/or object-relational databases: DB1 and DB2.

DB1 is made of m relations, DB1 = {Ri}, 1 ≤ i ≤ m; each relation Ri is composed of ni attributes, Ri = {Aj}, 1 ≤ j ≤ ni.

Corresponding Author: F. Boufares, LIPN-UMR 7030 of C.N.R.S - Paris 13 University, Galilee Institute 99, avenue J. B.
Clément, F-93430 Villetaneuse France. Tel: +33 1 49 40 40 71, Fax: +33 1 48 26 07 12

DB2 is made of m’ relations, DB2 = {R’p}, 1 ≤ p ≤ m’; each relation R’p is made of n’p attributes, R’p = {A’q}, 1 ≤ q ≤ n’p.

Each attribute A is defined by a domain Dom(A), which is the set of valid instances of A. Dom(A) can be either a predefined domain (varchar, number, date) or a user-defined domain (type).

We note DB.R.A the attribute A of the relation R belonging to the database DB.

Definition 1: The Synonymy
An attribute A is a synonym of an attribute A’ if they have the same domain. We note: DB1.Ri.Aj SYN DB2.R’p.A’q if Dom(Aj) = Dom(A’q). We can validate this relationship by verifying that the sets of constraints defined on Aj and A’q are either identical or, at least, do not contradict each other.

Definition 2: The Inclusion
DB1.Ri.Aj INC DB2.R’p.A’q if Dom(Aj) is a subset of Dom(A’q). We can validate this relationship by verifying that the constraints defined on A’q are a subset of, or the same as, the constraints defined on Aj.

Definition 3: The Disjunction
DB1.Ri.Aj DISJ DB2.R’p.A’q if DB1.Ri.Aj Not SYN DB2.R’p.A’q and DB1.Ri.Aj Not INC DB2.R’p.A’q and DB2.R’p.A’q Not INC DB1.Ri.Aj.

Note that in the case of object-relational databases, attributes can be composed. Relationships between composed attributes can be validated using the two following rules. Let’s consider two composed attributes: DB1.Ri.Aj = {c1,…,cr} and DB2.R’p.A’q = {c’1,…,c’r’}.

Rule 1: DB1.Ri.Aj SYN DB2.R’p.A’q if r = r’ and ∀ s / 1 ≤ s ≤ r, cs SYN c’s.
Rule 2: DB1.Ri.Aj INC DB2.R’p.A’q if r r’ and ∀ s / 1 ≤ s ≤ r’, cs SYN c’s.

Example 1: Two databases (DB1 and DB2)
DB1 represents the information system of a production company and DB2 represents the information system of a service company.

Example 2: A set of synonyms
Let’s take into account the relationships given below to integrate DB1 and DB2.
DB1.client.id_client SYN DB2.company.id_co
DB1.client.address SYN DB2.company.address
DB1.sale.amount SYN DB2.operation.amount
DB1.product.id_product SYN DB2.service.id_service
DB1.product.label SYN DB2.service.label
DB1.sale.id_sale SYN DB2.operation.id_operation
DB1.sale.id_client SYN DB2.operation.id_co
DB1.sale.id_product SYN DB2.operation.id_service
DB1.sale.date_of_sale SYN DB2.operation.start_date

Case of XML sources: To illustrate these relationships we consider two XML files: F1 and F2. We adopt the DTD to describe the structure of XML files. A broader description of XML files (with XML Schemas) is not treated in this paper.

F1 is described by the DTD D1 and F2 is described by the DTD D2. D1 and D2 are composed of a set of elements and attributes.

An example of a DTD is the following:
<?xml version="1.0"?>
<!ELEMENT el1 (el11,el12,el13,el14)>
<!ELEMENT el11 (#PCDATA)>
<!ELEMENT el12 (#PCDATA)>
<!ELEMENT el13 (#PCDATA)>
<!ELEMENT el14 (#PCDATA)>

This example describes XML files containing composed elements el1, each having four sub-elements: el11, el12, el13 and el14.

D1 is made of m elements, D1 = {Ei}, 1 ≤ i ≤ m; each element Ei is possibly composed of ni elements, Ei = {Eij}, 1 ≤ j ≤ ni, and so on for all the elements.

D2 is made of m’ elements, D2 = {E’p}, 1 ≤ p ≤ m’; each element E’p is possibly composed of n’p elements, E’p = {E’pq}, 1 ≤ q ≤ n’p.

We note D.E the element E of the DTD D.

Definition 4: The Synonymy (SYN)
Two composed elements D1.E1 and D2.E2 are synonyms if they have the same number of sub-elements and their sub-elements are synonyms.

Definition 5: The Inclusion (INC)
A composed element D1.E1 is included in a composed element D2.E2 if
* the number of sub-elements of D2.E2 is greater than the number of sub-elements of D1.E1;
* each sub-element of D1.E1 is a synonym of a sub-element of D2.E2.

Definition 6: The Disjunction (DISJ)
D1.E1 DISJ D2.E2 if D1.E1 Not SYN D2.E2 and D1.E1 Not INC D2.E2 and D2.E2 Not INC D1.E1.

Example 3: Two XML files F1 and F2 defined by D1 and D2

D1
<?xml version="1.0"?>
<!ELEMENT sale (client,product,quantity,amount,date_of_sale)>
<!ELEMENT client (first_name,last_name,address,type)>
<!ELEMENT product (label)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT label (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT date_of_sale (#PCDATA)>

D2
<?xml version="1.0"?>
<!ELEMENT operation (service,company,amount,duration,start_date)>
<!ELEMENT employee (first_name_emp,last_name_emp,address_emp)>
<!ELEMENT company (name, address_co)>
<!ELEMENT first_name_emp (#PCDATA)>
<!ELEMENT last_name_emp (#PCDATA)>
<!ELEMENT address_emp (#PCDATA)>
<!ELEMENT address_co (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT duration (#PCDATA)>
<!ELEMENT start_date (#PCDATA)>

D1 represents the information system of a production company and D2 represents the information system of a service company.

Example 4: A set of synonyms
Let’s take into account the relationships given below to integrate F1 and F2.
D1.sale.client.address SYN D2.operation.company.address_co
D1.sale.amount SYN D2.operation.amount
D1.sale.product.label SYN D2.operation.service.label
D1.sale.date_of_sale SYN D2.operation.start_date
D1.sale.client.last_name SYN D2.operation.company.name

Once the different relationships between data sources are given, we can move to the presentation of our methodology for the global schema generation (data warehouse structuring).

GLOBAL SCHEMA GENERATION

A data warehouse (DW) is composed of a set of views, DW = {Vi} / i ∈ [1..n]. Each view is constructed by querying and integrating data from different sources.

Case of relational and object-relational data sources: In our work, we propose to generate the global schema GS while considering the set L of source attributes which must appear in the DW. L = {DBk.Ri.Aj} / k ∈ [1..d]; i ∈ [1..rk] and j ∈ [1..ai]. L is proposed by the DW administrator according to decision needs.

We note d = the number of DBs; rk = the number of relations of DBk; ai = the number of attributes of Ri.

The global schema generation is done in the following way (six steps):

1- The set L is filtered to form the set L’ by eliminating from L one attribute from each couple of synonymous ones. L’ does not contain synonyms and L’ is included in L.

2- L’ is partitioned by source: L’ = ∪k ∈ [1..d] (L’k), such that each L’k contains only the DBk attributes. L’k = {DBk.Ri.Aj}, i ∈ [1..rk] and j ∈ [1..ai].

3- For each L’k a non-oriented graph Gk is built. Each node corresponds to a relation Ri from DBk and each arc denotes a reference constraint between two relations. d graphs are obtained.

4- Each graph Gk may be composed of several connected subgraphs, Gk = {G’ko} / o ∈ [1..tk], with tk = the number of connected subgraphs of Gk. A graph is connected if there is at least one path between each two nodes of the graph[8].

5- Considering the d graphs {Gk} / k ∈ [1..d], divided into subgraphs, the different possible combinations between the subgraphs are formed by taking one subgraph from each graph Gk. The set C of combinations is obtained: C = {{G’ko}} / k ∈ [1..d] and o ∈ [1..tk]. C has T elements, T = Πk∈[1..d] (tk), C = {Cu} / u ∈ [1..T].

6- A view Vu is associated to each combination Cu from C. Vu contains all the attributes belonging to L’ whose relations are represented by nodes in the subgraphs forming Cu. T views are then obtained. The schemas of the T views obtained form the global schema of the warehouse.

Example 5: A list of attributes, the set L
Let us now generate the global schema based on the given set of attributes L using Example 1:
L = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label); DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale); DB1.employee.(id_emp, first_name, last_name, address); DB2.company.(id_co, name, address); DB2.service.(id_service, label); DB2.operation.(id_op, id_co, id_service, duration, amount, start_date)}.
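The six steps can be sketched in Python on the set L above. This is only an illustrative sketch, not the authors' PL/SQL prototype: the function names (attrs, connected_components, generate_views) are hypothetical, the synonym list is the one of Example 2 with two labelled assumptions (id_operation is taken to be the id_op attribute of L, and client.last_name SYN company.name is borrowed from the XML case of Example 4), and the reference constraints are those listed as join conditions C1 and C2 in Example 7.

```python
from itertools import product


def attrs(db, rel, *names):
    """Expand DB.rel.(a, b, ...) into a list of (db, rel, attribute) triples."""
    return [(db, rel, name) for name in names]


def connected_components(nodes, edges):
    """Connected components of an undirected graph, as a list of sets."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        if a in adjacency and b in adjacency:  # ignore relations absent from L'
            adjacency[a].add(b)
            adjacency[b].add(a)
    components, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def generate_views(L, synonyms, references):
    # Step 1: drop the second attribute of every synonym pair (L -> L').
    dropped = {second for _, second in synonyms}
    L_prime = [a for a in L if a not in dropped]
    # Step 2: relations that still contribute attributes, per database (L'k).
    relations = {}
    for db, rel, _ in L_prime:
        relations.setdefault(db, set()).add(rel)
    # Steps 3-4: one graph per database, split into connected subgraphs.
    subgraphs = [
        [(db, comp) for comp in connected_components(rels, references.get(db, []))]
        for db, rels in sorted(relations.items())
    ]
    # Steps 5-6: one view per combination of one subgraph from each graph.
    views = []
    for combination in product(*subgraphs):
        kept = {(db, rel) for db, comp in combination for rel in comp}
        views.append([a for a in L_prime if (a[0], a[1]) in kept])
    return views


def syn(a, b):
    """Build a (DB1 attribute, DB2 attribute) synonym pair from 'rel.att' strings."""
    return ("DB1", *a.split(".")), ("DB2", *b.split("."))


L = (attrs("DB1", "client", "id_client", "first_name", "last_name", "address", "type")
     + attrs("DB1", "product", "id_product", "label")
     + attrs("DB1", "sale", "id_sale", "id_client", "id_product",
             "quantity", "amount", "date_of_sale")
     + attrs("DB1", "employee", "id_emp", "first_name", "last_name", "address")
     + attrs("DB2", "company", "id_co", "name", "address")
     + attrs("DB2", "service", "id_service", "label")
     + attrs("DB2", "operation", "id_op", "id_co", "id_service",
             "duration", "amount", "start_date"))

synonyms = [
    syn("client.id_client", "company.id_co"),
    syn("client.address", "company.address"),
    syn("sale.amount", "operation.amount"),
    syn("product.id_product", "service.id_service"),
    syn("product.label", "service.label"),
    syn("sale.id_sale", "operation.id_op"),        # assumption: id_operation == id_op
    syn("sale.id_client", "operation.id_co"),
    syn("sale.id_product", "operation.id_service"),
    syn("sale.date_of_sale", "operation.start_date"),
    syn("client.last_name", "company.name"),       # assumption, taken from Example 4
]

# Reference constraints between relations (join conditions C1 and C2 of Example 7).
references = {
    "DB1": [("sale", "client"), ("sale", "product")],
    "DB2": [("operation", "company"), ("operation", "service")],
}

views = generate_views(L, synonyms, references)
for view in views:
    print(sorted({f"{db}.{rel}" for db, rel, _ in view}))
```

Under these assumptions the sketch yields two views: one built from client, product and sale plus DB2.operation.duration, and one built from employee plus DB2.operation.duration. Dropping the DB2 member of each synonym pair is a choice made here for illustration; the text only requires that one attribute of each pair be eliminated.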

The global schema generation is done as follows:

1- The first step is:
L’ = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label); DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale); DB1.employee.(id_emp, first_name, last_name, address); DB2.company.(id_co, name, address); DB2.service.(id_service, label); DB2.operation.(id_op, id_co, id_service, duration, amount, start_date)}
L’ without synonyms is then:
L’ = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label); DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale); DB1.employee.(id_emp, first_name, last_name, address); DB2.operation.duration}

2- The second step is:
L’1 = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label); DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale); DB1.employee.(id_emp, first_name, last_name, address)}
L’2 = {DB2.operation.duration}

3- Several graphs are constructed.
The graph G1: nodes client, product, sale and employee; sale is linked to client and to product, and employee is isolated.
The graph G2: a single node, operation.

4- Connected subgraphs
G1 = {G’11, G’12} and G2 = {G’21}
G’11: client, product, sale
G’12: employee
G’21: operation

Steps 5 and 6- The views are then calculated:
T = the number of views.
t1 = 2 and t2 = 1. T = t1 * t2 = 2.
Two views V1 and V2 are then obtained:
V1 = {DB1.client.(id_client, first_name, last_name, address, type); DB1.product.(id_product, label); DB1.sale.(id_sale, id_client, id_product, quantity, amount, date_of_sale); DB2.operation.duration}
V2 = {DB1.employee.(id_emp, first_name, last_name, address); DB2.operation.duration}

Case of XML sources: In our work, we propose to generate the global schema while considering the set L of source elements which must appear in the DW. L = {Dk.Ei} / k ∈ [1..d] and i ∈ [1..rk]. L is proposed by the DW administrator according to decision needs.

We note d = the number of DTDs; rk = the number of elements of Dk.

The global schema generation is done in the following way (two steps):

1- The set L is filtered to form the set L’ by eliminating from L one element from each couple of synonymous ones. L’ does not contain synonyms and L’ is included in L. L’ = ∪k ∈ [1..d] (L’k), such that each L’k contains only the Dk elements. L’k = {Dk.Ei}, i ∈ [1..rk].

2- A view Vu is associated to L’. Vu contains all the elements belonging to L’. One view Vu is then obtained. The DTD of the view Vu obtained forms the global schema of the warehouse.

Example 6: A list of elements, the set L
Let us now generate the global schema based on the given set of elements L using Example 3:
L = {D1.sale.client.(first_name, last_name, address, type); D1.sale.product.(label); D1.sale.(quantity, amount, date_of_sale); D2.employee.(first_name_emp, last_name_emp, address_emp); D2.operation.company.(name, address_co); D2.operation.service.(label); D2.operation.(duration, amount, start_date)}.

The global schema generation is done as follows:

1- The first step is:
L’ = {D1.sale.client.(first_name, last_name, address, type); D1.sale.product.(label); D1.sale.(quantity, amount, date_of_sale); D2.employee.(first_name_emp, last_name_emp, address_emp); D2.operation.company.(name, address_co); D2.operation.service.(label); D2.operation.(duration, amount, start_date)}.
L’ without synonyms is then:
L’ = {D1.sale.client.(first_name, last_name, address, type); D1.sale.product.(label); D1.sale.(quantity, amount, date_of_sale); D2.employee.(first_name_emp,

last_name_emp, address_emp); D2.operation.(duration)}.

2- A view Vu is associated to L’. Vu contains all the elements belonging to L’. One view is then obtained. The DTD of the view Vu obtained forms the global schema of the warehouse. The DTD obtained is the following:
<?xml version="1.0"?>
<!ELEMENT sale (client,product,quantity,amount,date_of_sale,duration)>
<!ELEMENT employee (first_name_emp,last_name_emp,address_emp)>
<!ELEMENT client (first_name,last_name,address,type)>
<!ELEMENT product (label)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT label (#PCDATA)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT date_of_sale (#PCDATA)>
<!ELEMENT duration (#PCDATA)>
<!ELEMENT first_name_emp (#PCDATA)>
<!ELEMENT last_name_emp (#PCDATA)>
<!ELEMENT address_emp (#PCDATA)>

DATA WAREHOUSE/VIEWS CONSTRUCTION

The global schema generation determines the number of views composing the DW and their contents. In fact, L can describe one or more views.

Case of relational and object-relational data sources: For each view V from the GS, V = {DBk.Ri.Aj} / k ∈ [1..d]; i ∈ [1..rk] and j ∈ [1..ai].

The construction of the DW consists in the construction of all the views of the global schema. To illustrate our approach, let’s build a view V where V = {DB1.Ri.Aj, DB2.R’p.A’q}. We proceed in the same way in the case of several attributes.

Four steps are needed for the construction of the DW. First, the join conditions in the databases are established. Second, the sets of synonyms are found. Then the sets of inclusions are made up. Finally, the queries are generated.

The construction is done in the following way:

1- Join conditions
C1 is the set of all the join conditions in DB1.
C2 is the set of all the join conditions in DB2.

2- Synonyms
We note: NS(A) = the number of synonyms (SYN) of A.
S(DB1.Ri.Aj) is the set of synonyms of DB1.Ri.Aj which belong to DB2:
S(DB1.Ri.Aj) = {DB2.R’k.Sx(Aj)} / k ∈ [1..r2], x ∈ [1..NS(Aj)]
S(DB2.R’p.A’q) is the set of synonyms of DB2.R’p.A’q which belong to DB1:
S(DB2.R’p.A’q) = {DB1.Rk.Sz(A’q)} / k ∈ [1..r1], z ∈ [1..NS(A’q)]

3- Inclusions
We note: NI(A) = the number of attributes which are included (INC) in A.
I(DB1.Ri.Aj) is the set of attributes which are included in DB1.Ri.Aj and belong to DB2:
I(DB1.Ri.Aj) = {DB2.R’k.Iy(Aj)} / k ∈ [1..r2], y ∈ [1..NI(Aj)]
I(DB2.R’p.A’q) is the set of attributes which are included in DB2.R’p.A’q and belong to DB1:
I(DB2.R’p.A’q) = {DB1.Rk.Iw(A’q)} / k ∈ [1..r1], w ∈ [1..NI(A’q)]

4- Generation of queries
DB1 interrogations:
FOR each DB1.Rk.Sz(A’q) IN S(DB2.R’p.A’q) generate the query Q(Sz(A’q)) END;
Q(Sz(A’q)) can be defined in the SQL language:
Select Aj, Sz(A’q) from Ri, Rk where Cond1;
with Cond1 the join condition between Ri and Rk, extracted from C1.
We note Q(S(A’q)) the set of all the queries Q(Sz(A’q)) generated, z ∈ [1..NS(A’q)].

FOR each DB1.Rk.Iw(A’q) IN I(DB2.R’p.A’q) generate the query Q(Iw(A’q)) END;
Q(Iw(A’q)) can be defined in the SQL language:
Select Aj, Iw(A’q) from Ri, Rk where Cond1;
We note Q(I(A’q)) the set of all the queries Q(Iw(A’q)) generated, w ∈ [1..NI(A’q)].

DB2 interrogations:
FOR each DB2.R’k.Sx(Aj) IN S(DB1.Ri.Aj) generate the query Q(Sx(Aj)) END;
Q(Sx(Aj)) can be defined in the SQL language:
Select Sx(Aj), A’q from R’k, R’p where Cond2;
with Cond2 the join condition between R’k and R’p, extracted from C2.
We note Q(S(Aj)) the set of all the queries Q(Sx(Aj)) generated, x ∈ [1..NS(Aj)].

FOR each DB2.R’k.Iy(Aj) IN I(DB1.Ri.Aj) generate the query Q(Iy(Aj)) END;
Q(Iy(Aj)) can be defined in the SQL language:
Select Iy(Aj), A’q from R’k, R’p where Cond2;
We note Q(I(Aj)) the set of all the queries Q(Iy(Aj)) generated, y ∈ [1..NI(Aj)].

The construction of the view V = {DB1.Ri.Aj, DB2.R’p.A’q} is realized by the execution of the SQL

Union statement of all the queries in Q = {Q(S(Aj)) ∪ Q(I(Aj)) ∪ Q(S(A’q)) ∪ Q(I(A’q))}.

Example 7: Views building
Let’s now build the view V = (DB1.client.id_client, DB2.operation.amount) using Example 1.

1- Join conditions
C1 = {sale.id_client = client.id_client, sale.id_product = product.id_product}
C2 = {operation.id_co = company.id_co, operation.id_service = service.id_service}

2- Synonyms
S(DB1.client.id_client) = {DB2.company.id_co}
S(DB2.operation.amount) = {DB1.sale.amount}

3- Inclusions
I(DB1.client.id_client) = {}
I(DB2.operation.amount) = {}

4- Generation of queries
DB1 interrogations:
Q(S(DB2.operation.amount)) = {Q(DB1.sale.amount)}
With SQL definitions, Q(S(DB2.operation.amount)) = {Select client.id_client, sale.amount from client, sale where sale.id_client = client.id_client;}
Q(I(DB2.operation.amount)) = {}

DB2 interrogations:
Q(S(DB1.client.id_client)) = {Q(DB2.company.id_co)}
With SQL definitions, Q(S(DB1.client.id_client)) = {Select company.id_co, operation.amount from company, operation where operation.id_co = company.id_co;}
Q(I(DB1.client.id_client)) = {}

The construction of the view V is realized by the execution of the SQL Union statement of the two queries generated.

Case of XML sources: For each view V from the GS, V = {Dk.Ei} / k ∈ [1..d] and i ∈ [1..rk].

The construction of the DW consists in the construction of all the views of the global schema. To illustrate our approach, let’s build a view V where V = {D1.Ei, D2.Ej}. We proceed in the same way in the case of several elements.

Four steps are needed for the construction of the DW. First, the join conditions in the databases are established. Second, the sets of synonyms are found. Then the sets of inclusions are made up. Finally, the queries are generated.

The construction is done in the following way:

1- Synonyms
We note: NS(E) = the number of synonyms (SYN) of E.
S(D1.Ei) is the set of synonyms of D1.Ei which belong to D2:
S(D1.Ei) = {D2.Sx(Ei)} / i ∈ [1..r2], x ∈ [1..NS(Ei)]
S(D2.Ej) is the set of synonyms of D2.Ej which belong to D1:
S(D2.Ej) = {D1.Sz(Ej)} / j ∈ [1..r1], z ∈ [1..NS(Ej)]

3- Inclusions
We note: NI(E) = the number of elements which are included (INC) in E.
I(D1.Ei) is the set of elements which are included in D1.Ei and belong to D2:
I(D1.Ei) = {D2.Iy(Ei)} / i ∈ [1..r2], y ∈ [1..NI(Ei)]
I(D2.Ej) is the set of elements which are included in D2.Ej and belong to D1:
I(D2.Ej) = {D1.Iw(Ej)} / j ∈ [1..r1], w ∈ [1..NI(Ej)]

Generation of queries:
F1 interrogations:
FOR each D1.Sz(Ej) IN S(D2.Ej) generate the query Q(Sz(Ej)) END;
Q(Sz(Ej)) can be defined in the XML-QL language:
Where
<Ei> $a </Ei>
<Sz(Ej)> $b </Sz(Ej)>
in "F1.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(S(Ej)) the set of all the queries Q(Sz(Ej)) generated, z ∈ [1..NS(Ej)].

FOR each D1.Iw(Ej) IN I(D2.Ej) generate the query Q(Iw(Ej)) END;
Q(Iw(Ej)) can be defined in the XML-QL language:
Where
<Ei> $a </Ei>
<Iw(Ej)> $b </Iw(Ej)>
in "F1.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(I(Ej)) the set of all the queries Q(Iw(Ej)) generated, w ∈ [1..NI(Ej)].

F2 interrogations:
FOR each D2.Sx(Ei) IN S(D1.Ei) generate the query Q(Sx(Ei)) END;
Q(Sx(Ei)) can be defined in the XML-QL language:
Where
<Sx(Ei)> $a </Sx(Ei)>
<Ej> $b </Ej>

in "F2.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(S(Ei)) the set of all the queries Q(Sx(Ei)) generated, x ∈ [1..NS(Ei)].

FOR each D2.Iy(Ei) IN I(D1.Ei) generate the query Q(Iy(Ei)) END;
Q(Iy(Ei)) can be defined in the XML-QL language:
Where
<Iy(Ei)> $a </Iy(Ei)>
<Ej> $b </Ej>
in "F2.xml"
Construct
<Result>
<Ei> $a </Ei>
<Ej> $b </Ej>
</Result>
We note Q(I(Ei)) the set of all the queries Q(Iy(Ei)) generated, y ∈ [1..NI(Ei)].

The construction of the view V = {D1.Ei, D2.Ej} is realized by the execution of the XML-QL Union statement of all the queries in Q = {Q(S(Ei)) ∪ Q(I(Ei)) ∪ Q(S(Ej)) ∪ Q(I(Ej))}.

Fig. 2: The prototype architecture

Example 8: Views building
Let’s now build the view V = (D1.sale.client.address, D2.operation.amount) using Example 3.

2- Synonyms
S(D1.sale.client.address) = {D2.operation.company.address_co}
S(D2.operation.amount) = {D1.sale.amount}

3- Inclusions
I(D1.sale.client.address) = {}
I(D2.operation.amount) = {}

4- Generation of queries
F1 interrogations:
Q(S(D2.operation.amount)) = {Q(D1.sale.amount)}
With XML-QL definitions, Q(S(D2.operation.amount)) =
Where
<sale><client><address> $a </address></client></sale>
<sale><amount> $b </amount></sale>
in "F1.xml"
Construct
<Result>
<sale><client><address> $a </address></client></sale>
<operation><amount> $b </amount></operation>
</Result>
Q(I(D2.operation.amount)) = {}

F2 interrogations:
Q(S(D1.sale.client.address)) = {Q(D2.operation.company.address_co)}
With XML-QL definitions, Q(S(D1.sale.client.address)) =
Where
<operation><company><address_co> $a </address_co></company></operation>
<operation><amount> $b </amount></operation>

in "F2.xml"
Construct
<Result>
<sale><client><address> $a </address></client></sale>
<operation><amount> $b </amount></operation>
</Result>
Q(I(D1.sale.client.address)) = {}

The construction of the view V is realized by the execution of the XML-QL Union statement of the two queries generated.

APPROACH IMPLEMENTATION

Our approach is implemented, for the case of relational and object-relational databases[9, 10], using PL/SQL and the Oracle Data Dictionary. The prototype architecture is presented in Fig. 2.

To realize this prototype, meta-tables are proposed to store the inter-schema relationships and the schemas of the generated views. The Oracle Data Dictionary[11, 12] is used to determine the references between relations. The three steps of the prototype are implemented in PL/SQL packages using the meta-tables created. For example, the meta-table Liens describes the relationships between attributes:

Liens = (ref_base1:varchar; rel1:varchar; att1:varchar; ref_base2:varchar; rel2:varchar; att2:varchar; lien:varchar)

where «ref_base1/2» refers to data sources, «rel1/2» refers to relations, «att1/2» refers to attributes and «lien» refers to the relationship between ref_base1.rel1.att1 and ref_base2.rel2.att2.

Some procedures from the source code of our prototype are presented in the appendix.

CONCLUSION

In this paper, we have described our approach to the integration of heterogeneous data (relational DBs, object-oriented DBs and XML documents). This approach is implemented for the case of relational and object-relational data sources by means of a prototype using PL/SQL and the Oracle Data Dictionary. We have treated, in this work, structured and semi-structured heterogeneous data using the extraction of relationships between different kinds of elements (attributes, properties, XML elements, …). Currently, we are trying to automate this step using graph theory[13]. In the future, we plan to extend our work by using relational, semi-structured (XML data)[14, 15] and non-structured data (multimedia data) at the same time.

REFERENCES

1. Gray, J., 2004. The revolution in database architecture. ACM SIGMOD/PODS 2004 Conf. Paris, France.
2. Vassiliadis, P., C. Quix, Y. Vassiliou and M. Jarke, 2001. Data warehouse process management. Information Systems Volume 26, Issue 3, In Proceedings of the 12th Intl. Conf. Adv. Inform. Sys. Engg. (CAiSE 00), Stockholm, Sweden, pp: 205-236.
3. Beneventano, D., S. Bergamaschi, S. Castano, V. De Antonellis, A. Ferrara, F. Guerra, F. Mandreoli, G.C. Ornetti and M. Vincini, 2002. Semantic integration and query optimization of heterogeneous data sources. Advances in Object-Oriented Information Systems, OOIS 2002 Workshops, Montpellier, France, LNCS 2426, pp: 154-165.
4. Ursino, D., 2002. Construction of a data warehouse. Extraction and Exploitation of Intentional Knowledge from Heterogeneous Information Sources: Semi-Automatic Approaches and Tools, LNCS 2282, pp: 161-169.
5. Naggar, P., L. Pontieri, M. Pupo, G. Terracina and E. Virardi, 2002. A model and a toolkit for supporting Incremental data warehouse construction. Algorithms in Bioinformatics, Sec. Intl. Workshop, WABI 2002, Rome, Italy, LNCS 2453, pp: 123-132.
6. Vassiliadis, P., A. Simitsis, P. Georgantas and M. Terrovitis, 2003. A framework for the design of ETL scenarios. Adv. Inform. Sys. Engg., 15th Intl. Conf., CAiSE 2003, Klagenfurt, Austria, LNCS 2681, pp: 520-535.
7. Nguyen, B., 2003. Construction and maintenance of a web warehouse. Ph.D. Thesis Paris 11 University, France.
8. Lando, S.K. and A.K. Zvonkin, 2003. Graphs on surfaces and their applications. Eyrolles Edn.
9. Hamdoun, S. and F. Boufares, 2004. Relationships between attributes to integrate heterogeneous data sources. In Proc. Intl. Arab Conf. Inform. Technol., (ACIT’04), Constantine, Algeria.
10. Hamdoun, S. and F. Boufares, 2005. Relationships between attributes to integrate heterogeneous data sources: An SQL methodology. In Proc. Intl. Conf. Software Engg. (IASTED’05) on Databases and Applications (DBA2005), Innsbruck, Austria.
11. Oracle Database 10g. http://otn.oracle.com/software/products/database/oracle10g/index.html.
12. Soutou, Ch., 2004. Programmer Objet Avec Oracle. Vuibert Edn.
13. Lellahi, S.K. and A. Zamulin, 2001. Object-oriented database as a dynamic system with implicit state. In Proc. Fifth East-European Conf. Adv. Database and Inform. Sys., (ADBIS’01), Vilnius, Lithuania.
14. Gardarin, G., A. Mensch, T.T. Dang-Ngoc and L. Smit, 2002. Integrating heterogeneous data sources with XML and Xquery. In Proc. Database and Expert Systems Applications, 13th Intl. Conf., DEXA 2002, Aix-en-Provence, France.
15. Hamdoun, S. and F. Boufares, 2003. An XML approach of construction and self-maintenance of a web warehouse. In Proc. Intl. Arab Conf. Inform. Technol., (ACIT’03), Alexandria, Egypt,