Data Integration Management System For Distributed Database

»In the name of God«

Data Integration Management System for Distributed Database
using Virtual Database Technology and MapReduce

By: Saeid Masoumi


For Advanced Database course
University of Tabriz
Fall semester, 18 December 2012
Table of Contents
Introduction
Solution
Associated Studies
Database Virtualization
  - Homogeneous
  - Heterogeneous
  - Common Schema Generation
  - Query Conversion
System Framework
The MapReduce Process
Conclusion
References
Data Integration Management System for Distributed
Database 2 of 17
Introduction
Massive amounts of data are available
Data mining techniques locate and access knowledge and trends in this data
The results are valuable for supporting analysis and decision-making in businesses
Ubiquitous databases: distributed and located anywhere
Analysts spend much of their time on database selection and data collection
instead of concentrating on analysis and rule extraction
Solution
Develop a database virtualization technique
  - For data analysts and other users who apply data mining methods
  - Aimed at the work of data collection from Internet databases and data cleansing
Examine the advantages of XML Schema
  - Describe ubiquitous database schemas in a unified fashion using XML Schema
  - Manage distributed databases of the same type and of different types
  - Provide location transparency
Develop a common schema generation method
Propose a virtual database query language
Associated Studies
Metadata (creation)
  - Independent of the database model (e.g. RDB or OODB)
  - Heavy workload to create in the initial stage
  - No standardized definition and manipulation language
UML and E-R model (design technique)
  - Independent of the data model (e.g. table or object)
XML Schema (mapping)
  - Used to exchange information
  - Definitions and manipulations are well standardized
  - Can be used flexibly with various Internet databases
Database Virtualization
Databases of many kinds
  - Data model differences:
      - Table type of relational databases (RDB)
      - XML-representation type of XML databases (XMLDB)
      - Object-oriented databases (OODB)
  - Vendor differences:
      - For RDB, for example: MySQL, PostgreSQL, SQL Server, …
Undesired result of the different data models: extra cost in
  - Time
  - Labor
Virtualization of different types of modeled databases
  - Reduces workload and cost
  - Facilitates management
  - Provides transparency for users (database structure or location)
Database Virtualization (…)
Able to use databases of all kinds
For virtualization of ubiquitous databases:
  - Describe the schema information of the real databases
Virtualization of Homogeneous Distributed Databases

Method of building a virtual database management system for RDBs provided by different vendors
  - XML conversion program
      - XML Export/Import
  - RDB schema conversion into XML
  - RDB data conversion into XML

Homogeneous: RDB schema conversion into XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <rdb Name="mysql">
    <database Name="questionnaire">
      <table_structure Name="member">
        <field Field="samplenum" Type="integer" Null="FALSE" Default="" />
        <field Field="answerday" Type="text" Null="FALSE" Default="" />
        <!-- … -->
      </table_structure>
      <schema>
        <constraint Type="PRIMARY KEY" Table="member" Column="samplenum" />
        <!-- … -->
        <constraint Type="FOREIGN KEY" Table="questionnaire" Column="samplenum"
                    Retable="member" ReColumn="samplenum" />
        <!-- … -->
      </schema>
    </database>
  </rdb>
</root>
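The schema conversion above can be sketched in a few lines of Python. This is only an illustration, not the converter described in the slides: it assumes SQLite as the source RDB (chosen because it ships with Python), reads the schema through SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`, and reuses the element and attribute names from the slide. The function name `export_schema` and the demo tables are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_schema(conn, rdb_name, db_name):
    """Describe every table of an SQLite database in the slide's XML layout."""
    root = ET.Element("root")
    rdb = ET.SubElement(root, "rdb", Name=rdb_name)
    database = ET.SubElement(rdb, "database", Name=db_name)
    schema = ET.Element("schema")  # appended last, after all table structures

    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        structure = ET.SubElement(database, "table_structure", Name=table)
        # Row layout: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, notnull, default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            ET.SubElement(
                structure, "field",
                Field=name,
                Type=col_type,
                Null="FALSE" if notnull else "TRUE",
                Default="" if default is None else str(default),
            )
            if pk:
                ET.SubElement(schema, "constraint",
                              Type="PRIMARY KEY", Table=table, Column=name)
        # Row layout: (id, seq, referenced_table, from_column, to_column, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            ET.SubElement(schema, "constraint", Type="FOREIGN KEY", Table=table,
                          Column=fk[3], Retable=fk[2], ReColumn=fk[4] or "")
    database.append(schema)
    return root


# Demo with two hypothetical tables modelled on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member ("
             "samplenum integer NOT NULL PRIMARY KEY, answerday text NOT NULL)")
conn.execute("CREATE TABLE questionnaire ("
             "samplenum integer REFERENCES member(samplenum))")
schema_xml = export_schema(conn, "sqlite", "questionnaire")
print(ET.tostring(schema_xml, encoding="unicode"))
```

A converter for MySQL or another vendor would read the same facts from that vendor's catalogue instead of the SQLite pragmas; the XML side would stay unchanged.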

Homogeneous: RDB data conversion into XML

<root>
  <dataset dbname="mysql">
    <data tblname="member" samplenum="10001" answerday="'2007/7/6'"
          answertime="' 13:07:19:499'" />
    <data tblname="member" samplenum="10002" answerday="'2007/7/6'"
          answertime="' 13:10:33:507'" />
    <!-- … -->
  </dataset>
</root>
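A minimal sketch of the data conversion, again assuming SQLite as a stand-in source database. Each row becomes one `<data>` element whose attributes are the column values, with text values wrapped in single quotes as on the slide. The function name `export_rows` and the sample rows are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_rows(conn, db_name, table):
    """Convert every row of one table into a <data> element."""
    root = ET.Element("root")
    dataset = ET.SubElement(root, "dataset", dbname=db_name)
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        data = ET.SubElement(dataset, "data", tblname=table)
        for column, value in zip(columns, row):
            if value is None:
                continue  # NULL is represented by omitting the attribute
            # Text values keep their SQL-style single quotes, as on the slide.
            data.set(column, f"'{value}'" if isinstance(value, str) else str(value))
    return root


# Demo with hypothetical rows modelled on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (samplenum integer PRIMARY KEY, answerday text)")
conn.executemany("INSERT INTO member VALUES (?, ?)",
                 [(10001, "2007/7/6"), (10002, "2007/7/6")])
rows_xml = export_rows(conn, "sqlite", "member")
print(ET.tostring(rows_xml, encoding="unicode"))
```

Note that a source column named `tblname` would collide with the attribute that carries the table name; a real converter would need a rule for that case.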

Virtualization of Heterogeneous Distributed Databases
Virtualization of modeled DBs of different types
Describe the schema information of each model using a single common schema (XML Schema)

Heterogeneous: SQL and associated XML
Any XMLDB is already described in the XML format (without conversion)
An OODB is fundamentally an extension of RDB

Item                    | SQL                                         | XML
------------------------|---------------------------------------------|----------------------------------
Table definition        | CREATE TABLE table name…                    | <xsd:element name="table name"…
Column definition       | CREATE TABLE… column name...                | <xsd:element name="column name"…
Data type definition    | CREATE TABLE… data type..                   | <xsd:element… type="data type"…
Default values          | CREATE TABLE… column name DEFAULT value     | <xsd:element… default="value"…
Primary key constraint  | PRIMARY KEY                                 | <xsd:key…
Unique constraint       | UNIQUE                                      | <xsd:unique …
Foreign key constraint  | FOREIGN KEY                                 | <xsd:keyref … refer =…
NOT NULL                | NOT NULL                                    | <xsd:… nillable="false"...
Method                  | CREATE METHOD                               |
Inheritance             | CREATE TABLE… UNDER upper level table name  | <xsd:complexType …
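The column-level rows of this mapping (name, data type, default value, NOT NULL) can be sketched as follows. This is an illustration under stated assumptions, not the conversion program from the slides: the SQL-to-XSD type table `SQL_TO_XSD_TYPE` is assumed (the slides do not list one), and the helper names `column_to_xsd` and `table_to_xsd` are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsd", XSD)

# Assumed type mapping; the slides only say "data type" maps to type="...".
SQL_TO_XSD_TYPE = {"int": "xsd:integer", "varchar": "xsd:string", "text": "xsd:string"}


def column_to_xsd(name, sql_type, not_null=False, default=None):
    """Map one column definition to an xsd:element."""
    element = ET.Element(f"{{{XSD}}}element", name=name)
    base_type = sql_type.split("(")[0].strip().lower()  # "varchar(50)" -> "varchar"
    element.set("type", SQL_TO_XSD_TYPE.get(base_type, "xsd:string"))
    if default is not None:
        element.set("default", str(default))
    element.set("nillable", "false" if not_null else "true")
    return element


def table_to_xsd(table, columns):
    """Map a table to an xsd:element containing a sequence of column elements."""
    table_element = ET.Element(f"{{{XSD}}}element", name=table)
    complex_type = ET.SubElement(table_element, f"{{{XSD}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XSD}}}sequence")
    for column in columns:
        sequence.append(column_to_xsd(**column))
    return table_element


# Demo with a hypothetical two-column table.
employee = table_to_xsd("EmployeeTable", [
    {"name": "EmployeeID", "sql_type": "int", "not_null": True},
    {"name": "Name", "sql_type": "varchar(50)", "not_null": True},
])
print(ET.tostring(employee, encoding="unicode"))
```

Key, unique, and foreign key constraints would map to `xsd:key`, `xsd:unique`, and `xsd:keyref` in the same style; they are left out here to keep the sketch short.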

Common Schema Generation
Example of SQL/CREATE:

    CREATE DATABASE EmployeeDB;

    CREATE TABLE EmployeeTable (
        EmployeeID int PRIMARY KEY,
        Name varchar(50) NOT NULL,
        Salary int CHECK(0 < Salary),
        AffiliationID int REFERENCES AffiliationTable(AffiliationID)
            ON UPDATE CASCADE
            ON DELETE CASCADE
    );

    CREATE TABLE AffiliationTable (
        AffiliationID int PRIMARY KEY,
        Affiliation varchar(5) UNIQUE
    );

Example of the common schema:

    <!-- … -->
          <xs:element name="AffiliationTable" type="AffiliationTableType"/>
          <xs:element name="EmployeeTable" type="EmployeeTableType"/>
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="AffiliationTableType">
        <xs:annotation>
          <xs:appinfo>
            <r:index index-key="AffiliationID" primary="yes"/>
            <r:index index-key="Affiliation" unique="yes"/>
          </xs:appinfo>
          <!-- … -->
        <xs:complexType>
          <xs:sequence>
            <xs:element minOccurs="1" name="AffiliationID" r:nullable="false" …
            <xs:element minOccurs="0" name="Affiliation" r:sqltype="varchar" …
      <!-- … -->
      <xs:complexType name="EmployeeTableType">
        <xs:annotation>
          <xs:appinfo>
            <r:index index-key="EmployeeID" primary="yes"/>
            <r:check check-column="Salary" rule="0&lt;Salary"/>
            <r:fkey fkey-column="AffiliationID" refcolumn="AffiliationID"
                    ref-table="Affiliation" …
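The constraint annotations in the common schema can be generated along these lines. This is a sketch only: the `r:` namespace URI is not given in the slides, so `urn:example:relational` is a placeholder, and the function name `constraints_to_appinfo` is hypothetical. The element and attribute names (`r:index`, `r:check`, `r:fkey`) follow the excerpt above.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
R = "urn:example:relational"  # placeholder; the real URI is not in the slides
ET.register_namespace("xs", XS)
ET.register_namespace("r", R)


def constraints_to_appinfo(table, primary=(), unique=(), checks=(), fkeys=()):
    """Build <xs:complexType> with the table's constraints under xs:appinfo."""
    complex_type = ET.Element(f"{{{XS}}}complexType", name=f"{table}Type")
    annotation = ET.SubElement(complex_type, f"{{{XS}}}annotation")
    appinfo = ET.SubElement(annotation, f"{{{XS}}}appinfo")
    for column in primary:
        ET.SubElement(appinfo, f"{{{R}}}index",
                      {"index-key": column, "primary": "yes"})
    for column in unique:
        ET.SubElement(appinfo, f"{{{R}}}index",
                      {"index-key": column, "unique": "yes"})
    for column, rule in checks:
        ET.SubElement(appinfo, f"{{{R}}}check",
                      {"check-column": column, "rule": rule})
    for column, ref_table, ref_column in fkeys:
        ET.SubElement(appinfo, f"{{{R}}}fkey",
                      {"fkey-column": column, "refcolumn": ref_column,
                       "ref-table": ref_table})
    return complex_type


# Demo using the EmployeeTable constraints from the SQL example.
employee_type = constraints_to_appinfo(
    "EmployeeTable",
    primary=["EmployeeID"],
    checks=[("Salary", "0<Salary")],
    fkeys=[("AffiliationID", "AffiliationTable", "AffiliationID")],
)
employee_type_xml = ET.tostring(employee_type, encoding="unicode")
print(employee_type_xml)
```

The serializer escapes the `<` in the CHECK rule as `&lt;`, which an attribute value needs in order to stay well-formed XML.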
Query Conversion
Development of a query language to access the virtual databases
  - An extension of the existing XQuery
  - Queries are converted from the XQuery-based language into SQL, XQuery, or …
Sample of the virtual database query:

    for $employee in common-schema()/DB1/sample_db1/employee
    for $manager in common-schema()/DB2/sample_db2/affiliation.xml
    where $employee/EmployeeID = $manager/Affiliation/
    return $employee/Name
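To make the conversion step concrete, here is a deliberately small sketch that routes one `common-schema()` path to the query language of the database behind it. It handles only a single-field projection, not the join in the sample query, and it is not the conversion module from the slides. The catalogue `CATALOGUE`, the `Source` record, and the function `convert` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    kind: str  # "rdb" or "xmldb"


# Hypothetical catalogue: which kind of real database backs each virtual name.
CATALOGUE = {"DB1": Source("rdb"), "DB2": Source("xmldb")}

PREFIX = "common-schema()/"


def convert(virtual_path, field):
    """Translate a virtual path plus one field into a native query string."""
    if not virtual_path.startswith(PREFIX):
        raise ValueError(f"not a virtual database path: {virtual_path!r}")
    source_name, database, collection = virtual_path[len(PREFIX):].split("/", 2)
    source = CATALOGUE[source_name]
    if source.kind == "rdb":
        # Relational source: project the field with SQL.
        return f'SELECT "{field}" FROM "{database}"."{collection}"'
    if source.kind == "xmldb":
        # XML source: the query stays in XQuery and addresses the document.
        return f'for $x in doc("{collection}")//{field} return $x'
    raise ValueError(f"unsupported source kind: {source.kind!r}")


sql_query = convert("common-schema()/DB1/sample_db1/employee", "Name")
xquery_query = convert("common-schema()/DB2/sample_db2/affiliation.xml", "Affiliation")
print(sql_query)
print(xquery_query)
```

A full converter would also have to split a multi-source query such as the sample into per-source subqueries and join the results afterwards.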

System Framework
[Figure: system framework diagram. Components shown: Clients; Parse Engines with Session Managers; Info Center; Virtual Database Manager; Common Schema; Query Conversion Module; Schema Conversion Module; Execute Engines with Execute Managers and Workers; Archive; Resource Delegates for the RDB, XML, and OODB sources, each with its Data and Schema.]

The MapReduce Process
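The slide content for this step did not survive extraction, so the following is only a generic, single-process sketch of how a MapReduce-style join between two sources could look; it is an assumption, not the process shown in the original slide. Rows from an employee source and an affiliation source are mapped to key-value pairs on a shared `AffiliationID`, grouped by key, and joined in the reduce step. All function names and sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_employee(row):
    # Emit the join key with a tag that tells the reducer where the value came from.
    yield row["AffiliationID"], ("employee", row["Name"])


def map_affiliation(row):
    yield row["AffiliationID"], ("affiliation", row["Affiliation"])


def shuffle(pairs):
    """Group all mapped values by key, as the shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Pair every employee with every affiliation that shares the key."""
    names = [value for tag, value in values if tag == "employee"]
    affiliations = [value for tag, value in values if tag == "affiliation"]
    for name in names:
        for affiliation in affiliations:
            yield name, affiliation


def run(employees, affiliations):
    mapped = chain(
        chain.from_iterable(map_employee(row) for row in employees),
        chain.from_iterable(map_affiliation(row) for row in affiliations),
    )
    groups = shuffle(mapped)
    return sorted(chain.from_iterable(
        reduce_join(key, values) for key, values in groups.items()))


# Hypothetical rows, as two different real databases might return them.
employees = [
    {"Name": "Alice", "AffiliationID": 1},
    {"Name": "Bob", "AffiliationID": 2},
    {"Name": "Carol", "AffiliationID": 1},
]
affiliations = [
    {"AffiliationID": 1, "Affiliation": "Sales"},
    {"AffiliationID": 2, "Affiliation": "Dev"},
]
joined = run(employees, affiliations)
print(joined)
```

In a distributed setting the map calls would run on separate workers next to each source, and only the shuffled groups would travel over the network.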

Conclusion
Developed a common schema conversion program that converts RDB schemas into XML Schema
Showed that schema constraints (such as PRIMARY KEY, CHECK, NOT NULL,
ON UPDATE CASCADE, ON DELETE CASCADE, UNIQUE) can be converted
Future research
  - Develop a program to integrate XML DB schemas into the common schema
  - Implement a common data manipulation API (for example, an extension of the
    existing XQuery modules) to access the virtual databases
  - Incorporate location transparency functions into this API

References
1) S. Abiteboul, P. Buneman, and D. Suciu, Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann Series in Data Management Systems (1999).
2) S. Amer-Yahia, F. Du, and J. Freire, “A comprehensive solution to the XML-to-relational mapping problem,” Proc. 6th Annual ACM International Workshop on Web Information and Data Management, pp. 31-38 (2004).
3) I. Varlamis and M. Vazirgiannis, “Bridging XML-schema and relational databases: a system for generating and manipulating relational databases using valid XML documents,” Proc. 2001 ACM Symposium on Document Engineering, pp. 105-114 (2001).
4) P. Bohannon, J. Freire, P. Roy, and J. Simeon, “From XML Schema to Relations: A Cost-Based Approach to XML Storage,” Proc. 18th International Conference on Data Engineering, pp. 64-75 (2002).
5) G. Kappel, E. Kapsammer, and W. Retschitzegger, “Integrating XML and Relational Database Systems,” World Wide Web: Internet and Web Information Systems, 7, pp. 343-384 (2004).
6) R. Li, Z. Lu, W. Xiao, B. Li, and W. Wu, “Schema Mapping for Interoperability in XML-Based Multidatabase Systems,” Proc. 14th International Workshop on Database and Expert Systems Applications (2003).
7) Y. Wada, Y. Watanabe, J. Sawamoto, and T. Katoh, “Database Virtualization Technology in Ubiquitous Computing,” Proc. 6th Innovations in Information Technology (Innovations’09), pp. 170-174 (2009-12).
8) Y. Wada, Y. Watanabe, K. Syoubu, J. Sawamoto, and T. Katoh, “Virtualization Technology for Ubiquitous Databases,” Proc. 4th Workshop on Engineering Complex Distributed Systems (ECDS 2010) (2010-02) (to appear).
Thanks for your attention

Any questions?