The document discusses data warehouse development, extraction, transformation, and loading (ETL) processes, and metadata repositories. It describes how data warehouses should be developed incrementally, defines the functions of ETL including extraction, cleaning, transformation and loading, and explains that metadata repositories should contain information about the data warehouse structure, operational metadata, algorithms, mappings from source systems, performance data, and business terms.


Data Warehouse & Data Mining
Week 4 -- Lecture 12
Course Teacher: Syed Saood Zia

Data Warehousing and Online Analytical Processing


Table of Contents

• Data Warehouse Development

• Extraction, Transformation, and Loading

• Metadata Repository
Data Warehouse Development

• A recommended method for the development of data warehouse
systems is to implement the warehouse in an incremental and
evolutionary manner, as shown in Figure 4.2.
• First, a high-level corporate data model is defined within a
reasonably short period (such as one or two months) that provides
a corporate-wide, consistent, integrated view of data among
different subjects and potential usages.

• This high-level model, refined in the development of enterprise
data warehouses and departmental data marts, will greatly reduce
future integration problems.

• Second, independent data marts can be implemented in parallel
with the enterprise warehouse based on the same corporate data
model set.

• Third, distributed data marts can be constructed to integrate
different data marts via hub servers.

• Finally, a multitier data warehouse is constructed where the
enterprise warehouse is the sole custodian of all warehouse data,
which is then distributed to the various dependent data marts.
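
The final multitier arrangement can be illustrated with a minimal Python sketch. The class names (EnterpriseWarehouse, DependentDataMart), the "subject" field, and the sample rows are hypothetical and stand in for real database systems; the only point shown is that a dependent data mart receives its data from the enterprise warehouse rather than from the source systems.

```python
from dataclasses import dataclass, field


@dataclass
class EnterpriseWarehouse:
    """Hypothetical in-memory stand-in for the sole custodian of warehouse data."""
    rows: list = field(default_factory=list)

    def load(self, new_rows):
        self.rows.extend(new_rows)

    def distribute(self, subject):
        # Hand out copies of the rows for one subject area.
        return [dict(r) for r in self.rows if r["subject"] == subject]


@dataclass
class DependentDataMart:
    """Hypothetical data mart that is fed only by the enterprise warehouse."""
    subject: str
    rows: list = field(default_factory=list)

    def refresh_from(self, warehouse):
        # No direct access to operational sources: data arrives via the warehouse.
        self.rows = warehouse.distribute(self.subject)


warehouse = EnterpriseWarehouse()
warehouse.load([
    {"subject": "sales", "amount": 120.0},
    {"subject": "inventory", "amount": 40.0},
    {"subject": "sales", "amount": 75.5},
])

sales_mart = DependentDataMart("sales")
sales_mart.refresh_from(warehouse)
```

After the refresh, sales_mart holds only the two sales rows, while the warehouse still holds all three.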
Extraction, Transformation, and Loading

• Data warehouse systems use back-end tools and utilities to populate
and refresh their data (Figure 4.1). These tools and utilities include the
following functions:

• Data extraction, which typically gathers data from multiple,
heterogeneous, and external sources.

• Data cleaning, which detects errors in the data and rectifies them
when possible.

• Data transformation, which converts data from legacy or host format
to warehouse format.

• Load, which sorts, summarizes, consolidates, computes views, checks
integrity, and builds indices and partitions.

• Refresh, which propagates the updates from the data sources to the
warehouse.
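
The five functions above can be sketched as a small pipeline in Python using the standard-library sqlite3 module. Everything specific here is an assumption made for illustration: the source names (crm, erp), the region and amount fields, the cleaning rules, and the table names. The load step shows only two of the listed tasks (building an index and computing a summary), and refresh is modeled naively as running the pipeline again on newly arrived source rows.

```python
import sqlite3


def extract(sources):
    """Gather rows from several sources into one list, remembering the origin."""
    rows = []
    for name, records in sources.items():
        for rec in records:
            rows.append({"source": name, **rec})
    return rows


def clean(rows):
    """Detect bad amounts; rectify numeric strings, drop what cannot be fixed."""
    cleaned = []
    for row in rows:
        amount = row.get("amount")
        if isinstance(amount, str):
            try:
                amount = float(amount.strip())
            except ValueError:
                continue  # not recoverable, e.g. "n/a"
        if amount is None or amount < 0:
            continue  # assumed rule: missing or negative amounts are errors
        cleaned.append({**row, "amount": amount})
    return cleaned


def transform(rows):
    """Convert source-specific values to one (assumed) warehouse format."""
    return [(r["source"], r["region"].strip().upper(), r["amount"]) for r in rows]


def load(conn, rows):
    """Insert rows, build an index, and recompute a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (source TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_region ON sales (region)")
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()


def refresh(conn, new_source_rows):
    """Propagate newly arrived source rows to the warehouse."""
    load(conn, transform(clean(extract(new_source_rows))))


conn = sqlite3.connect(":memory:")
refresh(conn, {
    "crm": [{"region": " east ", "amount": "100.5"},
            {"region": "west", "amount": "n/a"}],
    "erp": [{"region": "East", "amount": 50.0},
            {"region": "west", "amount": -5}],
})
refresh(conn, {"erp": [{"region": "west", "amount": 20}]})
totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
```

In this run the "n/a" and negative rows are dropped during cleaning, the two spellings of "east" are consolidated by the transformation, and the second refresh call adds the later west row to the summary.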
Metadata Repository

• Metadata are data about data.


• When used in a data warehouse, metadata are the data that
define warehouse objects.
• Metadata are created for the data names and definitions of the
given warehouse.
• Additional metadata are created and captured for time stamping
any extracted data, the source of the extracted data, and missing
fields that have been added by data cleaning or integration
processes.
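
As a minimal sketch of this additional metadata, the Python function below records the extraction time stamp, the source, and the fields that were filled in during cleaning. The function name extract_with_metadata, the DEFAULTS table, and the sample record are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical defaults used to fill missing fields during cleaning.
DEFAULTS = {"country": "UNKNOWN"}


def extract_with_metadata(record, source_name, extracted_at=None):
    """Return the cleaned record together with metadata describing its extraction."""
    extracted_at = extracted_at or datetime.now(timezone.utc)
    data = dict(record)
    filled = []
    for field_name, default in DEFAULTS.items():
        if data.get(field_name) is None:
            data[field_name] = default
            filled.append(field_name)  # remember which fields were added
    metadata = {
        "source": source_name,
        "extracted_at": extracted_at.isoformat(),
        "filled_fields": filled,
    }
    return data, metadata


data, meta = extract_with_metadata(
    {"customer": "Acme", "country": None}, "crm_export"
)
```

Here meta notes that the record came from crm_export and that country was supplied by the cleaning step rather than by the source.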

• A metadata repository should contain the following:

• A description of the data warehouse structure, which includes the
warehouse schema, views, dimensions, hierarchies, and derived data
definitions, as well as data mart locations and contents.
• Operational metadata, which include data lineage (history of migrated
data and the sequence of transformations applied to it), currency of data
(active, archived, or purged), and monitoring information (warehouse
usage statistics, error reports, and audit trails).
• The algorithms used for summarization, which include measure and
dimension definition algorithms, data on granularity, partitions, subject
areas, aggregation, summarization, and predefined queries and reports.
• Mapping from the operational environment to the data warehouse,
which includes source databases and their contents, gateway
descriptions, data partitions, rules and defaults for data extraction,
cleaning, and transformation, data refresh and purging rules,
and security (user authorization and access control).
• Data related to system performance, which include indices and profiles
that improve data access and retrieval performance, in addition to rules
for the timing and scheduling of refresh, update, and replication cycles.
• Business metadata, which include business terms and definitions, data
ownership information, and charging policies.
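
The six categories listed above can be pictured as sections of one repository object. The Python sketch below is only an illustration: the class MetadataRepository, its attribute names, and the sample entries are hypothetical, and a real repository would normally live in a database rather than in memory. Data lineage is shown as an ordered list of the transformation steps applied to a table.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    """Hypothetical container with one section per category of metadata."""
    structure: dict = field(default_factory=dict)       # schema, views, dimensions
    operational: dict = field(default_factory=dict)     # lineage, currency, monitoring
    summarization: dict = field(default_factory=dict)   # measures, granularity, reports
    source_mapping: dict = field(default_factory=dict)  # sources, rules, security
    performance: dict = field(default_factory=dict)     # indices, refresh schedules
    business: dict = field(default_factory=dict)        # terms, ownership, charging

    def record_lineage(self, table, step):
        # Append a transformation step, keeping the order in which steps ran.
        self.operational.setdefault("lineage", {}).setdefault(table, []).append(step)

    def lineage_of(self, table):
        return list(self.operational.get("lineage", {}).get(table, []))


repo = MetadataRepository()
repo.structure["schema"] = {"sales": ["region", "amount"]}
repo.business["terms"] = {"amount": "Invoice total in local currency"}
repo.record_lineage("sales", "extracted from crm_export")
repo.record_lineage("sales", "region normalized to upper case")
```

Calling repo.lineage_of("sales") returns the two recorded steps in order, which is the kind of history that operational metadata is meant to preserve.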
Summary

• Data Warehouse Development

• Extraction, Transformation, and Loading

• Metadata Repository
