0% found this document useful (0 votes)
41 views31 pages

DW Training

ETL is the process of extracting data from source systems, transforming it to fit operational needs, and loading it into a data warehouse. It involves extracting data from various sources, cleaning and transforming it using rules to handle different data formats and variations, and loading it into a data warehouse. The amount of data and complexity of transformations involved can impact the ETL architecture, which may use full extracts of all source data or periodic/incremental extracts of new or changed data. Transformations during ETL include data type conversions, classifications, field splitting/merging, aggregations, and structural or format changes to load data consistently into the data warehouse.

Uploaded by

santosh_sunkara
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views31 pages

DW Training

ETL is the process of extracting data from source systems, transforming it to fit operational needs, and loading it into a data warehouse. It involves extracting data from various sources, cleaning and transforming it using rules to handle different data formats and variations, and loading it into a data warehouse. The amount of data and complexity of transformations involved can impact the ETL architecture, which may use full extracts of all source data or periodic/incremental extracts of new or changed data. Transformations during ETL include data type conversions, classifications, field splitting/merging, aggregations, and structural or format changes to load data consistently into the data warehouse.

Uploaded by

santosh_sunkara
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 31

ETL- Extraction,Transformation & Loading

What is ETL?

• ETL(Extraction, Transformation and Loading) is a process by which


data is integrated and transformed from the operational systems into
the Data warehouse environment

Filters and
Extractors
Cleanser
Error
Error
Operational systems Cleaning View
Rules Check
• Rule 1 Correct
• Rule 2
• Rule 3
Transformation
Rules

• Rule 1
• Rule 2
• Rule 3

Transformation
Engine
Integrator

Error
View
Check
Correct Loader Warehouse

1-2
Operational Data - Challenges

• Data from heterogeneous sources


• Format differences
• Data Variations

• Context

– Across locations the same code could


represent different customers
– Across periods of time a product code could
have been reused

1-3
Extraction

Data
from
80 tables tables 30

Filter
Oracle
Data from 10
tables Where
Date<10/12/99
50 tables

f iles
Sybase ta f rom
Da

Target

1-4 Text files


Transformation

Source
Emp id Last First
Name Name
10001 Jones Indiana

10002 Holmes Sherlock


Staging Area

Name =
Concat(First Name,
Last Name)
Indiana Jones
Sherlock Homes

1-5
Loading

Data
Source Warehouse
Direct Load

Staging
Area g rated
i nt e
rm ed &
ra n sfo load
l ean,T data
C
Cleaning,
Transformation
& Integration of
Raw data

1-6
Volume of ETL in a Data warehouse

Source OLTP
Systems Data Marts

Metadata
e re
is h
Enterprise
or k
Data Warehouse
e w
o f th
0 %
o 8
0 t
6
•Extract •Load
•Design •Extract •Load •Replication •Access & Analysis
•Design •Scrub •Index •Replication •Access & Analysis
•Mapping •Scrub •Index •Data Set Distribution •Resource Scheduling & Distributio
•Mapping •Transform •Aggregation •Data Set Distribution •Resource Scheduling & Distributio
•Transform •Aggregation

Meta
MetaData
Data

System
SystemMonitoring
Monitoring
1-7
Factors Influencing ETL Architecture

• Volume at each warehouse component.

• The time window available for extraction.

• The extraction type (Full,Periodic etc.)

• Complexity of the processes at each stage.

1-8
Extraction Types
Extraction Types

Extraction

Periodic/
Full Extract Incremental
Extract

1-10
Full Extract

Existing data

Data Mart

Full Extract

Source System

1-11
Full Extract

New data

Data Mart

Full Extract

Source System

1-12
Incremental Extract

Existing data
Incremental
Data
Data Mart

Incremental Extract
Source System

1-14
Incremental Extract

Existing data
New data
Incremental
Data
Data Mart

Incremental Extract
Source System

Changed data

1-15
Incremental Extract

New data Incremental addition


Incremental to data mart
Data
Data Mart

Source System
Incremental Extract

Changed data
Existing data updated
using changed data

1-16
Transformation
Data Transformation

• Conversions

– Data type (e.g. Char to Date)

– Bring data to common units (Currency,Measuring Units)

• Classifications

– Changing continuous values to discrete ranges (e.g.


Temperatures to Temperature Ranges)

• Splitting of fields

• Merging of fields

• Aggregations (e.g. Sum, Avg., Count)

• Derivations (Percentages, Ratios, Indicators)

1-18
Structural Transformations

• Additive
OLTP

Orders arrive
Data ware
every Aggregate house
two minutes

OLTP
Average

Daily
Productivity Average Data ware
figures house

1-19
Format transformation

Source Target
Schema Schema
Data Type Transformation
Conversions “32” 32

Age as a String Age as an


Integer

Source Target
Splitting Schema Schema
Transformation
“15-10-1992” 15 10 1999

Day Month Year

Date as a Date as a combination of 3


String integer fields

1-20
Simple Conversions

Source Target
Schema Schema

Multiply by 1/43
Rs. 10000 $232.56

Revenue in Revenue in
Rupees Dollars

Multiply by 0.4536
1000 lbs. 453.56 kgs.

Production in Production in
Pounds Kilograms
Source Target
Schema Schema

• Transformations using Simple Conversions

1-21
Classification

Name Age
John Black 27
Richard Wayne 53
Age GroupFrequency
Jennifer Goldman 45 20-25 1
Helmut Koch 37 26-30 4
Anna Ludwig 32
Shito Maketha 28 31-35 3
Tracy Withman 39 36-40 2
Ada Zhesky 25 Grouping
41-45 2
David Rosenberg 33
Pankaj Sharma 29 46-50 1
Zhu Ling 44 51-55 1
George Kurtz 27
Rita Hartman 34
56-60 0

1-22
Data Consistency Transformations

Source 1 Source 2 Source 1


Gender Gender Gender
Male – M Male – Male Male – 1
Female – F Female – Female Female – 2

Target
Gender
Male – M
Female – F

1-23
Reconciliation of Duplicated data

Joseph J.R.Smith Joe Smith


Smith 123 Maine 123 Maine
123 Maine St. St.
St. MA - MA -
MA - 70127 70127
70127

Joseph R Smith
123 Maine St.
MA - 70127

1-24
Data Aggregation - Design Requirements

• Aggregates must be stored in their own fact tables and each level
should have its own fact table

• Dimension tables attached to the aggregate fact tables should where


ever possible be shrunken versions of the dimension tables attached to
the base fact table

• The base fact table and all of its related aggregate fact tables must be
associated together as a family of schemas

1-25
Loading
Types of Data warehouse Loading

• Target update types


– Insert
– Update

1-27
Types of Data Warehouse Updates

Data Warehouse

Source data Data Staging

 Point in Time Snapshots  Insert


 New Data  Full Replace
 Changed Data  Selective Replace
 Update
 Update plus Retain
History

1-28
New Data and Point-In-Time Data Insert

Source data

New data

OR

Point-in-Time
Snapshot New Data Added to
(e.g.. Monthly) Existing Data

1-29
Changed Data Insert

Source data
Changed Data Added to
Existing Data

Changed
data

1-30
Change of Dimension values

When the value of dimension in a data warehouse changes,


then

 History of change needs to be maintained.


 Changed data alone needs to be identified
 Changed data should be easier to access.
 Reconstruction of the dimension table any point in time should be
easier

1-31
ETL - Approach in a nutshell

1) Identify the Operational systems based on data islands in the


target
2) Map source-target dependencies.
3) Define cleaning and transformation rules
4) Validate source-target mapping
5) Consolidate Meta data for ETL
6) Draw the ETL architecture
7) Build the cleaning, transformation and auditing routines using
either a tool or customized programs

1-32

You might also like