Data Warehousing & Data Mining Chapter 2
Data Warehousing
Part I
TBS 2020-2021
Origins of DW
▪ Operational processing (transactional processing)
captures, stores and manipulates data to support
daily operations.
▪ Information processing is the analysis of data or
other forms of information to support decision
making.
▪ Data warehouse can consolidate and integrate
information from many internal and external
sources and arrange it in a meaningful format for
making business decisions.
Origins of DW
Example:
Think, for instance, of a mine. If a
mine is not organized and built
properly, miners cannot reach the
places they need in order to mine
effectively.
In the same manner, the data
warehouse must be set up to best
serve the needs of those analyzing
the data.
What is a Data Warehouse ?
▪ According to Inmon (the father of data warehousing):
It is a collection of integrated, subject-oriented,
databases designed to support the DSS function,
where each unit of data is non-volatile and relevant
to some moment in time.
▪ Or a DW is : A subject-oriented, integrated, time-
variant, non-updatable collection of data used in
support of management decision-making processes:
▪ Subject-oriented: e.g. customers, patients, products
▪ Integrated: Consistent naming conventions, formats,
encoding structures; from multiple data sources
▪ Time-variant: Can study trends and changes
▪ Non-updatable: Read-only, periodically refreshed
What is a Data Warehouse ?
▪ DW-Subject-oriented
▪ Organized around major subjects, such as
customer, product, sales, student, patient.
▪ Focusing on the modeling and analysis of data
for decision makers, not on daily operations or
transaction processing.
▪ Providing a simple and concise view around
particular subject issues by excluding data that
are not useful in the decision support process.
What is a Data Warehouse ?
▪ DW-Integrated
▪ Focusing on the modeling and analysis of data for
decision makers: data cleaning and data integration
techniques are applied.
What is a Data Warehouse ?
▪ DW- Time Variant
▪ The time horizon for the DW is significantly
longer than that of operational systems.
▪ Data warehouse data: provide information
from a historical perspective (e.g., past 5-10
years)
▪ Every key structure in the data warehouse:
▪ Contains an element of time, explicitly or
implicitly. But the key of operational data
may or may not contain “time element”.
What is a Data Warehouse ?
▪ DW-Non Updatable / Non-volatility
▪ Typical activities such as deletes, inserts, and
changes that are performed in an operational
application environment are completely
nonexistent in a DW environment.
▪ Only two data operations are ever performed
in the DW: data loading and data access.
Need for Data Warehousing
Database, Data warehouse and
Data set
▪ DB : contains tables, rows refer to records and
columns to fields. Most DBs are relational DBs
(relating tables to reduce redundancy & improve
DB performance via the normalization process)
▪ DW : is a type of DB that has been denormalized
& archived.
▪ Denormalization is the process of combining
some tables into a single table. This may
introduce duplicate data, but will reduce the
number of joins a query has to process.
▪ Data set : a subset of a DW or a DB. It is usually
denormalized so that only one table is used.
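A minimal sketch of denormalization, using Python's standard sqlite3 module; the customers and orders tables and their columns are hypothetical, not part of the course material.

```python
# Minimal sketch: denormalizing two normalized tables into one wide table.
# Table and column names (customers, orders, ...) are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL, FOREIGN KEY (customer_id) REFERENCES customers);
    INSERT INTO customers VALUES (1, 'Alice', 'Tunis'), (2, 'Bob', 'Sfax');
    INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 12.0), (12, 2, 45.0);

    -- Denormalization: pre-join customers and orders into a single table.
    -- Customer data is now duplicated per order, but queries need no join.
    CREATE TABLE orders_denorm AS
    SELECT o.order_id, o.amount, c.customer_id, c.name, c.city
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
""")
for row in con.execute("SELECT * FROM orders_denorm"):
    print(row)
```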
How Do Data Warehouses Differ
From Operational Systems?
▪ Goals
▪ Structure
▪ Size
▪ Performance optimization
▪ Technologies used
Need to separate operational and
information systems
Three primary factors:
▪ A data warehouse centralizes data that are
scattered throughout disparate operational
systems and makes them available for DM.
▪ A well-designed data warehouse adds value to
data by improving their quality and consistency.
▪ A separate data warehouse eliminates much of the
contention for resources that results when
information applications are mixed with
operational processing.
Comparison of Database Types
From the Data Warehouse to Data
Marts
▪ A data mart contains only those data that are
specific to a particular group. For example, the
marketing data mart may contain only data
related to items, customers, and sales.
▪ Data marts are confined to subjects.
▪ Data marts are small in size.
▪ Data marts are customized by department.
How Data Warehousing works
▪ Data is loaded and periodically updated via
Extract/Transform/Load (ETL) tools (the ETL
pipeline).
How Data Warehousing works
Extraction Transformation Loading–ETL
▪ To get data out of the source and load it into the data
warehouse – simply a process of copying data from
one database to another.
▪ Data is extracted from an OLTP database, transformed
to match the data warehouse schema and loaded into
the data warehouse database.
▪ Many data warehouses also incorporate data from
non‐OLTP systems such as text files, legacy systems,
and spreadsheets; such data also requires extraction,
transformation, and loading.
▪ When defining ETL for a data warehouse, it is
important to think of ETL as a process, not a physical
implementation.
How Data Warehousing works
Extraction Transformation Loading–ETL tools
[Diagram: Sources → Extract → Transform & Clean
(in the Data Staging Area, DSA) → Load → DW]
ETL Tools
Data Transformation
▪ Extracted data is raw data and cannot be loaded
into the DW as-is
▪ The major effort within data transformation is the
improvement of data quality
▪ Data warehouses can fail if an appropriate data
transformation strategy is not developed
▪ The transformation process involves:
▪ Applying business rules (so-called derivations,
e.g., calculating new measures and dimensions),
▪ Cleaning (e.g., mapping NULL to 0 or "Male" to
"M", etc.),
▪ Filtering (e.g., selecting only certain columns to
load),
▪ Splitting a column into multiple columns and vice
versa,
▪ Joining together data from multiple sources (e.g.,
lookup, merge)
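As a small sketch of these steps in Python, the fragment below applies cleaning, a derivation, and column filtering to rows extracted as dicts; all field names and recodings are hypothetical.

```python
# Minimal sketch of the transformation steps listed above, on plain Python
# dicts standing in for extracted rows. Field names are hypothetical.
raw_rows = [
    {"id": 1, "gender": "Male",   "qty": 3,    "unit_price": 10.0, "note": "x"},
    {"id": 2, "gender": "Female", "qty": None, "unit_price": 4.5,  "note": "y"},
]

transformed = []
for row in raw_rows:
    qty = row["qty"] if row["qty"] is not None else 0          # cleaning: NULL -> 0
    gender = {"Male": "M", "Female": "F"}.get(row["gender"])   # cleaning: recode
    revenue = qty * row["unit_price"]                          # derivation: new measure
    transformed.append({                                       # filtering: keep only
        "id": row["id"], "gender": gender, "revenue": revenue  # the columns to load
    })

print(transformed)
```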
ETL Tools
Loading
▪ Loading the data into a data warehouse
▪ Terminology:
Initial Load : populating all the data warehouse
tables for the first time
Incremental Load : applying ongoing changes as
necessary in a periodic manner
Full Refresh : completely erasing the contents of
one or more tables and reloading with fresh
data (initial load is a refresh of all the tables)
ETL Tools
Data Loading
▪ Load,
▪ Append,
▪ Destructive merge,
▪ Constructive merge.
ETL Tools
Load
▪ If the target table to be loaded already exists and
data exists in the table, the load process wipes
out the existing data and applies the data from
the incoming file.
▪ If the table is already empty before loading, the
load process simply applies the data from the
incoming file.
ETL Tools
Append
▪ Extension of the load.
▪ If data already exists in the table, the append
process unconditionally adds the incoming data,
preserving the existing data in the target table.
▪ When an incoming record is a duplicate of an
already existing record, you may define how to
handle an incoming duplicate:
▪ The incoming record may be allowed to be
added as a duplicate.
▪ In the other option, the incoming duplicate
record may be rejected during the append
process.
ETL Tools
Destructive Merge
▪ Applies incoming data to the target data.
▪ If the primary key of an incoming record matches
with the key of an existing record, update the
matching target record.
▪ If the incoming record is a new record without a
match with any existing record, add the incoming
record to the target table.
ETL Tools
Constructive Merge
▪ It is slightly different from the destructive merge.
▪ If the primary key of an incoming record matches
with the key of an existing record, leave the
existing record, add the incoming record, and
mark the added record as superseding the old
record.
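As a sketch of how these four modes differ, the following Python fragment models the target table as a dict mapping a primary key to a list of record versions; the names and the versioning convention are illustrative assumptions, not a prescribed implementation.

```python
# Minimal sketch of the four loading modes. The target table is modeled as a
# dict mapping each primary key to a list of record versions; the last entry
# in a list is the current record. All names are hypothetical.

def load(target, incoming):
    """Load: wipe out any existing data, then apply the incoming records."""
    target.clear()
    for key, rec in incoming:
        target[key] = [rec]

def append(target, incoming, allow_duplicates=False):
    """Append: add incoming records while preserving existing data; an
    incoming duplicate is either added or rejected, depending on the option."""
    for key, rec in incoming:
        if key in target and not allow_duplicates:
            continue                         # reject the incoming duplicate
        target.setdefault(key, []).append(rec)

def destructive_merge(target, incoming):
    """Destructive merge: overwrite matching records, add new ones."""
    for key, rec in incoming:
        target[key] = [rec]

def constructive_merge(target, incoming):
    """Constructive merge: keep the existing record and add the incoming one,
    which supersedes it (it becomes the last, i.e., current, version)."""
    for key, rec in incoming:
        target.setdefault(key, []).append(rec)

table = {}
load(table, [(1, "v1"), (2, "v1")])
constructive_merge(table, [(1, "v2")])
print(table)   # {1: ['v1', 'v2'], 2: ['v1']}
```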
Refresh
▪ Propagate updates on source data to the
warehouse
▪ Issues:
– when to refresh
– how to refresh -- incremental refresh techniques
When to Refresh?
▪ Periodically (e.g., every night, every week) or
after significant events
▪ On every update: not warranted unless the
warehouse requires current data (e.g., up-to-
the-minute stock quotes)
▪ Refresh policy set by administrator based on
user needs and traffic
▪ Possibly different policies for different sources
Refresh techniques
▪ Incremental techniques
– Detect changes on base tables: replication
servers (e.g., Sybase, Oracle, IBM Data
Propagator)
• snapshots (Oracle)
• transaction shipping (Sybase)
– Compute changes to derived and summary
tables
– Maintain transactional correctness for
incremental load
How To Detect Changes
▪ Create a snapshot log table to record ids of
updated rows of source data and timestamp
▪ Detect changes by:
– Defining after row triggers to update snapshot
log when source table changes
– Using regular transaction log to detect changes
to source data
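A minimal sketch of the trigger-based approach, using Python's standard sqlite3 module; the product table, snapshot_log table, and trigger name are hypothetical.

```python
# Minimal sketch of change detection with an after-row trigger writing to a
# snapshot log (SQLite syntax; table and trigger names are hypothetical).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE snapshot_log (product_id INTEGER,
                               changed_at TEXT DEFAULT CURRENT_TIMESTAMP);

    -- After each row update, record the id of the changed row and a
    -- timestamp; the incremental refresh later reads only the logged ids.
    CREATE TRIGGER log_product_change AFTER UPDATE ON product
    BEGIN
        INSERT INTO snapshot_log (product_id) VALUES (NEW.product_id);
    END;

    INSERT INTO product VALUES (1, 9.99), (2, 5.00);
    UPDATE product SET price = 10.99 WHERE product_id = 1;
""")
print(con.execute("SELECT product_id, changed_at FROM snapshot_log").fetchall())
```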
Chapter 2:
Data Warehouse
conceptual modeling
Part II
TBS 2020-2021
ER Model vs. Multidimensional
Model
▪ Why don’t we use the entity-relationship (ER)
model in data warehousing?
▪ ER model: a data model for general purposes
– All types of data are treated equally, so it is difficult
to identify the data that is important for business
analysis:
• No difference between what is important and
what just describes the important
• Normalized databases spread details across many
tables (which can also affect privacy and security)
– Hard to get an overview of a large ER diagram (e.g.,
over 100 entities/relations for an enterprise)
ER Model vs. Multidimensional
Model
▪ Traditional DBs generally deal with two-dimensional
data. However, queries against a multi-dimensional
data storage model perform more efficiently.
▪ More built in “meaning”
– What is important
– What describes the important
– What we want to optimize
▪ Recognized by OLAP/BI tools : Tools that offer powerful
query facilities based on Multi-Dimensional (MD) design
Multidimensional Model
▪ Data is divided into: Facts and Dimensions
▪ A fact is the important entity: e.g., a sale
▪ Facts have measures that can be aggregated: e.g.,
sales price
▪ Dimensions describe facts
▪ Facts “live” in a MD cube
▪ Goal for dimensional modeling:
– Surround facts with as much context (dimensions) as
possible
– Hint: redundancy may be ok (in well-chosen places)
– But you should not try to model all relationships in the
data (unlike E/R and OO modeling!)
Dimension
▪ Dimensions are the core of MD databases
▪ Dimensions are used for
▪ Selection of data
▪ Grouping of data at the right level of detail
▪ Dimensions consist of dimension values
▪ Product dimension has values ”milk”, ”cream”, …
▪ Time dimension has values ”1/1/2001”, ”2/1/2001”,…
▪ Dimension values may have an ordering
▪ Used for comparing cube data across values
▪ Especially used for Time dimension
Dimension
▪ Dimensions have hierarchies with levels
▪ Typically 3-5 levels (of detail)
▪ Dimension values are organized in a tree structure
▪ Product: Product->Type->Category
▪ Store: Store->Area->City->County
▪ Time: Day->Month->Quarter->Year
▪ Dimensions have a bottom level and a top level
▪ Levels may have attributes
▪ Simple, non-hierarchical information
▪ Day has Workday as attribute
▪ Dimensions should contain much information
▪ Time dimension may contain holiday, season, events,…
▪ Good dimensions have 50-100 or more attributes/levels
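As a small illustration of hierarchy levels, the sketch below stores the Product → Type → Category hierarchy from above as one dimension row per bottom-level value; the example products are hypothetical.

```python
# Minimal sketch: the Product -> Type -> Category hierarchy, stored as one
# dimension row per bottom-level value (the example values are hypothetical).
product_dim = [
    {"product": "milk",  "type": "dairy",   "category": "food"},
    {"product": "cream", "type": "dairy",   "category": "food"},
    {"product": "soap",  "type": "hygiene", "category": "non-food"},
]

def members_at_level(rows, level):
    """Group bottom-level dimension values under a higher level's values."""
    groups = {}
    for row in rows:
        groups.setdefault(row[level], []).append(row["product"])
    return groups

print(members_at_level(product_dim, "type"))
# {'dairy': ['milk', 'cream'], 'hygiene': ['soap']}
```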
Facts
▪ Facts represent the subject of the desired analysis
• The important things in the business that should
be analyzed
▪ A fact is identified via its dimension values
• A fact is a non-empty cell
▪ Generally, a fact should:
• Be attached to exactly one dimension value in
each dimension
• Only be attached to dimension values in the
bottom levels
Measures
▪ Measures represent the fact property that the
users want to study and optimize
▪ Example: total sales price
▪ A measure has two components
▪ Numerical value (e.g., sales price)
▪ Aggregation formula (e.g., SUM): used for
aggregating/combining a number of measure values
into one
Multidimensional Model
Example: sales of supermarkets
• Facts and measures
– Each sales record is a fact, and its sales value is a
measure
• Dimensions
– Group correlated attributes into the same
dimension
– Each sales record is associated with its values of
the Product, Store, and Time dimensions
Granularity: Dimensionality Hierarchy
▪ Granularity of facts is important
▪ Level of detail
▪ Given by combination of bottom levels
▪ A dimensional hierarchy defines mappings from a set of
lower-level concepts to higher level concepts.
[Diagram: dimension hierarchies, e.g.,
Location: ZipCode → Area → City → Region → Country;
Time: Day → Week / Month → Quarter / Season → Year]
Schema Design
▪ A schema is a logical description of the entire
database.
▪ Much like a database, a data warehouse also
requires a schema to be maintained.
▪ A database uses the relational model, while a data
warehouse uses a Star, Snowflake, or Fact
Constellation schema.
Star schema
▪ A star schema consists of two types of tables:
• fact table
• dimension tables
▪ Each dimension in a star schema is represented
by only one dimension table.
▪ This dimension table contains the set of
attributes describing the dimension.
Star schema: Components
[Star schema diagram]
Sales Fact Table: time_key, item_key, branch_key, location_key;
measures: units_sold, dollars_sold, avg_sales
time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city, state_or_province, country
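The star schema in the figure can be sketched as DDL, here executed via Python's sqlite3 module; the exact table names (time_dim, sales_fact, etc.) are illustrative.

```python
# Star schema from the figure above, sketched as SQLite DDL: one fact table
# whose foreign keys each point at a single, denormalized dimension table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
        day_of_the_week TEXT, month INTEGER, quarter TEXT, year INTEGER);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
        brand TEXT, type TEXT, supplier_type TEXT);
    CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY,
        branch_name TEXT, branch_type TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
        city TEXT, state_or_province TEXT, country TEXT);

    CREATE TABLE sales_fact (
        time_key     INTEGER REFERENCES time_dim,
        item_key     INTEGER REFERENCES item_dim,
        branch_key   INTEGER REFERENCES branch_dim,
        location_key INTEGER REFERENCES location_dim,
        units_sold   INTEGER,   -- measures
        dollars_sold REAL,
        avg_sales    REAL
    );
""")
```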
Snowflake schema
▪ Snowflake schema is an expanded version of a
star schema in which dimension tables are
normalized into several related tables.
▪ Advantages
• Small saving in storage space
• Normalized structures are easier to update and
maintain
▪ Disadvantages
• A schema that is less intuitive
• The ability to browse through the content is difficult
• A degraded query performance because of additional
joins.
Snowflake schema : Example
[Snowflake schema diagram]
Sales Fact Table: time_key, item_key, branch_key, location_key;
measures: units_sold, dollars_sold, avg_sales
time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_key
supplier dimension: supplier_key, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city_key
city dimension: city_key, city, province_or_state, country
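For contrast with the star schema, here is a sketch of only the snowflaked parts: the location dimension is normalized into a separate city table, and the item dimension references a supplier table (SQLite DDL; table names are illustrative).

```python
# Same schema, snowflaked: city-level attributes move into their own table
# referenced by a city_key, and supplier data into a supplier table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT,
        province_or_state TEXT, country TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
        city_key INTEGER REFERENCES city_dim);   -- extra join at query time
    CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY,
        supplier_type TEXT);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
        brand TEXT, type TEXT,
        supplier_key INTEGER REFERENCES supplier_dim);
""")
```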
Fact Constellation Schema
▪ A fact constellation has multiple fact tables. It is
also known as galaxy schema.
▪ The following diagram shows two fact tables,
namely sales and shipping.
Fact Constellation Schema
[Fact constellation diagram: the Sales Fact Table and the
Shipping Fact Table share the time and item dimensions;
the Shipping Fact Table references time_key, item_key,
shipper_key, and from_location]
Multidimensional Model: Data Cubes
Data Cube
▪ Useful data analysis tool in DW
▪ Generalized GROUP BY queries
▪ Aggregate facts based on chosen dimensions
– Product, store, time dimensions
– Sales measures of Sales fact
Why data cube?
▪ Good for visualization (i.e., text results hard to
understand)
▪ MD, intuitive
▪ Support interactive OLAP operations
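A minimal sketch of "generalized GROUP BY": the fragment below computes one aggregate per subset of the chosen dimensions, which is what a CUBE operator produces; the fact rows are hypothetical.

```python
# Minimal sketch: a data cube as generalized GROUP BY queries, aggregating a
# sales measure over every subset of the chosen dimensions (pure Python).
from itertools import combinations

sales = [  # hypothetical fact rows: (product, store, quarter, amount)
    ("TV", "S1", "Q1", 100), ("TV", "S1", "Q2", 120), ("PC", "S2", "Q1", 80),
]
dims = ("product", "store", "quarter")

for r in range(len(dims) + 1):
    for group_by in combinations(range(len(dims)), r):  # one GROUP BY per subset
        totals = {}
        for row in sales:
            key = tuple(row[i] for i in group_by)       # omitted dims are rolled up
            totals[key] = totals.get(key, 0) + row[3]
        print([dims[i] for i in group_by], totals)
```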
Multidimensional Model: Data Cubes
▪ Sales volume as a function of product, month, and
region
▪ Dimensions: Product, Location, Time
[Diagram: hierarchical summarization paths along each
dimension, e.g., Office → City → Region → Country for
Location and Day → Month → Quarter → Year for Time]
A Sample of a Data Cube
[Diagram: a 3-D data cube of sales with dimensions Date
(1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), product (TV, PC, VCR, sum),
and Country (U.S.A., Canada, Mexico, sum); one highlighted
cell holds the total annual sales of TV in the U.S.A.]
On-Line Analytical Processing (OLAP)
▪ Original definition : The dynamic synthesis,
analysis, and consolidation of large volumes of
multi-dimensional data, [Codd, 1993].
On-Line Analytical Processing (OLAP)
▪ The analytical operations that can be performed
on data cubes include:
– Roll-up
– Drill-down
– Slice and Dice
– Pivot/rotate
– Switch
– Split
– Nest
– Select
– Projection
On-Line Analytical Processing (OLAP)
▪ Roll-up performs aggregation on a data cube in
any of the following ways:
– By climbing up a concept hierarchy for a
dimension
– By dimension reduction
The following diagram illustrates how roll-up works.
On-Line Analytical Processing (OLAP)
Roll-up
On-Line Analytical Processing (OLAP)
▪ Roll-up is performed by climbing up a
concept hierarchy for the dimension
location.
▪ Initially the concept hierarchy was:
"street < city < province < country".
▪ On rolling up, the data is aggregated by
ascending the location hierarchy from the
level of city to the level of country.
▪ The data is then grouped by country rather
than by city.
▪ When roll-up is performed by dimension
reduction, one or more dimensions are
removed from the data cube.
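A minimal sketch of this roll-up in Python: city-level sales are re-aggregated at the country level by climbing the location hierarchy (the figures and the city-to-country mapping are hypothetical).

```python
# Minimal sketch of roll-up: re-aggregating city-level sales at the country
# level by climbing the location hierarchy (data and mapping hypothetical).
city_sales = {"Toronto": 605, "Vancouver": 1087, "Tunis": 400}
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada", "Tunis": "Tunisia"}

country_sales = {}
for city, amount in city_sales.items():
    country = city_to_country[city]                 # climb city -> country
    country_sales[country] = country_sales.get(country, 0) + amount

print(country_sales)   # {'Canada': 1692, 'Tunisia': 400}
```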
On-Line Analytical Processing (OLAP)
▪ Drill-down is the reverse of roll-up and involves
revealing the detailed data that forms the
aggregated data. Drill-down can be performed
by moving down the dimensional hierarchy or
by dimensional introduction, e.g., from 3-D
sales data to 4-D sales data.
On-Line Analytical Processing (OLAP)
Drill-down
On-Line Analytical Processing (OLAP)
▪ Drill-down is performed by stepping down a concept
hierarchy for the dimension time.
On-Line Analytical Processing (OLAP)
▪ Slice - ability to look at data from different
viewpoints. The slice operation performs a
selection on one dimension of the data, whereas
dice uses two or more dimensions. For example, a
slice of sales revenue (type = 'Flat') and a dice
(type = 'Flat' and time = 'Q1').
On-Line Analytical Processing (OLAP)
Slice
On-Line Analytical Processing (OLAP)
Dice
On-Line Analytical Processing (OLAP)
The dice operation on the cube based on the
following selection criteria involves three
dimensions:
▪ (location = "Toronto" or "Vancouver")
▪ (time = "Q1" or "Q2")
▪ (item =" Mobile" or "Modem")
On-Line Analytical Processing (OLAP)
▪ Pivot - ability to rotate the data to provide an
alternative view of the same data, e.g., sales
revenue data displayed using the location (city)
as the x-axis against time (quarter) as the y-axis
can be rotated so that time (quarter) is the x-axis
against location (city) as the y-axis.
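A minimal sketch of pivoting in Python: the same totals are re-indexed so the quarter axis and the city axis trade places (the numbers are hypothetical).

```python
# Minimal sketch of pivot/rotate: the same city-by-quarter totals viewed with
# the axes swapped; no data changes, only the presentation.
by_city = {"Toronto": {"Q1": 605, "Q2": 512},
           "Vancouver": {"Q1": 968, "Q2": 746}}

by_quarter = {}
for city, quarters in by_city.items():
    for quarter, value in quarters.items():
        by_quarter.setdefault(quarter, {})[city] = value   # swap row/column axes

print(by_quarter)   # {'Q1': {'Toronto': 605, 'Vancouver': 968}, ...}
```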
On-Line Analytical Processing (OLAP)
Pivot
On-Line Analytical Processing (OLAP)
Switch
On-Line Analytical Processing (OLAP)
Split
On-Line Analytical Processing (OLAP)
Nest
On-Line Analytical Processing (OLAP)
Selection
On-Line Analytical Processing (OLAP)
Projection
On-Line Analytical Processing (OLAP)
▪ OLAP is the use of a set of graphical tools that provide
users with MD views of their data and allow them to
analyze the data using simple windowing techniques.
▪ Relational OLAP (ROLAP)
▪ OLAP tools that view the database as a traditional
relational database, in either a star schema or another
normalized or denormalized set of tables
▪ Multidimensional OLAP (MOLAP)
▪ OLAP tools that load data into an intermediate
structure, usually a three or higher dimensional array.
(Cube structure)
▪ Hybrid OLAP (HOLAP)
▪ Combination of ROLAP and MOLAP tools
The Complete Decision Support
System
[Diagram: Operational DB's → extract, transform, load,
refresh → Data Warehouse and Data Marts → serve →
Query/Reporting, OLAP (e.g., ROLAP), and Data Mining]