Lesson 2. Data Warehouse Basic Concepts

A data warehouse (DW) is a centralized repository for historical and current data that supports decision-making across an organization. It is characterized as subject-oriented, integrated, time-variant, and non-volatile, allowing for efficient data analysis and reporting. The document also contrasts operational databases (OLTP) with data warehouses (OLAP), outlines the architecture and models of data warehouses, and describes the ETL process for data integration.

Uploaded by

Aaron Gutierrez

UNIT 05

DATA WAREHOUSE
BASIC CONCEPTS

A data warehouse (DW or DWH) is a strategic collection of data that provides all types of data support for the decision-making process at all levels of the enterprise.

A DW is a single data store created for analytical reporting and decision support purposes. It also provides guidance for business process improvement and for monitoring time, cost, quality, and control in companies that need business intelligence. A data warehouse usually contains a large amount of historical data and is used for specific kinds of analysis.

LESSON 1:
DATA WAREHOUSE
AND ITS CHARACTERISTICS

OBJECTIVES:

At the end of this lesson, the student will be able to:

• Define the concept of a data warehouse

• Determine each characteristic of a data warehouse

• Identify the benefits of having a data warehouse

Duration: 1 hour

What is a Data Warehouse?

A data warehouse is a collection of current and historical data of potential interest to decision-makers throughout the company; such data are difficult or impossible to obtain from traditional operational databases. The data originate in different operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from Web site transactions. The data warehouse makes the data available for users to access as needed, but the data cannot be altered. Constructing a data warehouse involves data extraction and conversion, data cleaning, data integration, and data loading: to guarantee correctness, the data must be extracted, cleaned, converted into the form the data warehouse requires, and loaded into the data warehouse (Laudon & Laudon, 2014).

A data warehouse is maintained separately from an organization’s operational databases. Data warehouse systems allow the integration of a variety of application systems. They support information processing by providing a stable platform of consolidated historical data for analysis (Han, Kamber & Pei, 2012).

Characteristics of a data warehouse

William H. Inmon, the recognized father of the data warehousing concept,
defines data warehouse systems as a collection of data that provides the following
characteristics:
a. subject-oriented
b. integrated
c. time-variant
d. non-volatile

A. Subject-oriented
Data warehouses are designed to help decision-makers analyze data. A data warehouse environment is organized around major subjects such as customers, employees, suppliers, accounts, sales, and products, instead of around the day-to-day operations and transactions of the organization. This subject-specific design helps reduce query response time, because only a few records need to be searched to answer a user’s question. For example, to learn more about the company's sales data, a data warehouse that concentrates on sales can be built. Using this data warehouse, questions such as "Who was our best customer for this product last year?" or "Who is likely to be our best customer next month?" can be answered. This ability to define a data warehouse by subject matter (sales, in this case) makes the data warehouse subject-oriented.
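
The sales example can be made concrete with a small Python sketch. It assumes a hypothetical sales-subject table held as a list of records; the field names `customer`, `product`, `year`, and `amount` are illustrative and not taken from the source. Because the data are already organized around the sales subject, the question "Who was our best customer for this product last year?" reduces to one filter-and-sum pass.

```python
from collections import defaultdict

# Hypothetical sales-subject records; the field names are illustrative assumptions.
SALES = [
    {"customer": "Ana", "product": "laptop", "year": 2023, "amount": 1200.0},
    {"customer": "Ben", "product": "laptop", "year": 2023, "amount": 800.0},
    {"customer": "Ana", "product": "phone", "year": 2023, "amount": 300.0},
    {"customer": "Ben", "product": "laptop", "year": 2023, "amount": 900.0},
    {"customer": "Cy", "product": "laptop", "year": 2022, "amount": 5000.0},
]


def best_customer(sales, product, year):
    """Return the customer with the highest total spend on `product` in `year`."""
    totals = defaultdict(float)
    for row in sales:
        if row["product"] == product and row["year"] == year:
            totals[row["customer"]] += row["amount"]
    if not totals:
        return None
    # Rank customers by their accumulated total.
    return max(totals, key=totals.get)


print(best_customer(SALES, "laptop", 2023))  # Ben (800 + 900 beats Ana's 1200)
```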

B. Integrated
A data warehouse integrates data of different types from multiple heterogeneous sources, such as relational databases, flat files, and online transaction records, and puts them into one consistent location. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

C. Time-variant
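Every piece of data in the data warehouse is associated with a point or period in time. Data warehouse analysis focuses on reflecting historical changes, so the system records the company’s information from a historical perspective, starting from a certain point in time (e.g., covering the past 5 to 10 years).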
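
One common way to make the time element explicit is to keep every version of a value together with the date it took effect, instead of overwriting it. The sketch below assumes a hypothetical history of one customer's city and looks up the value that was in effect on a given date.

```python
import bisect
from datetime import date

# Hypothetical history of one customer's city: one (effective_date, value) row
# per change, kept sorted by date. Old values are retained, never overwritten.
CITY_HISTORY = [
    (date(2019, 1, 1), "Manila"),
    (date(2021, 6, 15), "Cebu"),
    (date(2023, 3, 1), "Davao"),
]


def as_of(history, when):
    """Return the value in effect on `when`, or None if `when` precedes all records."""
    dates = [effective for effective, _ in history]
    # bisect_right gives the index of the first record that starts after `when`.
    idx = bisect.bisect_right(dates, when)
    return history[idx - 1][1] if idx > 0 else None


print(as_of(CITY_HISTORY, date(2022, 1, 1)))  # Cebu
```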

D. Non-volatile
Once loaded, the data in a data warehouse remain stable and do not change. Data operations consist mainly of queries and analysis; the data are not updated in the usual sense. Once data enter the data warehouse, they are generally retained for a long time. A data warehouse typically handles a large number of query operations: its work is usually to load, query, and analyze data, and it generally does not perform modification operations.
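
The load-and-query pattern can be sketched as a table object that exposes only those two operations. This is a hypothetical Python class for illustration, not a feature of any particular warehouse product: it has no update or delete method, and each row is stored as an immutable tuple.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # each stored row is an immutable tuple


class AppendOnlyTable:
    """Hypothetical warehouse table that supports only load and query."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def load(self, rows: Iterable[Row]) -> int:
        """Append a batch of rows and return how many were loaded."""
        batch = [tuple(r) for r in rows]
        self._rows.extend(batch)
        return len(batch)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Return the rows that satisfy `predicate` without changing anything."""
        return [r for r in self._rows if predicate(r)]

    def __len__(self) -> int:
        return len(self._rows)


sales = AppendOnlyTable()
sales.load([("2023-Q1", "laptop", 1200), ("2023-Q1", "phone", 300)])
sales.load([("2023-Q2", "laptop", 900)])  # a new period is appended; Q1 rows stay as they were
print(sales.query(lambda r: r[1] == "laptop"))
```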

A well-designed data warehouse supports high-speed queries and high data throughput. Based on this information, we can define data warehousing as the process of constructing and using data warehouses. Building a data warehouse requires data cleaning, data integration, and data consolidation. Using a data warehouse often involves a collection of decision support technologies. These technologies allow “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to obtain an overview of the data for management analysis and business decision-making. The data warehouse can help transform the company's operational data into high-value, available information (or knowledge) that is delivered to the right people, in the right way, at the right time (Han, Kamber & Pei, 2012).

Benefits of Data Warehouses

Data warehouses are beneficial to organizations for several reasons
(Bourgeois, 2014):

• Because building a data warehouse requires working through the organization’s historical data, the organization gains a better understanding of the data it is currently collecting and of the data it still needs to collect.
• A data warehouse gives a centralized view of all data being collected across the organization and provides a means of identifying inconsistent data.
• Once the data have been verified as consistent, the organization can generate information from them without ambiguity.
• A data warehouse lets the organization keep snapshots of its data over time.
• A data warehouse provides tools to combine data, which can yield new information and analysis.

Where is this data used?
Many organizations use the information taken from a data warehouse to support business decision-making activities, including the following (Han, Kamber & Pei, 2012):
1. increasing customer focus, which includes the analysis of customer buying patterns, such as:
a. buying preferences,
b. buying time,
c. budget cycles, and
d. appetites for spending;
2. repositioning products and managing product portfolios by comparing sales performance by quarter, by year, and by geographic region to fine-tune production strategies;
3. analyzing operations and looking for sources of profit; and
4. managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

LESSON 2:
OPERATIONAL DATABASE SYSTEM
VS. DATA WAREHOUSES

OBJECTIVES:
At the end of this lesson, the student will be able to:

• Describe the operational database system

• Compare OLTP and OLAP

• Identify the goals of OLTP and OLAP in different fields

Duration: 1 hour
Operational Database Systems vs. Data Warehouses

The operational database system is the primary source of data for the data warehouse. It contains the detailed information used to run the day-to-day operations of the organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Because these data are updated as transactions occur, they change frequently and reflect the current values of the most recent transactions. The operational database system is also called an Online Transaction Processing (OLTP) system, which is used to manage dynamic data in real time. Operational data are the data used in the day-to-day operation of a specific system.
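
A typical OLTP unit of work is a short, atomic transaction that touches only a few records. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an operational database; the `accounts` table and the `transfer` and `balance` functions are hypothetical. Either both updates of a transfer are committed or neither is.

```python
import sqlite3

# An in-memory SQLite database stands in for an operational (OLTP) system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, owner TEXT, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "Ana", 500.0), (2, "Ben", 100.0)],
)
conn.commit()


def balance(account_id):
    """Read the current balance of one account (a single-record lookup)."""
    return conn.execute("SELECT balance FROM accounts WHERE id = ?", (account_id,)).fetchone()[0]


def transfer(src, dst, amount):
    """Move `amount` from `src` to `dst` as one short, atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            cur = conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            if cur.rowcount != 1:
                raise ValueError(f"unknown account {dst}")
        return True
    except (sqlite3.IntegrityError, ValueError):
        # Insufficient funds (CHECK failed) or unknown account: nothing is changed.
        return False


print(transfer(1, 2, 200.0), balance(1), balance(2))  # True 300.0 300.0
print(transfer(1, 99, 50.0), balance(1))              # False 300.0 (the debit was rolled back)
print(transfer(2, 1, 999.0), balance(2))              # False 300.0
```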

The data warehouse system serves users, or knowledge workers, in data analysis and decision-making. Such a system can organize and present information in specific formats to meet the diverse needs of various users. These systems are called Online Analytical Processing (OLAP) systems. OLAP handles historical or archived data collected over a long period. For example, if we collect information about flight bookings for the last ten years, the data can reveal meaningful patterns such as booking trends. These patterns may in turn yield useful information, such as peak travel times and what kinds of people travel in each cabin class (economy/business).
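
The flight-booking example can be sketched in a few lines of Python. The booking records below are invented for illustration; the point is that OLAP questions such as "When do people travel most?" are answered by aggregating many historical records rather than by reading one.

```python
from collections import Counter
from datetime import date

# Hypothetical historical bookings: (flight date, cabin class).
BOOKINGS = [
    (date(2015, 12, 20), "economy"),
    (date(2016, 12, 22), "business"),
    (date(2016, 4, 2), "economy"),
    (date(2018, 12, 27), "economy"),
    (date(2019, 7, 14), "economy"),
    (date(2020, 12, 23), "business"),
]

# Trend 1: which month has the most bookings across all years (peak travel time)?
by_month = Counter(d.month for d, _ in BOOKINGS)
peak_month, peak_count = by_month.most_common(1)[0]

# Trend 2: how are bookings split between cabin classes?
by_class = Counter(cabin for _, cabin in BOOKINGS)

print(peak_month, peak_count)                     # 12 4
print(by_class["economy"], by_class["business"])  # 4 2
```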

A significant difference between OLTP and OLAP systems is the amount of data analyzed in a single transaction. An OLTP system serves many concurrent clients and queries at the same time, but each query involves only a single record or a limited set of records. An OLAP system must be able to process millions of records to answer a single query.
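
The difference in access patterns can be observed with SQLite's `EXPLAIN QUERY PLAN`, again using Python's `sqlite3` module and a hypothetical `bookings` table. A primary-key lookup is typically reported as a SEARCH through an index, while a GROUP BY over the whole table is reported as a SCAN. The exact wording of the plan varies between SQLite versions, so only those keywords should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, cabin TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO bookings (cabin, fare) VALUES (?, ?)",
    [("economy" if i % 5 else "business", 100.0 + i % 50) for i in range(10_000)],
)


def plan(sql, params=()):
    """Return the detail text (last column) of each row of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# OLTP-style access: fetch one record through the primary-key index.
oltp_plan = plan("SELECT * FROM bookings WHERE id = ?", (42,))
# OLAP-style access: aggregate over every record in the table.
olap_plan = plan("SELECT cabin, COUNT(*), AVG(fare) FROM bookings GROUP BY cabin")

print(oltp_plan)  # mentions SEARCH ... PRIMARY KEY
print(olap_plan)  # mentions SCAN
```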

The two kinds of systems differ in the following respects (Han, Kamber & Pei, 2012):
1. Users and system orientation:
• An OLTP system is customer-oriented and is designed for real-time business transactions and processes.
• An OLAP system is market-oriented and is used by knowledge workers to analyze business indicators.

2. Data contents:
• An OLTP system manages current data through simple transactions (create, read, update, and delete, or CRUD); these data are typically too detailed to be easily used by an analyst.
• An OLAP system manages large, complex, and unpredictable volumes of historical data and provides facilities for summarization and aggregation, which make the data easier to use for informed decision-making.

3. Database design:
• An OLTP system generally adopts an entity-relationship (ER) data model and an application-oriented database design.
• An OLAP system usually adopts a star or snowflake model and a subject-oriented database design.

4. View:
• An OLTP system focuses mainly on the current data within an enterprise or department.
• An OLAP system often spans multiple versions of a database schema. It also processes data from various organizations and integrates information from many data stores.

5. Access patterns:
• The access patterns of an OLTP system consist mainly of short, atomic transactions.
• Accesses to an OLAP system are mostly read-only, because data warehouses store historical rather than up-to-date data.
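
Item 3 above mentions the star model. The following is a minimal sketch of a star schema in SQLite (via Python's `sqlite3`), with hypothetical table names: one central fact table holding foreign keys and numeric measures, surrounded by dimension tables. A snowflake model would further normalize the dimensions (for example, by moving product categories into their own table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive, subject-oriented attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time    (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, region TEXT);

-- Fact table at the center of the star: foreign keys plus numeric measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    time_key    INTEGER REFERENCES dim_time(time_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    units_sold  INTEGER,
    revenue     REAL
);

INSERT INTO dim_product VALUES (1, 'laptop', 'computers'), (2, 'phone', 'mobile');
INSERT INTO dim_time    VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO dim_region  VALUES (1, 'Luzon'), (2, 'Visayas');
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 10, 12000.0), (2, 1, 1, 30, 9000.0),
    (1, 2, 2, 5, 6000.0),   (2, 2, 1, 20, 6000.0);
""")

# A typical analytical query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT t.quarter, r.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_time t   ON f.time_key = t.time_key
    JOIN dim_region r ON f.region_key = r.region_key
    WHERE t.year = 2023
    GROUP BY t.quarter, r.region
    ORDER BY t.quarter, r.region
""").fetchall()
print(rows)  # [(1, 'Luzon', 21000.0), (2, 'Luzon', 6000.0), (2, 'Visayas', 6000.0)]
```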

Other features that distinguish between OLTP and OLAP systems include database
size, frequency of operations, and performance metrics. These are summarized in
Table 7.1.

Table 7.1. Differences between OLTP and OLAP (Han, Kamber, & Pei, 2012)

Feature | OLTP | OLAP
Characteristic | operational processing | informational processing
Orientation | transaction | analysis
User | clerk, DBA, database professional | knowledge worker (e.g., manager, executive, analyst)
Function | day-to-day operations | long-term informational requirements, decision support
DB design | ER-based, application-oriented | star/snowflake, subject-oriented
Data | current, guaranteed up-to-date | historic, accuracy maintained over time
Summarization | primitive, highly detailed | summarized, consolidated
View | detailed, flat relational | summarized, multi-dimensional
Unit of work | short, simple transaction | complex query
Access | read/write | mostly read
Focus | data in | information out
Operations | index/hash on a primary key | lots of scans
Number of records accessed | tens | millions
Number of users | thousands | hundreds
DB size | GB to high-order GB | ≥ TB
Priority | high performance, high availability | high flexibility, end-user autonomy
Metric | transaction throughput | query throughput, response time

LESSON 3:
DATA WAREHOUSE ARCHITECTURE
AND MODELS

OBJECTIVES:

At the end of this lesson, the student will be able to:

• Identify the three models of a data warehouse

• Discuss the requirements of each data warehouse model

• Differentiate the types of data marts

Duration: 1.5 hours

Data Warehouse Architecture

Figure 7.1. A three-tier data warehousing architecture

(Han, Kamber, & Pei, 2012)

Data warehouses often adopt a three-tier architecture, as depicted in Figure 7.1.

Bottom Tier - This is where the data warehouse database server resides. It is typically a relational database system. Back-end tools and utilities are used to feed data into the bottom tier; they perform the extract, clean, transform, load, and refresh functions.

Middle Tier - The middle tier in a data warehouse is typically an OLAP server, which is implemented using either of the following models:
1. ROLAP (Relational OLAP) – an extended relational database
management system that maps operations on multi-dimensional data to
standard relational operations.
2. MOLAP (Multi-dimensional OLAP) – this model directly implements the
multi-dimensional data and operations.
Top Tier - This tier is the front-end client layer. It holds the query and reporting tools, analysis tools, and data mining tools.
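
To illustrate the difference between the two middle-tier models, the sketch below answers the same question ("total sales for a region and quarter") in two ways, using invented data. The ROLAP version keeps the data in a relational table and maps the request to SQL; the MOLAP version precomputes a small cube with one cell per combination of dimension values, so the answer is a direct lookup. Real OLAP servers are far more sophisticated; this shows only the basic idea.

```python
import sqlite3
from itertools import product

# Hypothetical sales data: (region, quarter, amount).
SALES = [("North", 1, 100.0), ("North", 2, 150.0), ("South", 1, 80.0), ("North", 1, 20.0)]

# ROLAP style: data stay in a relational table, and the multidimensional
# request is translated into a standard relational (SQL) query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)


def rolap_total(region, quarter):
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM sales WHERE region = ? AND quarter = ?",
        (region, quarter),
    ).fetchone()
    return row[0]


# MOLAP style: precompute a cube with one cell per (region, quarter) pair,
# so answering a query is a direct cell lookup.
REGIONS = sorted({r for r, _, _ in SALES})
QUARTERS = [1, 2, 3, 4]
cube = {cell: 0.0 for cell in product(REGIONS, QUARTERS)}
for region, quarter, amount in SALES:
    cube[(region, quarter)] += amount


def molap_total(region, quarter):
    return cube[(region, quarter)]


print(rolap_total("North", 1), molap_total("North", 1))  # 120.0 120.0
```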

Data Warehouse Models

From the perspective of data warehouse architecture, there are three data
warehouse models:
• enterprise warehouse
• data mart
• virtual warehouse

Enterprise Warehouse
An enterprise warehouse stores and manages all historical records about
subjects (customers, products, sales, assets, personnel) across the entire
organization. It supports corporate-wide data integration, usually from one or more
operational systems or external data providers, and it is cross-functional in scope.
An enterprise warehouse typically contains detailed data as well as summarized data, and it can range in size from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.
An enterprise data warehouse may be implemented on traditional mainframes,
supercomputer servers, or parallel architecture platforms. It requires extensive
business modeling and may take years to design and build.

Data Mart
A data mart is a smaller, more focused data warehouse. It contains a subset of corporate-wide data that is of value to a specific group of users, and its scope is restricted to particular selected subjects. Simply put, data flow from the data warehouse to different departments to support each department’s customized use; these department-level databases are called data marts. A data mart is a department’s own collection of data. For example, a marketing data mart may restrict its subjects to customer, item, and sales, so the marketing department has its own data mart. The finance department also has its own data mart. The data marts of the two departments may be related, but they are separate and independent. A separate (independent) data mart can also collect data directly from the data sources.
Unlike an enterprise warehouse, a data mart is usually implemented on low-cost departmental servers running Unix/Linux or Windows. Its implementation cycle is measured in weeks rather than months or years. However, it may involve complex integration in the long run if its design and planning were not enterprise-wide.
Data marts can be categorized depending on the source of data:
1. Independent data marts. Independent data marts are sourced from data captured from one or more operational systems or external information providers, or from data generated locally within a particular department or geographic region.
2. Dependent data marts. Dependent data marts are sourced directly from the enterprise data warehouse.
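
A dependent data mart can be pictured as a subject-restricted copy of the enterprise warehouse. The Python sketch below uses hypothetical records tagged with their subject and builds the marketing data mart from the example above (customer, item, and sales).

```python
# Hypothetical enterprise-warehouse records, each tagged with its subject.
ENTERPRISE_DW = [
    {"subject": "customer", "id": 1, "name": "Ana"},
    {"subject": "item", "id": 10, "name": "laptop"},
    {"subject": "sales", "customer_id": 1, "item_id": 10, "amount": 1200.0},
    {"subject": "personnel", "id": 7, "name": "Ben"},
    {"subject": "assets", "id": 3, "name": "server"},
]

# The marketing mart's scope is restricted to these subjects.
MARKETING_SUBJECTS = {"customer", "item", "sales"}


def build_dependent_mart(warehouse, subjects):
    """Copy only the warehouse rows whose subject is within the mart's scope."""
    return [dict(row) for row in warehouse if row["subject"] in subjects]


marketing_mart = build_dependent_mart(ENTERPRISE_DW, MARKETING_SUBJECTS)
print(sorted({row["subject"] for row in marketing_mart}))  # ['customer', 'item', 'sales']
```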

Virtual warehouse
A virtual warehouse is a set of views over operational databases that can be queried together, so that a user can effectively access all the data as if they were stored in one data warehouse. For efficient query processing, only some of the possible summary views may be materialized. A virtual warehouse is easy to build but requires excess capacity on operational database servers.
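
A virtual warehouse can be sketched with ordinary SQL views. The example below uses Python's `sqlite3` module and hypothetical operational tables; the view is evaluated against the operational data every time it is queried, while one summary is materialized as a table. SQLite has no built-in materialized views, so a `CREATE TABLE ... AS SELECT` snapshot stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical operational (OLTP) tables.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT);
CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER, price REAL);
INSERT INTO orders VALUES (1, 'Ana', '2024-01-05'), (2, 'Ben', '2024-02-11');
INSERT INTO order_lines VALUES (1, 'laptop', 1, 1200.0), (1, 'mouse', 2, 25.0), (2, 'laptop', 1, 1150.0);

-- The "virtual warehouse": a view evaluated against operational data at query time.
CREATE VIEW v_sales AS
    SELECT o.customer, o.order_date, l.product, l.qty * l.price AS revenue
    FROM orders o JOIN order_lines l ON o.order_id = l.order_id;

-- One summary is materialized (stored as a table) for faster repeated queries.
CREATE TABLE mv_revenue_by_product AS
    SELECT product, SUM(revenue) AS revenue FROM v_sales GROUP BY product;
""")

print(conn.execute("SELECT * FROM mv_revenue_by_product ORDER BY product").fetchall())
# [('laptop', 2350.0), ('mouse', 50.0)]

# A new operational row is visible through the view immediately...
conn.execute("INSERT INTO order_lines VALUES (2, 'mouse', 1, 25.0)")
live = conn.execute("SELECT SUM(revenue) FROM v_sales WHERE product = 'mouse'").fetchone()[0]
# ...but the materialized summary is a snapshot until it is recomputed.
stored = conn.execute("SELECT revenue FROM mv_revenue_by_product WHERE product = 'mouse'").fetchone()[0]
print(live, stored)  # 75.0 50.0
```

Every query against the view runs on the operational tables, which is why a virtual warehouse needs spare capacity on those servers.
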
LESSON 4:
EXTRACTION, TRANSFORMATION AND
LOADING

OBJECTIVES:

At the end of this lesson, the student will be able to:

• Define the ETL process

• Determine the data to be cleansed

• Enumerate the steps in loading data into the data warehouse

Duration: 1 hour
Extraction, Transformation, and Loading (ETL)

Figure 7.1 shows that data warehouse systems use back-end tools and utilities to populate and refresh the data. These tools and utilities implement the Extraction, Transformation, and Loading (ETL) process, which extracts, cleans, and transforms data from business systems and then loads it into the data warehouse. The purpose of ETL is to integrate the scattered, messy, and inconsistent data in the organization, providing an analytical basis for enterprise decision-making. ETL is run systematically in data warehouse systems (for example, daily, weekly, or monthly) and needs to be flexible, automated, and well documented (Golfarelli & Rizzi, 2009).

ETL Steps
1. Data Extraction

Extraction is the first step of ETL; it aims to extract the relevant data from the source systems. Extraction is usually one of the most time-consuming tasks in ETL. Different systems tend to use different data formats, which must be converted into a standard format for further processing. A source system may be complex and poorly documented, making it difficult to determine which data need to be extracted. Data must be extracted regularly so that all changed data reach the warehouse and keep it up to date.
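
One common way to fetch only the changed data is incremental extraction with a watermark: remember the latest modification time already extracted, and ask the source only for newer rows. The sketch below assumes, hypothetically, that each source row carries a `modified` timestamp; many real source systems do not, which is one reason extraction is hard.

```python
from datetime import datetime

# Hypothetical source rows with a last-modified timestamp (an assumption).
SOURCE = [
    {"id": 1, "name": "Ana", "modified": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "name": "Ben", "modified": datetime(2024, 1, 2, 14, 30)},
    {"id": 3, "name": "Cy", "modified": datetime(2024, 1, 3, 8, 15)},
]


def extract_changes(source, watermark):
    """Return rows modified after `watermark`, plus the new watermark to store."""
    changed = [row for row in source if row["modified"] > watermark]
    new_watermark = max((row["modified"] for row in changed), default=watermark)
    return changed, new_watermark


# First run: a static (full) extraction starting from the earliest possible time.
batch, mark = extract_changes(SOURCE, datetime.min)
# Later run: only rows modified since the previous run are fetched.
SOURCE[0]["modified"] = datetime(2024, 1, 4, 10, 0)  # Ana's record changes
batch2, mark2 = extract_changes(SOURCE, mark)
print(len(batch), [row["name"] for row in batch2])  # 3 ['Ana']
```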

2. Data cleaning

Cleaning (also called cleansing or scrubbing) aims mainly to improve data quality. Data quality rules are applied to the extracted data to remove erroneous records first; the corresponding cleaning operations are then adjusted to the actual situation where possible. The following kinds of data typically need to be cleansed:

• Duplicate data. For example, a student is recorded many times in a university database system.
• Inconsistent values that are logically associated, such as addresses and ZIP codes.
• Missing data, such as a student’s last name.
• Unexpected use of fields. For example, a contactNumber field could be misused to store a student number.
• Impossible or wrong values, such as the date 22/30/2020.
• Inconsistent values for a single entity because different practices were used. For example, to specify a country, you can use an international country abbreviation (PH) or the full country name (Philippines); similar problems arise with addresses (Marcos St or Marcos Street).
• Inconsistent values for a single entity because of typing mistakes, such as Broklyn Shop instead of Brooklyn Shop.
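
Several of these cleaning rules can be sketched in Python with only the standard library. The reference data (`COUNTRY_CODES`, `KNOWN_SHOPS`), the field names, and the DD/MM/YYYY date format are assumptions for illustration. The sketch covers duplicates, missing values, impossible dates, inconsistent conventions, and typing mistakes, but not cross-field checks such as address and ZIP code consistency. `difflib.get_close_matches` is a simple similarity heuristic; dedicated cleaning tools use more robust matching.

```python
import difflib
from datetime import datetime

# Hypothetical reference data used by the cleaning rules.
COUNTRY_CODES = {"PH": "Philippines", "PHILIPPINES": "Philippines"}
KNOWN_SHOPS = ["Brooklyn Shop", "Marcos Street Shop"]


def valid_date(text):
    """Reject impossible dates such as 22/30/2020 (assumed DD/MM/YYYY)."""
    try:
        datetime.strptime(text, "%d/%m/%Y")
        return True
    except ValueError:
        return False


def clean(records):
    cleaned, rejected, seen = [], [], set()
    for rec in records:
        rec = dict(rec)
        # Duplicate data: keep only the first accepted record per student_id.
        if rec["student_id"] in seen:
            continue
        # Missing data and impossible values go to a reject list for review.
        if not rec.get("last_name") or not valid_date(rec["enrolled"]):
            rejected.append(rec)
            continue
        seen.add(rec["student_id"])
        # Different practices: map "PH" and "Philippines" to one value.
        rec["country"] = COUNTRY_CODES.get(rec["country"].strip().upper(), rec["country"])
        # Different practices: expand a trailing "St" to "Street".
        if rec["address"].endswith(" St"):
            rec["address"] = rec["address"][:-3] + " Street"
        # Typing mistakes: snap near-misses to a known shop name.
        match = difflib.get_close_matches(rec["shop"], KNOWN_SHOPS, n=1, cutoff=0.8)
        if match:
            rec["shop"] = match[0]
        cleaned.append(rec)
    return cleaned, rejected


RAW = [
    {"student_id": 1, "last_name": "Cruz", "enrolled": "15/06/2020", "country": "PH",
     "address": "12 Marcos St", "shop": "Broklyn Shop"},
    {"student_id": 1, "last_name": "Cruz", "enrolled": "15/06/2020", "country": "PH",
     "address": "12 Marcos St", "shop": "Broklyn Shop"},  # duplicate
    {"student_id": 2, "last_name": "", "enrolled": "01/07/2020", "country": "Philippines",
     "address": "5 Rizal Ave", "shop": "Brooklyn Shop"},  # missing last name
    {"student_id": 3, "last_name": "Reyes", "enrolled": "22/30/2020", "country": "PH",
     "address": "8 Luna St", "shop": "Brooklyn Shop"},  # impossible date
]
good, bad = clean(RAW)
print(len(good), len(bad))  # 1 2
print(good[0]["country"], "|", good[0]["address"], "|", good[0]["shop"])
# Philippines | 12 Marcos Street | Brooklyn Shop
```
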
3. Data transformation
Data transformation converts data from its operational source format (for example, a legacy or host format) into the specific data warehouse format.
The following are the major transformations involved:
• Conversion and normalization, which make storage formats and units of measure uniform
• Matching, which associates equivalent fields in different sources
• Selection, which reduces the number of source fields and records
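
The three kinds of transformation can be sketched for two hypothetical sources that name and encode the same facts differently. The source names (`crm`, `erp`), the field names, and the formats below are invented for illustration.

```python
from datetime import datetime

# Hypothetical sources that name and encode the same facts differently.
FIELD_MAP = {
    "crm": {"cust_no": "customer_id", "bday": "birth_date", "height_in": "height"},
    "erp": {"CustomerID": "customer_id", "DOB": "birth_date", "HeightCm": "height"},
}
DATE_FORMATS = {"crm": "%m/%d/%Y", "erp": "%Y-%m-%d"}
CM_PER_UNIT = {"crm": 2.54, "erp": 1.0}  # CRM stores inches, ERP stores centimeters
WAREHOUSE_FIELDS = ("customer_id", "birth_date", "height_cm")


def transform(record, source):
    """Convert one source record into the (hypothetical) warehouse format."""
    # Matching: rename each source field to its warehouse equivalent.
    mapped = {FIELD_MAP[source].get(name, name): value for name, value in record.items()}
    # Conversion and normalization: ISO dates and a single unit of measure.
    born = datetime.strptime(mapped["birth_date"], DATE_FORMATS[source])
    mapped["birth_date"] = born.date().isoformat()
    mapped["height_cm"] = round(mapped.pop("height") * CM_PER_UNIT[source], 1)
    # Selection: keep only the fields the warehouse needs (e.g., drop "fax").
    return {name: mapped[name] for name in WAREHOUSE_FIELDS}


print(transform({"cust_no": 7, "bday": "03/15/1990", "height_in": 65, "fax": "n/a"}, "crm"))
# {'customer_id': 7, 'birth_date': '1990-03-15', 'height_cm': 165.1}
print(transform({"CustomerID": 8, "DOB": "1988-11-02", "HeightCm": 170.0}, "erp"))
# {'customer_id': 8, 'birth_date': '1988-11-02', 'height_cm': 170.0}
```
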
4. Loading
Loading sorts, summarizes, consolidates, computes views, checks integrity, and builds indices and partitions of the data. Loading can be carried out in two ways:
• Refresh. The data warehouse data are completely rewritten, which means the older data are replaced. Refresh is generally used in combination with static extraction to populate a data warehouse initially.
• Update. Only the changes applied to the source data are added to the data warehouse. The update is carried out without deleting or modifying preexisting data. This technique is used in combination with incremental extraction.
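
The two loading modes can be contrasted in a short `sqlite3` sketch with a hypothetical `dw_sales` table. A refresh deletes and rewrites everything, while an update only inserts new rows and leaves preexisting rows untouched (`INSERT OR IGNORE` skips rows whose key already exists).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_sales (sale_id INTEGER PRIMARY KEY, amount REAL)")


def refresh_load(conn, rows):
    """Refresh: completely rewrite the table from a full (static) extract."""
    with conn:
        conn.execute("DELETE FROM dw_sales")
        conn.executemany("INSERT INTO dw_sales (sale_id, amount) VALUES (?, ?)", rows)


def update_load(conn, new_rows):
    """Update: add rows from an incremental extract; existing rows are not modified."""
    with conn:
        conn.executemany("INSERT OR IGNORE INTO dw_sales (sale_id, amount) VALUES (?, ?)", new_rows)


refresh_load(conn, [(1, 100.0), (2, 250.0)])    # initial population
update_load(conn, [(2, 999.0), (3, 75.0)])      # sale 2 already exists, so it is ignored
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM dw_sales").fetchone())  # (3, 425.0)
```
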
LESSON 5:
METADATA REPOSITORY

OBJECTIVES:

At the end of this lesson, the student will be able to:

• Describe the usage of metadata in a data warehouse

• Identify each category of metadata

• Discuss the importance of metadata in a data warehouse

Duration: 1 hour
Metadata
Metadata are "data about data." In a data warehouse, metadata define and describe all the information about the warehouse’s subjects. Metadata run through the entire life cycle of the data warehouse, and they can be used to drive its development, helping to automate and visualize the data warehouse.

Types of Metadata
Metadata in a data warehouse fall into three major categories (Ponniah, 2010):
• Operational metadata
• Extraction and transformation metadata
• End-user metadata

Operational Metadata. It contains information about operational data sources.

a. Structures of data that come from different operational systems
b. Field lengths and data types of data elements selected for the data
warehouse
c. Tasks involved in selecting data from the source systems for the data warehouse (e.g., splitting records, combining parts of records from different source files, and dealing with multiple coding schemes and field lengths)
d. Information needed to tie the output data back to the source data sets

Extraction and Transformation Metadata. These metadata contain data about the extraction of data from the source systems, namely:
a. extraction frequencies
b. extraction methods, and
c. business rules for data extraction.

End-User Metadata. It is the navigational map of the data warehouse. It enables end users to find information in the data warehouse using their own business terminology, and to look for information in the ways they usually think of the business.

Why is metadata specifically crucial in a data warehouse (Ponniah, 2010)?

1. Metadata acts as the glue that connects all parts of the data warehouse.
2. It provides information about the contents and structures to the
developers.
3. It opens the door to the end-users and makes the contents recognizable in
their terms.
Metadata repository
Metadata are stored in the metadata repository. A metadata repository is like a dictionary that contains words together with their synonyms or definitions.
Metadata repository management software can be used to map source data to target
databases, integrate and transform data, generate code for data transformation, and
move data to the warehouse (Han, Kamber, & Pei, 2012).

The metadata repository includes the following:

1. A description of the data warehouse structure
• the warehouse schema, views, dimensions, hierarchies, and data definitions
• data mart locations and contents
2. Operational metadata
• data lineage (history of migrated data and its transformation path)
• currency of data (i.e., active, archived, or purged)
• monitoring information (data warehouse usage statistics, error reports, and audit trails)
3. Algorithms used for summarization
• measure and dimension definition algorithms
• data on granularity
• predefined queries and reports
4. Mapping from the operational environment to the data warehouse
• source databases and their contents
• gateway descriptions, data partitions, and data extraction, cleaning, and transformation rules and defaults
• data refresh and purging rules
• security (user authorization and access control)
5. Data related to system performance
• indices and profiles that improve data access and retrieval performance
• rules for the timing and scheduling of refresh, update, and replication cycles
6. Business metadata
• business terms and definitions
• data ownership information
• charging policies
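
A metadata repository entry can be pictured as a structured record. The Python sketch below is hypothetical (the class and field names are not from any particular product) and combines a few of the categories above: business terms for end users, data lineage, data currency, refresh rules, and ownership.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ColumnMetadata:
    """One hypothetical repository entry describing a warehouse column."""
    table: str
    column: str
    business_term: str            # business / end-user metadata
    source: str                   # where the data came from
    transformations: List[str] = field(default_factory=list)  # data lineage
    currency: str = "active"      # active, archived, or purged
    refresh_rule: str = "daily"   # refresh and purging rules
    owner: str = ""               # data ownership information


class MetadataRepository:
    def __init__(self) -> None:
        self._by_term: Dict[str, ColumnMetadata] = {}

    def register(self, entry: ColumnMetadata) -> None:
        self._by_term[entry.business_term.lower()] = entry

    def lookup(self, business_term: str) -> ColumnMetadata:
        """Let an end user find data by business term instead of column name."""
        return self._by_term[business_term.lower()]

    def lineage(self, business_term: str) -> str:
        """Describe the path from the source system to the warehouse column."""
        entry = self.lookup(business_term)
        return " -> ".join([entry.source, *entry.transformations, f"{entry.table}.{entry.column}"])


repo = MetadataRepository()
repo.register(ColumnMetadata(
    table="fact_sales", column="revenue", business_term="Net Sales",
    source="erp.order_lines", transformations=["convert to base currency", "sum by day"],
    owner="Finance",
))
print(repo.lineage("net sales"))
# erp.order_lines -> convert to base currency -> sum by day -> fact_sales.revenue
```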
