Data Warehousing
Prepared by: Mohamemd Sayeeduddin
What is a data warehouse?
• A data warehouse (DW) is a repository that stores relational data that is organized,
cleansed, and standardized for enterprise use. It is organized into subject-oriented
databases and is non-volatile, in direct support of decision support system (DSS)
functionality. A data warehouse therefore holds strategically selected data that is
important to an enterprise for historical tracking, reporting, and analysis.
• A data warehouse has the following characteristics:
  • Subject-oriented: data is theme- or object-based (e.g., customer, product, sales)
  • Integrated: disparate data from source systems is combined and normalized into a
    consistent format
  • Time-variant: data is organized by time interval (e.g., week, month, quarter, year)
    for historical reporting and preservation
  • Non-volatile: data is never changed or deleted once loaded; it is read-only and
    refreshed at well-defined time intervals
  • Summarized: data is often aggregated to speed up reporting
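The non-volatile, time-variant, and summarized characteristics above can be illustrated with a small sketch. This is only an illustration under stated assumptions: the `warehouse_sales` list, the `load_snapshot` and `monthly_totals` helpers, and the sample customers are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical warehouse table, modeled as an append-only list of rows.
warehouse_sales = []

def load_snapshot(snapshot_date, rows):
    """Append a dated batch; rows already loaded are never updated or deleted."""
    for customer, amount in rows:
        warehouse_sales.append(
            {"snapshot_date": snapshot_date, "customer": customer, "amount": amount}
        )

# Time-variant: each refresh is stamped with the period it describes.
load_snapshot(date(2024, 1, 31), [("acme", 100.0), ("globex", 40.0)])
load_snapshot(date(2024, 2, 29), [("acme", 120.0), ("globex", 55.0)])

def monthly_totals(rows):
    """Summarized: aggregate amounts per (year, month) for reporting."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["snapshot_date"].year, row["snapshot_date"].month)
        totals[key] += row["amount"]
    return dict(totals)

summary = monthly_totals(warehouse_sales)
```

Because every load only appends rows, the January figures remain available for historical reporting after the February refresh.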
Data Warehousing Architectures
Data warehouses can be architected in several ways. There are two
primary approaches: the dimensional approach (popularized by Ralph Kimball)
and the normalized approach (popularized by Bill Inmon).
• Normalized Approach
Inmon’s approach uses a “top-down” design built on a normalized enterprise
data model. That model forms a central repository, the enterprise data
warehouse (EDW). Dimensional data marts for specific departments or
organizational units can then be created from the master enterprise data warehouse.
• Dimensional Approach
Kimball’s approach models a data warehouse dimensionally (as a star
schema or snowflake schema). It uses a “bottom-up” design, in
which individual data marts are created at the departmental or organizational
level (e.g., sales, human resources, finance) and built up into an enterprise
data warehouse. Kimball’s approach is often described as the more popular of
the two because business users can get useful results from the first data marts quickly.
Extract, Transform, Load (ETL)
• Extract, transform, load (ETL) is the data integration process that takes
disparate data from source operational or transactional systems and combines
it into a single format in a central repository. Source data is extracted from
transactional systems; transformed for normalization, formatting, and error
correction; and loaded into the data warehouse for analytics and reporting.
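The three ETL stages can be sketched in a few lines of Python using the standard-library `sqlite3` module as a stand-in for the warehouse. The two source systems (`crm_rows`, `billing_rows`), their field names, and the `sales` table are assumptions made for illustration; real sources and cleansing rules will differ.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent formats.
crm_rows = [{"Customer": " Acme Corp ", "Revenue": "1,200.50"}]
billing_rows = [
    {"cust_name": "globex", "amount_cents": 99900},
    {"cust_name": "", "amount_cents": 500},  # bad record: no customer name
]

def extract():
    """Extract: pull raw rows from each source system."""
    return crm_rows, billing_rows

def transform(crm, billing):
    """Transform: map both sources to one (customer, amount) format."""
    unified = []
    for row in crm:
        amount = float(row["Revenue"].replace(",", ""))
        unified.append((row["Customer"].strip().lower(), amount))
    for row in billing:
        unified.append((row["cust_name"].strip().lower(), row["amount_cents"] / 100))
    # Error correction, kept deliberately simple: drop rows without a customer.
    return [row for row in unified if row[0]]

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
```

Keeping the three stages as separate functions mirrors the description above and makes each stage testable on its own.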
Data Warehousing fundamental for data engineering
Data Mart
A data mart is a subset of an enterprise data warehouse and is often referred to
as a “departmental data warehouse.” A data mart contains the same type of
information that exists in an enterprise data warehouse, but the data is
organized and optimized for a specific department or organizational unit. The
diagram in Figure 2 provides a high-level architecture of data warehousing and
shows how data marts fit into this architecture.
Enterprise Data Warehouse
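As a rough sketch of how a data mart relates to the enterprise data warehouse, the example below derives a departmental table from a central one using the standard-library `sqlite3` module. The table names `edw_transactions` and `sales_mart`, the columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise-wide table covering every department.
CREATE TABLE edw_transactions (department TEXT, region TEXT, amount REAL);
INSERT INTO edw_transactions VALUES
  ('sales', 'east', 100.0),
  ('sales', 'west', 250.0),
  ('finance', 'east', 75.0),
  ('sales', 'east', 50.0);

-- Data mart: only the sales department's rows, organized by region.
CREATE TABLE sales_mart AS
  SELECT region, SUM(amount) AS total_amount
  FROM edw_transactions
  WHERE department = 'sales'
  GROUP BY region;
""")

mart_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart ORDER BY region"
).fetchall()
```

The mart holds the same type of information as the central table, but filtered and shaped for one department's reporting needs.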
Operational Data Stores
• An operational data store (ODS) uses snapshots of data from operational or
transactional systems to provide operational business reporting. An ODS
differs from a data warehouse in that its data is accessed directly from the
transactional system databases, and it is able to write data back to the
source systems. A primary purpose of an ODS is to deal with the complexities
of keeping the data in the data warehouse up to date. The ODS can thus be
seen as a less expensive approach to real-time data reporting.
Data Warehousing in the Cloud
• Data warehouses have traditionally run inside an organization’s local
infrastructure (on-premises), where responsibility for configuration and
maintenance lies solely with the organization’s information technology (IT)
staff. Data warehousing in the cloud shifts much of the responsibility
for hardware, networking, security, and maintenance to a third party, which
allows the organization to focus more on business goals and objectives. This
approach can also give users (who are often remote or mobile) a higher, more
consistent level of data warehouse availability.
Star Schema
• A star schema is a model that arranges data in a shape similar to a star.
A fact table sits at the center of the star and contains its own primary key
and foreign keys to the associated dimension tables, as well as measures
(often aggregated) from the operational or transactional systems. The
dimension tables describe the data and are included based on business needs.
The dimension tables in a star schema are not normalized, which keeps the
model simple and avoids the need for complex joins.
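A minimal star schema can be sketched with the standard-library `sqlite3` module: one fact table in the center and two dimension tables around it. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, kept denormalized.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

-- Fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
  product_id INTEGER REFERENCES dim_product(product_id),
  date_id INTEGER REFERENCES dim_date(date_id),
  units INTEGER,
  revenue REAL
);

INSERT INTO dim_product VALUES
  (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO dim_date VALUES (10, 2024, 1), (11, 2024, 2);
INSERT INTO fact_sales VALUES
  (1, 10, 5, 50.0), (2, 10, 2, 40.0), (3, 11, 1, 15.0), (1, 11, 3, 30.0);
""")

# Each dimension is exactly one join away from the fact table.
revenue_by_category = conn.execute("""
SELECT p.category, d.quarter, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY p.category, d.quarter
ORDER BY p.category, d.quarter
""").fetchall()
```

Note that `category` is stored directly on `dim_product`, repeated for every product in that category; this redundancy is what keeps the joins shallow.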
Snowflake Schema
• A snowflake schema contains the same data that would exist in a star
schema, and the fact table is the same. The main difference between the two
is that the dimension tables in a snowflake schema are normalized, so a
single dimension may be split into several related tables. The process of
normalizing the design is referred to as snowflaking. A snowflake schema
also requires less work to add more data to existing dimensions and requires
less storage, because normalization removes redundancy. Figure 2 displays an
example of a snowflake schema.
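Snowflaking can be sketched as splitting a denormalized dimension into two related tables. The example below is an illustration only: the `dim_product` rows, the `category_manager` attribute, and the `snowflake` helper are hypothetical.

```python
# Hypothetical star-style dimension: category attributes repeat on every row.
dim_product = [
    {"product_id": 1, "name": "Widget", "category": "Hardware", "category_manager": "Lee"},
    {"product_id": 2, "name": "Gadget", "category": "Hardware", "category_manager": "Lee"},
    {"product_id": 3, "name": "Manual", "category": "Books", "category_manager": "Kim"},
]

def snowflake(rows):
    """Split category attributes into their own table, referenced by key."""
    categories = {}  # category name -> row of the new category table
    products = []
    for row in rows:
        name = row["category"]
        if name not in categories:
            categories[name] = {
                "category_id": len(categories) + 1,
                "name": name,
                "manager": row["category_manager"],
            }
        products.append(
            {
                "product_id": row["product_id"],
                "name": row["name"],
                "category_id": categories[name]["category_id"],
            }
        )
    return products, list(categories.values())

dim_product_normalized, dim_category = snowflake(dim_product)
```

After the split, the manager for a category is stored once in `dim_category` rather than on every product row, at the cost of one extra join when a query needs it.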
Quiz
• Question 1
• A ____________ is a repository that stores relational data that is organized,
cleansed, and standardized for enterprise use.
a) Database
b) Data Warehouse
c) Database Management System
Answer
• b) Data Warehouse
Correct! A data warehouse is organized by subject-oriented databases and is
non-volatile in direct support of decision support system (DSS) functionality.
Quiz 2
• Which data warehousing architecture approach utilizes a bottom-up design?
a) Dimensional
b) Denormalized
c) Normalized
Answer
• a) Dimensional
Correct! Kimball’s approach uses a “bottom-up” design, in which individual
data marts are created at the department or organizational unit level (e.g.,
sales, human resources, finance) and built up to an enterprise data
warehouse.
