Integrating Hadoop into Data Warehousing Architecture
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
T.S. Eliot
© Humza Naseer, University of Melbourne 2014
Outline
1. Introduction
2. Link Between Hadoop and Data Warehouse
3. Related Work: Trends in Data Warehouse Architecture
4. Current Work: Hadoop Integration into Data Warehouse Environment
5. Findings, Conclusion & Future Work
Intro: The Data Warehouse Mission
  • Identify all possible enterprise data assets
  • Select those assets that have actionable content and can be accessed
  • Bring the data assets into a logically centralized “enterprise data warehouse”
  • Expose those data assets most effectively for decision making
(Kimball & Ross, 2013)
Intro: Overview of Hadoop
Hadoop is an ecosystem of products:
  • Open source
  • Vendor distributions
  • Additional tools for development and administration
Hadoop Benefits
  • Enables big data analytics
  • Supports advanced forms of analytics
  • Scales cost-effectively
  • Extends a data warehouse environment
Hadoop Limitations (areas where it falls short)
  • Low-latency queries
  • Ease of access
  • Data integration and integrity
  • Fine-grained security
[Diagram: Unstructured Data → HDFS (Data Nodes) → MapReduce → Query Results]
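The HDFS/MapReduce flow in the diagram can be illustrated with a single-process Python sketch of the MapReduce programming model: a map phase emits key/value pairs from unstructured log lines, a shuffle groups them by key, and a reduce phase aggregates each group into a query result. This is not Hadoop's Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the log format are hypothetical, and on a real cluster the map and reduce tasks would run in parallel across the data nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (status_code, 1) for each log line; malformed lines are skipped."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            # Hypothetical log format: "<path> <status_code> <bytes>"
            yield parts[1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the values for each key to produce the final result."""
    return {key: sum(values) for key, values in groups.items()}


raw_logs = [
    "/index.html 200 512",
    "/missing 404 0",
    "/index.html 200 498",
    "garbage",
]
result = reduce_phase(shuffle(map_phase(raw_logs)))
print(result)  # {'200': 2, '404': 1}
```

Keeping the three phases as separate functions mirrors the point of the model: only the map and reduce logic is application-specific, while the grouping step is supplied by the framework.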
Link Between Hadoop and Data Warehouse
A data warehouse system fetches and unifies data from heterogeneous source systems into a centralized dimensional or normalized data repository (Rainardi, 2008).
A data warehouse is not a tool or technology:
  • It is a business process that unifies an enterprise through data (Eckerson, 2012).
Is Hadoop a problem or an opportunity?
Where does Hadoop fit into the data warehouse architecture?
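Rainardi's definition (fetching and unifying data from heterogeneous sources into a dimensional repository) can be sketched in Python. Two hypothetical source systems name the same customer attributes differently; the sketch maps both onto one customer dimension with warehouse-owned surrogate keys, then builds a fact table that references those keys. The source layouts, field names and the matching rule (lower-cased email as the natural key) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerDim:
    customer_key: int  # surrogate key owned by the warehouse
    email: str         # natural key used to match records across sources
    name: str


# Hypothetical extracts from two heterogeneous source systems.
crm_rows = [{"EMAIL": "Ann@Example.com", "FULL_NAME": "Ann Lee"}]
billing_rows = [
    {"cust_email": "ann@example.com", "amount_cents": 1250},
    {"cust_email": "bob@example.com", "amount_cents": 800},
]


def build_customer_dim(crm, billing):
    """Unify customers from both sources into one dimension keyed by email."""
    dim: dict[str, CustomerDim] = {}
    # CRM supplies names; billing may introduce customers the CRM lacks.
    candidates = [(r["EMAIL"], r["FULL_NAME"]) for r in crm]
    candidates += [(r["cust_email"], "unknown") for r in billing]
    for raw_email, name in candidates:
        email = raw_email.strip().lower()
        if email not in dim:
            dim[email] = CustomerDim(len(dim) + 1, email, name)
    return dim


def build_sales_fact(billing, dim):
    """Replace each source's natural key with the dimension's surrogate key."""
    return [
        {"customer_key": dim[r["cust_email"].strip().lower()].customer_key,
         "amount": r["amount_cents"] / 100}
        for r in billing
    ]


customer_dim = build_customer_dim(crm_rows, billing_rows)
sales_fact = build_sales_fact(billing_rows, customer_dim)
print(sales_fact)
# [{'customer_key': 1, 'amount': 12.5}, {'customer_key': 2, 'amount': 8.0}]
```

Surrogate keys decouple the warehouse from any single source's identifiers, which is what lets differently shaped sources land in one centralized repository.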
Why Is Integration Happening?
Traditional RDBMSs cannot handle:
  • The new data types
  • Extended analytic processing
  • Loading terabytes per hour with immediate query access
We want to use SQL, but we don’t want the RDBMS storage constraints.
The disruptive solution: Hadoop (Kimball & Ross, 2013)
[Diagram: DB1, DB2, DB3 → Transformation and Load → Central DW → BI App-1, BI App-2, BI App-3 → Decision Making]
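The tension on this slide (SQL-style access without RDBMS storage constraints) is commonly framed as schema-on-write versus schema-on-read. Below is a minimal Python sketch of schema-on-read under assumed field names: raw JSON lines are stored exactly as they arrive, and a schema (column names and types) is applied only when the data is read, so new or missing fields do not block loading. The `read_with_schema` helper is hypothetical and not part of any Hadoop tool; SQL-on-Hadoop engines such as Hive apply a table definition over files in HDFS in a broadly comparable way.

```python
import json

# Raw events landed as-is; note the inconsistent fields (no load-time schema).
raw_store = [
    '{"user": "u1", "bytes": "512", "agent": "curl"}',
    '{"user": "u2", "bytes": 128}',
    '{"user": "u3"}',
]

# The schema travels with the query, not with the storage.
schema = {"user": str, "bytes": int}


def read_with_schema(lines, schema):
    """Parse each raw line and project/cast it onto the schema at read time."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for column, cast in schema.items():
            value = record.get(column)
            # Missing fields become None instead of rejecting the record;
            # fields not in the schema (e.g. "agent") are simply ignored.
            row[column] = cast(value) if value is not None else None
        yield row


rows = list(read_with_schema(raw_store, schema))
total_bytes = sum(r["bytes"] for r in rows if r["bytes"] is not None)
print(rows)
print(total_bytes)  # 640
```

The trade-off is that type errors surface at query time rather than at load time, which is one reason the slide deck lists data integration and integrity among Hadoop's limitations.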
Trends in Data Warehouse Architectures
Ponniah (2011) notes that the choice of DW architecture depends on enterprise requirements.
A DW architecture has multiple architectural layers and components (Moss and Atre, 2013):
  • Logical architecture
  • Physical architecture
DW architecture overlaps with data integration, business intelligence and enterprise data (Russom, 2014).
The Inmon vs. Kimball dichotomy: a top-down, normalized enterprise data warehouse versus bottom-up, dimensional data marts (Ariyachandra and Watson, 2010).
Hadoop Integration into the Data Warehousing Environment
Eckerson (2012) notes that reporting and analytics have different workload requirements:
  • Reporting is based on entities and facts that are well known.
  • Advanced analytics enables the discovery of new facts that are not yet known.
Multi-platform unified data architecture (Russom, 2013):
  • Includes the enterprise data warehouse (EDW) and several newer data platforms that augment the EDW
Uses of Hadoop that Extend DW Architectures
  • Data staging
  • Data archiving
  • Advanced analytics
  • Multi-structured data
[Diagram: DB1, DB2, DB3 → Transformation and Load → EDW → BI App-1, BI App-2, BI App-3 → Decision Making]
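Of the Hadoop uses listed on this slide, data archiving is the simplest to illustrate. A minimal Python sketch, assuming a hypothetical retention policy (rows newer than a cutoff stay in the EDW, older rows move to a Hadoop archive tier); plain in-memory lists stand in for both stores, so no Hadoop or database API is implied.

```python
from datetime import date, timedelta

# Hypothetical EDW fact rows; in practice these would be date partitions.
edw_rows = [
    {"order_id": 1, "order_date": date(2012, 3, 1)},
    {"order_id": 2, "order_date": date(2014, 5, 20)},
    {"order_id": 3, "order_date": date(2013, 1, 15)},
]


def offload_to_archive(rows, as_of, retention_days):
    """Split rows into (kept in EDW, moved to archive) by a retention cutoff."""
    cutoff = as_of - timedelta(days=retention_days)
    keep = [r for r in rows if r["order_date"] >= cutoff]
    archive = [r for r in rows if r["order_date"] < cutoff]
    return keep, archive


keep, archive = offload_to_archive(edw_rows, as_of=date(2014, 6, 30), retention_days=365)
print([r["order_id"] for r in keep])     # [2]
print([r["order_id"] for r in archive])  # [1, 3]
```

Because archived rows are moved rather than deleted, historical analysis can still run on the lower-cost tier while the EDW holds only recent data.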
Findings
  • Analytics and reporting place different requirements on DW architectures.
  • A DW architecture can be characterized by counting the number and types of workloads it supports.
  • A logical DW architecture must integrate multiple physical platforms.
  • The design of the logical DW architecture must be compartmentalized.
  • Proposed: a logical architecture for the new DW ecosystem (an extension of Eckerson's (2012) BI architecture)
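The finding that a DW architecture can be characterized by counting the number and types of workloads it supports can be expressed as a small tally. In this Python sketch the platforms and their workload assignments are hypothetical examples, not results from the study.

```python
from collections import Counter

# Hypothetical mapping of platforms to the workloads they serve.
platform_workloads = {
    "EDW": ["reporting", "dashboards", "ad hoc query"],
    "Data marts": ["reporting"],
    "Hadoop cluster": ["staging", "archiving", "advanced analytics"],
    "In-memory BI sandbox": ["discovery analytics"],
}

# Number of workloads each platform supports.
per_platform = {name: len(ws) for name, ws in platform_workloads.items()}

# How many platforms serve each workload type.
workload_types = Counter(w for ws in platform_workloads.values() for w in ws)

print(per_platform)
print(len(workload_types), "distinct workload types")  # 7 distinct workload types
print(workload_types["reporting"])  # 2
```

A workload served by several platforms (here, reporting) is a candidate for consolidation, while a platform with a single workload marks a compartment in the logical design.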
Logical Architecture of the New DW Ecosystem
[Figure: logical architecture of the new DW ecosystem.
Relational data: operational systems (online transaction processing systems), operational data store, enterprise data warehouse, subject area data marts, BI server; event-driven alerting environment; reporting/analysis environment.
Non-relational data: web data, machine data, log files, legacy/external data; non-relational extract, transform and load (batch, real time or near real time); Hadoop ecosystem cluster; exploration/discovery environment with DW-centric sandbox, replicated sandbox and in-memory BI sandbox.
Users: power user, casual user. Access modes: query, ETL, streaming.
Architecture styles shown: top-down architecture, bottom-up architecture.]
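One way to read the logical architecture is as a routing decision: each request goes to the environment suited to its data and latency needs. The Python sketch below encodes that reading with hypothetical rules (streaming requests go to the event-driven alerting environment; non-relational data or power-user requests go to the exploration/discovery environment; everything else goes to the reporting/analysis environment). The rules and the `Request` fields are assumptions, not part of the proposed architecture.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    ALERTING = "event-driven alerting environment"
    REPORTING = "reporting/analysis environment"
    EXPLORATION = "exploration/discovery environment"


@dataclass
class Request:
    user: str         # "power" or "casual" (hypothetical labels)
    relational: bool  # does the data come from relational/OLTP sources?
    streaming: bool   # does the request need real-time events?


def route(req: Request) -> Environment:
    """Pick an environment using the hypothetical rules described above."""
    if req.streaming:
        return Environment.ALERTING
    if not req.relational or req.user == "power":
        return Environment.EXPLORATION
    return Environment.REPORTING


print(route(Request("casual", relational=True, streaming=False)).name)  # REPORTING
print(route(Request("power", relational=True, streaming=False)).name)   # EXPLORATION
print(route(Request("casual", relational=False, streaming=True)).name)  # ALERTING
```

Keeping the routing rules in one function reflects the finding that the logical design should be compartmentalized: each environment can change its physical platform without changing how requests are classified.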
BI Assessment Model
[Figure: BI assessment model placing Data Marts, the Enterprise Data Warehouse, Workload-Specific Data Platforms and the Data Warehouse Ecosystem along Degree of Integration (Low to High), Degree of Standardization (Low to High) and Workload Capacity.]
Conclusion
Hadoop enables new types of applications within the DW environment:
  • Big data analytics, advanced analytics and discovery analytics
  • Information exploration and augmentation of a data warehouse
Hadoop should be implemented within a multi-platform DW environment.
Future work:
  • Conformed dimensions
  • BI maturity roadmap
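The future-work item on conformed dimensions concerns letting facts held on different platforms be compared through shared dimension keys and labels. A minimal Python sketch of a drill-across, assuming a hypothetical product dimension shared by an EDW sales fact and a Hadoop-side web-click fact: each fact is aggregated separately by the conformed attribute, and the results are then merged on that attribute.

```python
from collections import defaultdict

# Conformed product dimension shared by both platforms (hypothetical data).
product_dim = {10: "Laptop", 20: "Phone"}

# Fact rows from two platforms that both use the conformed product_key.
edw_sales = [(10, 1200.0), (20, 600.0), (10, 800.0)]  # (product_key, revenue)
hadoop_clicks = [(10, 3), (20, 5), (20, 2)]          # (product_key, clicks)


def aggregate(facts, dim):
    """Sum a fact's measure per conformed attribute (product name)."""
    totals = defaultdict(float)
    for key, measure in facts:
        totals[dim[key]] += measure
    return totals


def drill_across(dim, *fact_sets):
    """Merge separately aggregated facts on the shared dimension attribute."""
    aggregates = [aggregate(facts, dim) for facts in fact_sets]
    return {name: tuple(agg.get(name, 0.0) for agg in aggregates)
            for name in sorted(dim.values())}


report = drill_across(product_dim, edw_sales, hadoop_clicks)
print(report)  # {'Laptop': (2000.0, 3.0), 'Phone': (600.0, 7.0)}
```

Aggregating each fact before merging avoids joining the two fact tables row by row, which would multiply measures whenever a product appears in several rows of both.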
Questions