BIA 5000 Introduction To Analytics - Lesson 8
2021 - 2022
LESSON 8. BUSINESS INTELLIGENCE ARCHITECTURE (PART I)
Learning Objectives
https://dama.org/sites/default/files/download/DAMA-DMBOK2-Framework-V2-20140317-FINAL.pdf
Data Wrangling
Data warehouse: any database, file, or collection of files used to store integrated data that is then consumed by BI and analytics.
Integrated: data is gathered from one or more source systems and made consistent.
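As a rough illustration of what "integrated" means here, the Python sketch below takes customer rows from two hypothetical source systems (a CRM and a billing system, with invented field names) that disagree on ID formats, name casing and country codes, and conforms them to one consistent model. The country lookup and the "first source wins" rule for conflicts are assumptions made for the example, not a prescribed method.

```python
from dataclasses import dataclass

# Hypothetical rows from two source systems that describe customers differently.
crm_rows = [
    {"cust_id": "C-001", "name": "Ada Lovelace", "country": "uk"},
    {"cust_id": "C-002", "name": "Alan Turing", "country": "UK"},
]
billing_rows = [
    {"customer": 1, "full_name": "ADA LOVELACE", "country_code": "GB"},
    {"customer": 3, "full_name": "grace hopper", "country_code": "US"},
]

# Hypothetical lookup that maps each source's country values to one code set.
COUNTRY_MAP = {"uk": "GB", "gb": "GB", "us": "US"}


@dataclass(frozen=True)
class Customer:
    """The single, consistent model used inside the warehouse."""
    customer_id: int
    name: str
    country: str


def from_crm(row: dict) -> Customer:
    # "C-001" -> 1; names and country values are normalized.
    return Customer(
        customer_id=int(row["cust_id"].split("-")[1]),
        name=row["name"].title(),
        country=COUNTRY_MAP[row["country"].lower()],
    )


def from_billing(row: dict) -> Customer:
    return Customer(
        customer_id=row["customer"],
        name=row["full_name"].title(),
        country=COUNTRY_MAP[row["country_code"].lower()],
    )


def integrate(crm: list, billing: list) -> list:
    """Conform both sources, then keep one record per customer_id."""
    merged = {}
    for customer in [*map(from_crm, crm), *map(from_billing, billing)]:
        merged.setdefault(customer.customer_id, customer)  # first source wins
    return sorted(merged.values(), key=lambda c: c.customer_id)
```

Running integrate(crm_rows, billing_rows) yields three conformed customers; Ada Lovelace appears in both sources but is stored once, with country "GB".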
Data mart: a subset of a data warehouse that is usually oriented to a business group or process rather than to an enterprise-wide view.
Aspect | Data warehouse | Data mart
Scope | Data about the whole enterprise | Data about a specific area (by business process or department)
Sources | Multiple source systems | One or a small number of sources
Objective | Provide an integrated environment with a full picture of the enterprise | Provide information for a specific purpose or project
Size | Very large: from 100 GB to many TB | Smaller than a DW, usually below 100 GB
Granularity | Detailed; time-variant, non-volatile data | May be consolidated and aggregated for a specific purpose; less detail
Cost | Very expensive | More affordable
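To make the contrast in scope and granularity concrete, here is a minimal Python sketch, assuming a hypothetical enterprise-wide sales table in the warehouse: a department-specific data mart keeps only one department's rows and aggregates them to monthly totals per region, so it holds less detail than its source.

```python
from collections import defaultdict

# Hypothetical warehouse fact rows: one row per sale, enterprise-wide.
warehouse_sales = [
    {"date": "2022-01-03", "department": "Marketing", "region": "East", "amount": 120.0},
    {"date": "2022-01-03", "department": "Sales", "region": "East", "amount": 300.0},
    {"date": "2022-01-17", "department": "Sales", "region": "West", "amount": 200.0},
    {"date": "2022-02-01", "department": "Sales", "region": "East", "amount": 50.0},
]


def build_department_mart(rows: list, department: str) -> dict:
    """Keep one department's rows and aggregate them to (month, region) totals."""
    totals = defaultdict(float)
    for row in rows:
        if row["department"] == department:
            month = row["date"][:7]  # "YYYY-MM": coarser grain than the warehouse
            totals[(month, row["region"])] += row["amount"]
    return dict(totals)
```

For example, build_department_mart(warehouse_sales, "Sales") returns three month-by-region totals instead of the warehouse's individual sales, and ignores the Marketing row entirely.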
Independent Data Marts
https://www.ewsolutions.com/migrating-independent-data-marts/
Independent Data Mart Challenges
Redundant data: each independent data mart requires its own, typically duplicated, copy of the detailed corporate data.
One location: all the data is loaded into one location for ease of access.
Current data: usually contains only current data (for a defined period), not historical data.
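The redundant-data challenge can be sketched in Python as follows, assuming a hypothetical customer master list: each independent mart runs its own extract and stores a full copy, so the same detail is held twice, and a correction made in one copy does not reach the other.

```python
# Hypothetical detailed corporate data: a customer master list.
corporate_customers = [
    {"id": i, "segment": "retail" if i % 2 else "business"} for i in range(1, 1001)
]


def load_independent_mart(source: list) -> list:
    """Each independent mart runs its own extract and keeps a full copy of the detail."""
    return [dict(row) for row in source]


def redundant_rows(*marts: list) -> int:
    """Rows stored more than once across the marts (total stored minus unique ids)."""
    stored = sum(len(mart) for mart in marts)
    unique = len({row["id"] for mart in marts for row in mart})
    return stored - unique


marketing_mart = load_independent_mart(corporate_customers)
finance_mart = load_independent_mart(corporate_customers)

# A correction applied in one copy does not reach the other copy.
marketing_mart[0]["segment"] = "business"
```

Here redundant_rows(marketing_mart, finance_mart) is 1000: every customer row is stored twice, and after the local correction the two marts disagree about customer 1.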
Federated Data Warehouse
Federated Data Warehouse: a collection of data warehouses that conform to the same logical model but may be physically separated for better performance or business needs, e.g.:
• By region or division
• By business function
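A federated warehouse can be imitated with Python's built-in sqlite3 module. In this sketch each region is a separate in-memory database (standing in for a physically separate warehouse), all members share one hypothetical sales schema, and an enterprise-wide figure is obtained by sending the same query to every member and combining the partial results. Real federation layers differ considerably; this only shows why a shared logical model makes the combined query possible.

```python
import sqlite3

# One logical model shared by every member warehouse (hypothetical schema).
SALES_SCHEMA = "CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)"
TOTAL_SALES = "SELECT COALESCE(SUM(amount), 0) FROM sales"


def make_member_warehouse(rows: list) -> sqlite3.Connection:
    """A physically separate store (here an in-memory database) using the shared schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SALES_SCHEMA)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Members separated by region, one of the splits listed above.
members = {
    "EMEA": make_member_warehouse([(1, "EMEA", 100.0), (2, "EMEA", 50.0)]),
    "APAC": make_member_warehouse([(3, "APAC", 75.0)]),
}


def federated_scalar(members: dict, sql: str) -> float:
    """Send the same query to every member and add up the partial results."""
    return sum(conn.execute(sql).fetchone()[0] for conn in members.values())
```

federated_scalar(members, TOTAL_SALES) gives 225.0. Adding partial results is only valid for additive measures such as sums and counts; an average, for instance, would need the sums and counts combined separately.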
Textbook Chapter 6 Figure 6.5
Accidental Data Architecture
Accidental Data Architecture: a collection of data storage, warehousing and analytics solutions developed without consideration for enterprise architecture and data integration.
Hub-and-Spoke Architecture: a central data hub or data warehouse that feeds multiple data marts for BI and analytical purposes.
Step | Activity | Where it happens
Stage | Gather data from systems of record | Staging area (a.k.a. landing area)
Refine | Refine data for analytical use: further wrangle subsets of data for particular BI uses | Second and subsequent tiers of sub-data marts
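The flow from systems of record through a hub to tiered marts can be sketched as a chain of plain Python functions. The source rows, field names and the rule of dropping unparseable amounts are all hypothetical; the point is only the order of the tiers: raw rows land in staging unchanged, the hub cleanses and conforms them, a spoke serves one department, and a second-tier sub-data mart wrangles that subset further for one BI use.

```python
# Hypothetical raw extracts from two systems of record.
orders_system = [
    {"order": "1", "amt": "20.5", "dept": "toys"},
    {"order": "2", "amt": "n/a", "dept": "books"},
]
web_system = [{"order": "3", "amt": "9.5", "dept": "toys"}]


def stage(*sources: list) -> list:
    """Staging (landing) area: copy raw rows as-is, no cleansing yet."""
    return [dict(row) for src in sources for row in src]


def to_hub(staged: list) -> list:
    """Central hub: cleanse and conform the staged rows into one model."""
    hub = []
    for row in staged:
        try:
            amount = float(row["amt"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed (simplifying assumption)
        hub.append({"order_id": int(row["order"]), "dept": row["dept"], "amount": amount})
    return hub


def to_spoke(hub: list, dept: str) -> list:
    """First-tier data mart (spoke) fed from the hub for one department."""
    return [row for row in hub if row["dept"] == dept]


def to_sub_mart(spoke: list) -> dict:
    """Second-tier sub-data mart: a further-wrangled summary for one BI use."""
    return {"orders": len(spoke), "revenue": sum(row["amount"] for row in spoke)}
```

For example, to_sub_mart(to_spoke(to_hub(stage(orders_system, web_system)), "toys")) gives 2 orders and 30.0 in revenue; the "n/a" row is kept in staging but never reaches the hub.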
DATA LATENCY
Real-time analytics
What is latency?
Latency
Latency: delay; the time it takes for a message or a packet of information to move from one point to another.
Data latency: the time taken to collect and store the data.
Analysis latency: the time taken to analyze the data and turn it into actionable information.
https://www.researchgate.net/figure/Response-time-latency-30_fig3_312145765
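Because data latency and analysis latency happen one after the other, their sum can be treated, as a simplification, as the delay between an event and the information it produces. A small Python sketch with invented timestamps for a single event:

```python
from datetime import datetime, timedelta


def latency_breakdown(event_at: datetime, stored_at: datetime, insight_at: datetime):
    """Return (data latency, analysis latency, their sum) for one event."""
    data_latency = stored_at - event_at        # time to collect and store the data
    analysis_latency = insight_at - stored_at  # time to analyze it into actionable information
    return data_latency, analysis_latency, data_latency + analysis_latency


# Hypothetical timestamps for a single business event.
event_at = datetime(2022, 3, 1, 9, 0)
stored_at = event_at + timedelta(minutes=15)
insight_at = stored_at + timedelta(minutes=45)

data_lat, analysis_lat, total_lat = latency_breakdown(event_at, stored_at, insight_at)
```

Here the data takes 15 minutes to be stored and another 45 minutes to be analyzed, so the information is available one hour after the event.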
Data Latency Levels
Category | Description | Examples
Near-time (a.k.a. near real-time) | Information is uploaded at set intervals rather than instantaneously, aiming to be as close to real time as needed (“good enough” for business needs) | Stock market price updates; social media monitoring; email delivery
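The "set intervals" idea behind near-time can be sketched in Python, assuming uploads run on a fixed schedule every interval_s seconds (a hypothetical parameter) and that each upload publishes everything that arrived since the previous one. An event then waits for the next upload, so its staleness ranges from zero to almost a full interval.

```python
import math


def published_at(event_time_s: float, interval_s: float) -> float:
    """With an upload every interval_s seconds, an event is published by the next upload."""
    return math.ceil(event_time_s / interval_s) * interval_s


def staleness(event_times_s: list, interval_s: float) -> list:
    """How long each event waits before it becomes visible to users."""
    return [published_at(t, interval_s) - t for t in event_times_s]
```

With a 60-second interval, events at 0, 10, 59 and 61 seconds are published after 0, 50, 1 and 59 seconds respectively; choosing the interval is effectively choosing how much delay counts as "good enough".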
“If you think of a Data Mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the Data Lake is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
James Dixon (Pentaho)
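Following Dixon's analogy, the Python sketch below keeps hypothetical raw sensor feeds in a "lake" exactly as they arrived (mixed formats, extra fields), and then "bottles" a structured, cleansed subset for mart-style consumption. The feeds, field names and the Fahrenheit-to-Celsius rule are invented for the example.

```python
import json

# Hypothetical raw feeds in their "natural state": mixed formats and fields.
raw_feeds = [
    '{"sensor": "t1", "temp_c": 21.5}',
    '{"sensor": "t2", "temp_f": 70.7, "note": "recalibrated"}',
    "sensor=t3;temp_c=19.0",  # not JSON at all
]

# The lake: keep everything exactly as it arrived.
data_lake = list(raw_feeds)


def bottle(lake: list) -> list:
    """Mart-style 'bottling': parse, cleanse and structure only what fits the schema."""
    rows = []
    for raw in lake:
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable records stay in the lake but not in the mart
        if "temp_c" in rec:
            temp_c = rec["temp_c"]
        elif "temp_f" in rec:
            temp_c = round((rec["temp_f"] - 32) * 5 / 9, 1)
        else:
            continue
        rows.append((rec["sensor"], temp_c))
    return rows
```

bottle(data_lake) yields only the two rows that fit the schema, while the lake still holds all three raw records for users who want to examine them differently later.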
What’s a Data Lake?
Data Lake:
https://www.talend.com/resources/data-lake-vs-data-warehouse/