EDM - E1 - Data Architecture and Modeling - Data Architecture v1.1
Data Architecture
• No well-defined Ownership
• Highly Duplicated
• Disparate set of technologies for implementing the same processes in different departments
• The rate at which business requirements come into the system is far greater than the rate of
implementation by IT
• 1 IT :: M Business (one IT organization serves many business units)
• Often, Data Architects are visibly needed (especially after critical failures) and are
given high levels of responsibility, but without the resources or authority to match the
expectations
Data Governance (DG)
• Achieve organizational alignment (people, along with processes and technologies) on
the governing of data management issues throughout the enterprise.
Master Data Management (MDM)
• Identify and maintain a single, unified view of reference data across the enterprise.
Data Security (DS)
• Ensure that data is kept safe and is accessible to pertinent users and groups, internally
and externally. It ensures access to data is suitably controlled throughout the flow of data
in an enterprise, by setting up data security policies and processes.
[Figure: Data Architecture within enterprise data management]
Segments: Data Governance, Data Quality, Data Security, Metadata Management, MDM, Data Architecture
Activities shown:
• Data Stewardship
• Data Privacy
• Data Visualization
• Data Analysis
• Data Discovery
• Data Profiling
• Data Cleansing
• Data Monitoring
• Data Modeling (Logical)
• Data Modeling (Physical)
• Database Administration
Surrounding labels: Data Migration, Data Consolidation, Data Synchronization, Analytics, Data Mining, ERP, CRM, SCM, BPM/SOA, KM
• A data flow diagram can also be used for the visualization of data
processing (structured design).
Microsoft Visio: A Windows diagramming tool which includes very basic DFD support (images only,
does not record data flows)
SmartDraw: A Windows diagramming tool with Yourdon and Coad process notations and Gane and
Sarson process notation
System Architect: An enterprise architecture tool, supporting Coad/Yourdon, Gane & Sarson,
Ward/Mellor, and SSADM notations and techniques
DFDdeveloper: An open source software application that allows Microsoft Office users to create
interactive leveled data flow diagrams and data dictionaries
• Overview
– http://web.cs.wpi.edu/~matt/courses/cs563/talks/datavis.html
• Modern Approaches
– http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/
• “The Digital Universe in 2007 stood at 281 Billion Gigabytes and with an annual
growth rate of almost 60%, it is projected to reach nearly 1.8 Zettabytes in 2011.”
• Causes:
– Digital Cameras
– Digital Surveillance Cameras
– Digital Television
– Better understanding of replication trends
– Increasing internet access in emerging countries
– Sensor-based applications
– Data Centers supporting cloud computing and social networks
• A person’s digital shadow – digital information generated about the average person
on a daily basis (financial records, mailing lists, surfing histories, images by
security cameras etc.) – now surpasses the amount of digital information
individuals actively create themselves (creating pictures, sending emails, digital
voice calls etc.)
• Structured data is stored mainly in databases. Unstructured data can be
stored in proprietary systems, and recent database products can store
unstructured data as well
• Capacity planning for data storage takes into account existing data
volumes and future growth. Inadequate capacity planning can impact the
performance of systems
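The growth side of capacity planning is compound-growth arithmetic. The following is a minimal sketch, assuming a constant annual growth rate; the function names `project_volume` and `years_until_full` are hypothetical, and the only real figures used are the ones from the Digital Universe quote above (281 billion GB in 2007, almost 60% annual growth).

```python
import math

def project_volume(current, annual_growth, years):
    """Volume after `years` of compound growth (annual_growth 0.60 means 60%)."""
    return current * (1 + annual_growth) ** years

def years_until_full(current, capacity, annual_growth):
    """Whole years of growth that still fit within `capacity` (same unit as `current`)."""
    if current > capacity:
        return 0
    return math.floor(math.log(capacity / current) / math.log(1 + annual_growth))

# 281 billion GB in 2007, growing about 60% a year, projected four years to 2011.
digital_universe_2011_gb = project_volume(281e9, 0.60, 4)
print(f"{digital_universe_2011_gb / 1e12:.2f} ZB")  # 1 ZB = 10**12 GB; prints 1.84 ZB
```

The result agrees with the "nearly 1.8 Zettabytes" projection in the quote. For a storage system, `years_until_full(40, 100, 0.25)` would estimate how long an illustrative 40 TB data set growing 25% a year fits on 100 TB of capacity.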
Repositories
[Figure: data storage repositories]
• ODS
– Real-time / operational data integration
– Low latency
– Simple to average operational queries
• EDW, DW, DM
– DW / EDW: high-volume data loads in batch mode
– Data Marts: subject-area specific, with medium data volumes
• MDDB
• ADW (ETL/Reporting etc.)
• Backups differ from archives in the sense that archives are the primary
copy of data and backups are a secondary copy of data
• RAID Levels
• RAID Performance
[Figure: RAID levels; each column is one disk, each number is one block]
RAID-0
•Striping
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
RAID-1
•Mirroring: Disk 1 and Disk 1 Mirror hold the same data
RAID-5
•Striping
•Distributed Parity
•Disk space = Disk Drives - 1 (one drive's worth of capacity holds parity)
1 2 3 Parity
4 5 Parity 6
7 Parity 8 9
Parity 10 11 12
[Figure: RAID-0, RAID-1, and RAID-5 positioned by protection, cost, and performance]
• RAID-1
– Best Protection
– Good Performance
– Most Expensive
• RAID-5
– Good Protection
– Worst Performance
– Least Expensive Fault Tolerant
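The capacity rule and the parity idea behind RAID-5 can be made concrete with a small sketch. It assumes equally sized drives and models parity as a byte-wise XOR across the blocks of one stripe; the helper names `usable_capacity` and `xor_blocks` are hypothetical and are not part of any RAID tool.

```python
from functools import reduce

def usable_capacity(level, drives, drive_size):
    """Usable space for equally sized drives under RAID 0, 1, or 5."""
    if level == 0:
        return drives * drive_size        # striping only, no redundancy
    if level == 1:
        return drive_size                 # every drive holds the same copy
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * drive_size  # one drive's worth goes to parity
    raise ValueError(f"unsupported RAID level: {level}")

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks; used here as the stripe parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across four drives: three data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the second block, then rebuild it from the survivors and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

print(usable_capacity(5, 4, 2))  # four 2 TB drives under RAID 5: prints 6
```

Every write to a data block also requires the parity block of that stripe to be updated, which is one reason RAID-5 writes are slower than mirrored writes.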
• Data Modelers are responsible for designing the data model. They
communicate with functional teams to understand the business
requirements and with technical teams to implement the database.
• A fact (measure) table contains measures (sales gross value, total units
sold) and dimension columns. These dimension columns are actually
foreign keys from the respective dimension tables.
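A fact table of this shape can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are illustrative assumptions; only the two measures come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_month TEXT);
CREATE TABLE fact_sales (
    product_key       INTEGER REFERENCES dim_product(product_key),  -- dimension column
    date_key          INTEGER REFERENCES dim_date(date_key),        -- dimension column
    sales_gross_value REAL,                                         -- measure
    total_units_sold  INTEGER                                       -- measure
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date    VALUES (20240101, '2024-01'), (20240201, '2024-02');
INSERT INTO fact_sales  VALUES
    (1, 20240101, 100.0, 10),
    (1, 20240201,  50.0,  5),
    (2, 20240101,  80.0,  4);
""")

# Measures are aggregated after joining the fact table to a dimension table.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.sales_gross_value), SUM(f.total_units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 80.0, 4), ('Widget', 150.0, 15)]
```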
• Normalized (3NF) model
– Data is normalized and used for OLTP. Optimized for OLTP processing
– Several tables and chains of relationships among them
– Volatile (several updates) and time variant
• Dimensional model
– Data is denormalized and used in data warehouse and data mart. Optimized for OLAP
– Few tables, and fact tables are connected to dimensional tables
– Non volatile and time invariant
• Models incomplete – not enough interface with business, or business is not clear
• ETL (3NF) vs. Reporting (Dimensional) – Clash of the Titans (Understanding & Skills)
• “It is easier to change the Physical Structure, because that is what we see”
• Not enough funding; modelers are often overloaded; the onus is on modelers to chase the
business(es)
• Different naming standards and tools when mergers happen; harder to integrate and
maintain
• Standards on data quality, security, metadata management, and master data management
differ in degrees of implementation across departments
• The rate at which businesses move is much faster than the rate at which data models get
updated
June 26, 2024
Data Architecture Agenda
• Data Architecture Overview
• Data Visualization
• Data Flow Diagrams
• Business Data
• Data Storage & Administration
• Data Storage vendors
• RAID
• Backup, Recovery, and Archival purging
• Data Modeling
• Corporate 3NF Modeling
• Dimensional Modeling
• Organizational Roles
• Functional Activities of a DBA
• References
2. Wikipedia – www.wikipedia.org
• Dimensional Modeling optimizes the database for data retrieval and analysis.
• Some of the decisions to be made during the design of a dimensional model are:
– The business processes to be selected for analysis of the subject area to be
modeled.
– Granularity of the fact tables.
– Dimensions and hierarchies to be identified for each fact table.
– Measures for the fact tables.
– Attributes for each dimension table.
– Pattern selection (Star schema, Snowflake schema or Starflake schema)
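The grain and hierarchy decisions listed above can be illustrated with a small sketch. It assumes a fact table whose grain is one row per product per day and two simple hierarchies (day to month, product to category); all names and sample values are hypothetical.

```python
from collections import defaultdict

# Dimension attributes: each product key carries its category (one hierarchy level up).
product_dim = {
    "P1": {"category": "Tools"},
    "P2": {"category": "Tools"},
    "P3": {"category": "Toys"},
}

# Fact rows at the chosen grain: one row per product per day, with one measure.
fact_rows = [
    {"date": "2024-01-05", "product": "P1", "units": 3},
    {"date": "2024-01-20", "product": "P2", "units": 2},
    {"date": "2024-02-02", "product": "P1", "units": 4},
    {"date": "2024-02-02", "product": "P3", "units": 1},
]

def roll_up(rows):
    """Aggregate daily product facts to the (month, category) level."""
    totals = defaultdict(int)
    for row in rows:
        month = row["date"][:7]                             # day -> month
        category = product_dim[row["product"]]["category"]  # product -> category
        totals[(month, category)] += row["units"]
    return dict(totals)

print(roll_up(fact_rows))
```

Facts stored at a fine grain can always be rolled up along a hierarchy like this, whereas facts stored only at month and category level cannot be broken back down to days or products, which is why the grain is fixed early in the design.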