Data Warehousing Techniques and Standards
Presented by:
Davin Flatten
Systems Analyst
MassSAFE
University of Massachusetts
@
Introduction
• The definition of a data warehouse
– What they are
– What they are not
• The star schema
– Dimensions
– Facts
• RDBMS requirements for warehousing
– ETL (Extraction, Transformation, Loading)
– ROLAP (Relational Online Analytical Processing)
• Keys to success
@
What a data warehouse is not
• Not an OLTP (Online Transaction
Processing) system
• Not necessarily a physical place
• Not a single project with an end
• Not a single product or application
@
What a data warehouse is
• Modeling technique specific to
analysis
• Bill Inmon’s four characteristics of a
data warehouse:
– Subject-oriented design
– Integrated across your data sources
– Nonvolatile ‘picture’ of your state’s data
– Time variant view of your data
@
Where does the warehouse fit?
@
Why have a data
warehouse?
• Supplements transportation data
systems
• Avoids resource conflicts with operational systems
• Increases your organization’s
understanding of your data
• Most importantly…
@
It completes the data life cycle.
@
The Star Schema
• Key concept for data warehousing
• Modeling technique that simplifies joins
and tables
• Organizes data into a format that is
easy for business users to understand
• Allows application developers to
standardize ad-hoc queries
@
Elements of a Star Schema
• Dimension Tables
– Easy to understand groupings of subject areas
– Can be hierarchical, used to drill down
– Denormalized, decoded, and cleaned set of
descriptive data elements
• Fact Tables
– Contain foreign keys referencing dimension
records
– Contain either additive or semi-additive
measures for analysis
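
As a rough illustration of the two element types, the sketch below builds a tiny star schema in SQLite from Python. All table and column names (dim_town, dim_date, fact_violation) and the sample rows are hypothetical; they are not taken from the schema used in this presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Hypothetical schema: two dimension tables and one fact table.
conn.executescript("""
CREATE TABLE dim_town (
    town_key  INTEGER PRIMARY KEY,
    town_name TEXT NOT NULL,
    county    TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE fact_violation (
    town_key        INTEGER NOT NULL REFERENCES dim_town(town_key),
    date_key        INTEGER NOT NULL REFERENCES dim_date(date_key),
    fine_amount     REAL    NOT NULL,  -- additive measure
    violation_count INTEGER NOT NULL   -- additive measure, 1 per row
);
INSERT INTO dim_town VALUES (1, 'Amherst', 'Hampshire');
INSERT INTO dim_date VALUES (20240115, '2024-01-15', 2024);
""")


def fact_is_accepted(town_key, date_key, fine_amount):
    """Try to insert one fact row; report whether its foreign keys resolved."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO fact_violation VALUES (?, ?, ?, 1)",
                (town_key, date_key, fine_amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(fact_is_accepted(1, 20240115, 100.0))  # town 1 exists in dim_town
print(fact_is_accepted(99, 20240115, 50.0))  # town 99 does not
```

With foreign keys enforced, a fact row that points at a town missing from the dimension is rejected, which keeps every fact tied to a readable dimension record.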
@
Sample Citation Schema
@
The Dimension Table
• Each record
contains a single
town (grain)
• The associated
information is
denormalized and
hierarchical
• All values are
decoded in plain
language
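
One way to picture how such a dimension is produced: coded source rows are decoded through lookup tables and flattened so that each town row carries its whole hierarchy. The sketch below is a minimal example; the codes, lookup tables, and the build_town_dimension helper are all invented for illustration.

```python
# Hypothetical coded source rows and lookup tables (not from the presentation).
SOURCE_TOWNS = [
    {"town_code": "008", "county_code": "HS", "region_code": "W"},
    {"town_code": "035", "county_code": "SF", "region_code": "E"},
]
TOWN_NAMES = {"008": "Amherst", "035": "Boston"}
COUNTY_NAMES = {"HS": "Hampshire", "SF": "Suffolk"}
REGION_NAMES = {"W": "Western", "E": "Eastern"}


def build_town_dimension(source_rows):
    """Return one denormalized, decoded row per town (the grain)."""
    dimension = []
    for key, row in enumerate(source_rows, start=1):
        dimension.append({
            "town_key": key,  # surrogate key that fact rows would reference
            "town_name": TOWN_NAMES[row["town_code"]],
            "county": COUNTY_NAMES[row["county_code"]],  # hierarchy level above town
            "region": REGION_NAMES[row["region_code"]],  # hierarchy level above county
        })
    return dimension


town_dim = build_town_dimension(SOURCE_TOWNS)
for row in town_dim:
    print(row)
```

Because county and region sit on the same row as the town, an analyst can drill down from region to county to town without any extra joins.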
@
The Fact Table
• Each record
contains a single
violation (grain)
• Each dimension is
referenced with a
foreign key
• Measures are
provided for fines
and violations
@
The Star Query
• Regardless of
subject matter, the
same type of query
can be used
• Results can be
easily read and
used by analysts
• No complicated
outer-joins, sub-
selects, or other
complex SQL
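
The shape of a star query can be sketched as follows, using SQLite from Python: join the fact table to each dimension on its key, group by the descriptive columns, and sum the measures. The tables, columns, and rows are hypothetical stand-ins, not the citation schema from this presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema with a few sample rows.
conn.executescript("""
CREATE TABLE dim_town (town_key INTEGER PRIMARY KEY, town_name TEXT, county TEXT);
CREATE TABLE dim_violation_type (type_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE fact_violation (
    town_key        INTEGER REFERENCES dim_town(town_key),
    type_key        INTEGER REFERENCES dim_violation_type(type_key),
    fine_amount     REAL,
    violation_count INTEGER
);
INSERT INTO dim_town VALUES
    (1, 'Amherst', 'Hampshire'),
    (2, 'Northampton', 'Hampshire'),
    (3, 'Boston', 'Suffolk');
INSERT INTO dim_violation_type VALUES (1, 'Speeding'), (2, 'Seat belt');
INSERT INTO fact_violation VALUES
    (1, 1, 100.0, 1),
    (2, 1, 150.0, 1),
    (2, 2, 25.0, 1),
    (3, 1, 200.0, 1);
""")

# Inner joins from the fact table to each dimension, then aggregate the measures.
STAR_QUERY = """
SELECT t.county,
       v.description,
       SUM(f.violation_count) AS violations,
       SUM(f.fine_amount)     AS total_fines
FROM fact_violation f
JOIN dim_town t           ON t.town_key = f.town_key
JOIN dim_violation_type v ON v.type_key = f.type_key
GROUP BY t.county, v.description
ORDER BY t.county, v.description
"""

rows = conn.execute(STAR_QUERY).fetchall()
for row in rows:
    print(row)
```

Swapping in a different subject area changes only the table and column names; the join, group, and sum pattern stays the same.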
@
Selecting a Relational Database
Management System (RDBMS)
• ETL (Extraction, Transformation, Loading)
– Ability to create stored procedures
– Job scheduler
– Logging and security tracking
• ROLAP
– Optimizer capable of performing star queries
– Table partitioning across time
– Bitmap index capabilities
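
To make the ETL requirements concrete, here is a minimal sketch of an extract, transform, and load step with logging. It uses plain Python and SQLite in place of an RDBMS stored procedure and job scheduler; the CSV layout, the violation code table, and the stage_citation table are assumptions made for illustration.

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

# Hypothetical source extract: coded citation records as CSV text.
SOURCE_CSV = """citation_id,violation_code,fine
1001,SP,100.00
1002,SB,25.00
1003,??,50.00
"""
VIOLATION_CODES = {"SP": "Speeding", "SB": "Seat belt"}  # assumed lookup table


def extract(text):
    """Read the raw source rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Decode violation codes and reject rows whose code is unknown."""
    clean = []
    for row in rows:
        description = VIOLATION_CODES.get(row["violation_code"])
        if description is None:
            log.warning("rejected citation %s: unknown code %r",
                        row["citation_id"], row["violation_code"])
            continue
        clean.append((int(row["citation_id"]), description, float(row["fine"])))
    return clean


def load(conn, rows):
    """Insert the cleaned rows in one transaction and log the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stage_citation ("
        "citation_id INTEGER PRIMARY KEY, violation TEXT, fine_amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO stage_citation VALUES (?, ?, ?)", rows)
    log.info("loaded %d rows", len(rows))
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(conn, transform(extract(SOURCE_CSV)))
```

In a production warehouse the same steps would typically run as scheduled stored procedures, with rejected rows written to an audit table rather than only to a log.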
@
Why we use Oracle for our
warehouse
• Table partition pruning
• Star query optimizer hint
• Bitmap indexes
• PL/SQL stored procedures
• Transportable tablespaces
• Query rewrite
• Materialized views
• Job scheduler
@
Keys to a successful
warehousing project
• Identified and involved warehouse
users
• Strong and committed leadership
• Diversified project team
• Established partnerships with all key
source data holders
• Incremental project plan that produces
fast results
• Correct design philosophy