Module 1 Chapter 2
records
◼ Data cleaning and data integration techniques are
applied.
◼ Ensure consistency in naming conventions, encoding
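[Figure: lattice of cuboids for the dimensions time, item, location, supplier. 3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier). 4-D (base) cuboid: (time, item, location, supplier).]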
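Without concept hierarchies, a cube over n dimensions has 2^n cuboids, one for each subset of the dimensions. The following is a minimal Python sketch that enumerates the lattice for the four dimensions above. The function name `cuboid_lattice` is hypothetical.

```python
from itertools import combinations

# Dimensions of the 4-D (base) cuboid in the lattice above.
DIMENSIONS = ("time", "item", "location", "supplier")

def cuboid_lattice(dims):
    """Map each dimensionality k to the list of k-D cuboids (dimension subsets).

    Level 0 is the apex cuboid (no group-by dimensions, i.e. "all");
    level len(dims) is the base cuboid.
    """
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboid_lattice(DIMENSIONS)
for level, cuboids in lattice.items():
    print(f"{level}-D cuboids ({len(cuboids)}):", cuboids)
```

For four dimensions, the level sizes are 1, 4, 6, 4, 1, which gives 16 cuboids in total.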
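[Figure: snowflake schema. Fact table: branch_key, location_key, and the measures units_sold, dollars_sold, avg_sales. Dimension table branch: branch_key, branch_name, branch_type. Dimension table location: location_key, street, city_key, normalized into a separate table city: city_key, city, state_or_province, country.]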
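In a snowflake schema, the location dimension is normalized. Location rows carry only a city_key, while city, state_or_province, and country are stored in a separate city table. The following is a minimal Python sketch of the extra join this requires when summing dollars_sold by country. The rows are hypothetical; only the column names come from the schema.

```python
# Hypothetical rows; only the column names come from the schema above.
city = {10: {"city": "Vancouver", "state_or_province": "BC", "country": "Canada"}}
location = {100: {"street": "Main St", "city_key": 10}}

sales_fact = [
    {"branch_key": 1, "location_key": 100, "units_sold": 5, "dollars_sold": 250.0},
    {"branch_key": 1, "location_key": 100, "units_sold": 3, "dollars_sold": 120.0},
]

def country_of(fact_row):
    """Snowflake lookup: fact -> location -> city (two joins, not one)."""
    loc = location[fact_row["location_key"]]
    return city[loc["city_key"]]["country"]

dollars_by_country = {}
for row in sales_fact:
    c = country_of(row)
    dollars_by_country[c] = dollars_by_country.get(c, 0.0) + row["dollars_sold"]

print(dollars_by_country)  # {'Canada': 370.0}
```

A star schema would keep country directly in the location table. That saves the second lookup, at the cost of repeating city-level attributes on every location row.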
<dimension_name_first_time> in cube <cube_name_first_time>
Specification of hierarchies
◼ Schema hierarchy
day < {month < quarter; week} < year
◼ Set_grouping hierarchy
{1..10} < inexpensive
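The schema hierarchy above is a partial order rather than a chain: week rolls up to year but not to month or quarter. The set-grouping hierarchy maps value ranges to named groups. The following is a minimal Python sketch of both. The names `TIME_HIERARCHY`, `is_finer`, and `price_group` are hypothetical, and only the {1..10} < inexpensive group comes from the slide.

```python
# Schema hierarchy day < {month < quarter; week} < year as a partial order:
# each level maps to the levels directly above it.
TIME_HIERARCHY = {
    "day": ["month", "week"],
    "month": ["quarter"],
    "quarter": ["year"],
    "week": ["year"],
    "year": [],
}

def is_finer(low, high, hierarchy=TIME_HIERARCHY):
    """True if `high` is reachable by climbing up from `low` (depth-first search)."""
    stack = list(hierarchy[low])
    seen = set()
    while stack:
        level = stack.pop()
        if level == high:
            return True
        if level not in seen:
            seen.add(level)
            stack.extend(hierarchy[level])
    return False

def price_group(price):
    """Set-grouping hierarchy: values in {1..10} are grouped as "inexpensive".

    Other ranges are not specified above, so they are left ungrouped (None).
    """
    return "inexpensive" if 1 <= price <= 10 else None

print(is_finer("day", "year"), is_finer("week", "quarter"))  # True False
print(price_group(7))  # inexpensive
```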
July 31, 2023 Data Mining: Concepts and Techniques 22
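A Sample Data Cube
[Figure: 3-D sales cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A, Canada, Mexico, sum). One highlighted cell is labeled "Total annual sales of TV in U.S.A."]
[Figure: lattice of cuboids for this cube. 0-D (apex) cuboid: all. 1-D cuboids: product, date, country. 3-D (base) cuboid: (product, date, country).]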
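The "sum" rows and columns of the cube above are cells of lower-dimensional cuboids. The following is a minimal Python sketch that aggregates hypothetical base cells into every cuboid, using "all" for a dimension that has been summed out. The sales figures and the name `compute_cube` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "date", "country")

# Hypothetical base-cuboid cells: (product, date, country) -> sales.
base = {
    ("TV", "1Qtr", "U.S.A"): 10,
    ("TV", "2Qtr", "U.S.A"): 12,
    ("TV", "3Qtr", "U.S.A"): 8,
    ("TV", "4Qtr", "U.S.A"): 15,
    ("PC", "1Qtr", "Canada"): 7,
    ("VCR", "2Qtr", "Mexico"): 4,
}

def compute_cube(cells, dims):
    """Add every base cell into the matching cell of each of the 2^n cuboids.

    Dimensions that are aggregated away become "all", so
    ("TV", "all", "U.S.A") holds the total annual sales of TV in U.S.A.
    """
    cube = defaultdict(int)
    n = len(dims)
    for key, value in cells.items():
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(key[i] if i in kept else "all" for i in range(n))
                cube[cell] += value
    return dict(cube)

cube = compute_cube(base, DIMS)
print(cube[("TV", "all", "U.S.A")])  # 45
print(cube[("all", "all", "all")])   # 56
```

This brute-force version touches 2^n cells per base tuple. It is meant to show what the cube contains, not how to compute it efficiently.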
◼ Visualization
◼ OLAP capabilities
◼ Interactive manipulation
Typical OLAP Operations
◼ Roll up (drill-up): summarize data
◼ by climbing up a concept hierarchy or by dimension reduction
◼ Drill down (roll down): reverse of roll-up
◼ from a higher-level summary to a lower-level summary or detailed data, or by introducing new dimensions
◼ Slice and dice: project and select
◼ Pivot (rotate):
◼ reorient the cube for visualization, e.g., from a 3-D cube to a series of 2-D planes
◼ Other operations
◼ drill across: involving (across) more than one fact table
◼ drill through: through the bottom level of the cube to its
back-end relational tables (using SQL)
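The following is a minimal Python sketch of roll-up, slice, dice, and pivot on a small dictionary-based cube with hypothetical data. Roll-up is shown only in its dimension-reduction form; climbing a hierarchy would instead map each quarter to its year before summing. Drill-down is omitted because it needs the detailed data that roll-up discarded, i.e. going back to `cells`. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical base cuboid: (product, quarter, country) -> sales.
DIMS = ("product", "quarter", "country")
cells = {
    ("TV", "1Qtr", "U.S.A"): 10,
    ("TV", "2Qtr", "U.S.A"): 12,
    ("PC", "1Qtr", "Canada"): 7,
    ("PC", "2Qtr", "U.S.A"): 5,
}

def roll_up(data, dim, dims=DIMS):
    """Roll-up by dimension reduction: sum out `dim`."""
    i = dims.index(dim)
    out = defaultdict(int)
    for key, value in data.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out)

def slice_(data, dim, value, dims=DIMS):
    """Slice: select one value on one dimension; that dimension disappears."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in data.items() if k[i] == value}

def dice(data, conditions, dims=DIMS):
    """Dice: select on two or more dimensions; all dimensions are kept."""
    idx = {dims.index(d): allowed for d, allowed in conditions.items()}
    return {k: v for k, v in data.items()
            if all(k[i] in allowed for i, allowed in idx.items())}

def pivot(view):
    """Pivot a 2-D view: swap its row and column axes."""
    return {(col, row): v for (row, col), v in view.items()}

by_product_quarter = roll_up(cells, "country")   # 2-D view: (product, quarter)
q1 = slice_(cells, "quarter", "1Qtr")            # 2-D view: (product, country)
tv_usa = dice(cells, {"product": {"TV"}, "country": {"U.S.A"}})
by_quarter_product = pivot(by_product_quarter)   # axes swapped: (quarter, product)
```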
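[Figure: star-net query model with radial lines for Time, Product, Location, Organization, Promotion, and other dimensions. Abstraction levels shown include ANNUALLY, QTRLY, DAILY (Time); PRODUCT LINE, PRODUCT GROUP, PRODUCT ITEM (Product); and CITY, COUNTRY, DISTRICT, REGION, SALES PERSON, DIVISION, ORDER, TRUCK. Each circle is called a footprint.]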
Chapter 3: Data Warehousing and OLAP Technology: An Overview
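[Figure: multi-tiered data warehouse architecture. Data sources (operational DBs, other sources) feed Extract, Transform, Load, Refresh under a Monitor & Integrator with a metadata repository. Data storage holds the data warehouse and data marts. An OLAP server serves front-end tools for analysis, query, reports, and data mining.]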
Data Warehouse Development:
A Recommended Approach
[Figure: components of the recommended approach: data marts, an enterprise data warehouse, distributed data marts, and a multi-tier data warehouse.]
◼ Data transformation
◼ convert data from legacy or host format to warehouse
format
◼ Load
◼ sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
◼ Refresh
◼ propagate the updates from the data sources to the
warehouse
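The following is a minimal Python sketch of the three back-end steps above: transform, load, and refresh. It assumes a hypothetical pipe-delimited legacy layout ("YYYYMMDD|product|units"). The functions `transform`, `check_integrity`, `load`, and `refresh` are illustrative names, not a real tool's API.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical legacy layout: "YYYYMMDD|product|units", units zero-padded.
def transform(legacy_line):
    """Data transformation: legacy record -> warehouse row (ISO date, int units)."""
    raw_date, product, units = legacy_line.strip().split("|")
    return {
        "date": datetime.strptime(raw_date, "%Y%m%d").date().isoformat(),
        "product": product.strip().upper(),
        "units_sold": int(units),
    }

def check_integrity(row):
    """Reject rows with an empty product or negative units."""
    return bool(row["product"]) and row["units_sold"] >= 0

def load(rows, summary=None):
    """Load: sort, check integrity, and summarize into (date, product) totals."""
    summary = defaultdict(int) if summary is None else summary
    for row in sorted(rows, key=lambda r: (r["date"], r["product"])):
        if not check_integrity(row):
            raise ValueError(f"integrity check failed: {row}")
        summary[(row["date"], row["product"])] += row["units_sold"]
    return summary

def refresh(summary, new_legacy_lines):
    """Refresh: propagate new source records into the existing summary."""
    return load((transform(line) for line in new_legacy_lines), summary)

warehouse = load(transform(l) for l in ["20230731|tv|0000012", "20230731|TV|3"])
refresh(warehouse, ["20230801|PC|4"])
print(dict(warehouse))  # {('2023-07-31', 'TV'): 15, ('2023-08-01', 'PC'): 4}
```

Here, refresh is incremental: it only transforms and loads the new source records, instead of rebuilding the summary from scratch.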
Metadata Repository
◼ Metadata is the data that defines warehouse objects. It stores:
◼ Description of the structure of the data warehouse
◼ schema, views, dimensions, hierarchies, derived data definitions, data mart locations and contents
◼ Operational meta-data
◼ data lineage (history of migrated data and transformation path),
currency of data (active, archived, or purged), monitoring
information (warehouse usage statistics, error reports, audit trails)
◼ The algorithms used for summarization
◼ The mapping from operational environment to the data warehouse
◼ Data related to system performance
◼ warehouse schema, view and derived data definitions
◼ Business data
◼ business terms and definitions, ownership of data, charging policies
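The following is a minimal Python (3.9+) sketch of a metadata-repository entry. It covers three of the categories above: structural (schema), operational (lineage and currency of data), and business (ownership). The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Currency(Enum):
    """Currency of data, as listed under operational metadata."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"

@dataclass
class LineageStep:
    source: str          # where the data came from
    transformation: str  # what was done to it

@dataclass
class TableMetadata:
    name: str
    schema: dict[str, str]                 # structural metadata: column -> type
    lineage: list[LineageStep] = field(default_factory=list)  # operational metadata
    currency: Currency = Currency.ACTIVE   # operational metadata
    owner: str = ""                        # business metadata: data ownership

    def record_step(self, source: str, transformation: str) -> None:
        self.lineage.append(LineageStep(source, transformation))

    def transformation_path(self) -> str:
        return " -> ".join(f"{s.source}:{s.transformation}" for s in self.lineage)

meta = TableMetadata("sales_fact", {"units_sold": "int", "dollars_sold": "float"}, owner="sales")
meta.record_step("orders_db", "extract")
meta.record_step("staging", "convert currency")
print(meta.transformation_path())  # orders_db:extract -> staging:convert currency
```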
OLAP Server Architectures
◼ Relational OLAP (ROLAP)
◼ Use a relational or extended-relational DBMS to store and manage warehouse data, plus OLAP middleware
◼ Include optimization of DBMS backend, implementation of
aggregation navigation logic, and additional tools and services
◼ Greater scalability than MOLAP
◼ Multidimensional OLAP (MOLAP)
◼ Sparse array-based multidimensional storage engine
◼ Fast indexing to pre-computed summarized data
◼ Hybrid OLAP (HOLAP) (e.g., Microsoft SQL Server)
◼ Flexibility, e.g., low-level data kept in relational tables, high-level aggregates kept in arrays
◼ Specialized SQL servers (e.g., Red Brick)
◼ Specialized support for SQL queries over star/snowflake schemas
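The following is a toy Python contrast between the ROLAP and MOLAP storage styles above. It is not how any particular product stores data. ROLAP keeps relational rows and aggregates at query time. MOLAP keeps a sparse multidimensional array (only non-empty cells) plus summaries precomputed at load time. A HOLAP system would combine the two, keeping detail rows relationally and summaries in arrays. All names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, country, sales).
rows = [
    ("TV", "U.S.A", 10),
    ("TV", "Canada", 4),
    ("PC", "U.S.A", 6),
]

def rolap_total_by_product(table):
    """ROLAP style: aggregate at query time, like SQL GROUP BY product."""
    totals = defaultdict(int)
    for product, _country, sales in table:
        totals[product] += sales
    return dict(totals)

class SparseCube:
    """MOLAP style: sparse cell storage plus a summary precomputed at load time."""

    def __init__(self, table):
        self.cells = defaultdict(int)       # only non-empty (product, country) cells
        self.by_product = defaultdict(int)  # pre-computed summary
        for product, country, sales in table:
            self.cells[(product, country)] += sales
            self.by_product[product] += sales

    def total_for(self, product):
        """Answer the summary query with a lookup instead of a scan."""
        return self.by_product.get(product, 0)

cube = SparseCube(rows)
print(rolap_total_by_product(rows), cube.total_for("TV"))  # {'TV': 14, 'PC': 6} 14
```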
threshold
◼ Avoid explosive growth of the cube
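The fragments above ("threshold", "Avoid explosive growth of the cube") appear to describe keeping only cube cells whose aggregate passes a threshold, an idea commonly called an iceberg cube. The following is a minimal Python sketch under that assumption, using COUNT(*) >= min_count as the condition. The function and parameter names are hypothetical. The sketch counts every cell and then filters. Practical algorithms (e.g., BUC) prune during computation instead.

```python
from collections import Counter
from itertools import combinations

def iceberg_cube(rows, dims, min_count):
    """Return only the cells of all cuboids whose COUNT(*) >= min_count.

    rows are tuples aligned with dims; aggregated-away dimensions become "*".
    """
    counts = Counter()
    n = len(dims)
    for row in rows:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                counts[tuple(row[i] if i in kept else "*" for i in range(n))] += 1
    return {cell: c for cell, c in counts.items() if c >= min_count}

rows = [("TV", "U.S.A"), ("TV", "U.S.A"), ("TV", "Canada"), ("PC", "Mexico")]
print(iceberg_cube(rows, ("product", "country"), min_count=2))
```

With these four rows, 9 cells are non-empty, but only 4 reach the threshold.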
and product
◼ A join index on city maintains for each
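The join-index bullet above is truncated. As commonly described, a join index on a dimension such as city records, for each distinct city, the row IDs (R-IDs) of the fact tuples for that city. The neighboring "and product" fragment suggests a second indexed dimension, product. The following is a minimal Python sketch under those assumptions. For simplicity, it stores the city and product values directly in a hypothetical Sales list and uses list positions as R-IDs.

```python
from collections import defaultdict

# Hypothetical Sales fact table; the list position plays the role of the R-ID.
sales = [
    {"city": "Vancouver", "product": "TV", "dollars": 250},
    {"city": "Toronto", "product": "PC", "dollars": 900},
    {"city": "Vancouver", "product": "PC", "dollars": 800},
]

def build_join_index(fact_rows, dim):
    """Map each distinct value of `dim` to the R-IDs of the fact tuples that reference it."""
    index = defaultdict(list)
    for rid, row in enumerate(fact_rows):
        index[row[dim]].append(rid)
    return dict(index)

city_ji = build_join_index(sales, "city")
product_ji = build_join_index(sales, "product")

# Intersecting two join indices finds "PC sales in Vancouver" without rescanning the fact table.
rids = sorted(set(city_ji["Vancouver"]) & set(product_ji["PC"]))
print(rids)  # [2]
```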
data warehouses
◼ ODBC, OLEDB, Web accessing, service facilities,
[Figure: Layer 2: MDDB (multidimensional database) with metadata]
◼ Summary
Summary: Data Warehouse and OLAP Technology