Oracle: Data Warehouse Design
Characteristics of a Data Warehouse

  A data warehouse is a database designed for
   querying, reporting, and analysis.
  A data warehouse contains historical data
   derived from transaction data.
  Data warehouses separate analysis workload
   from transaction workload.
  A data warehouse is primarily
   an analytical tool.
Comparing OLTP and Data Warehouses
  Characteristic                 OLTP                  Data Warehouse
  -----------------------------  --------------------  -----------------
  Joins                          Many                  Some
  Data accessed by queries       Comparatively lower   Large amount
  Duplicated data                Normalized DBMS       Denormalized DBMS
  Derived data and aggregates    Rare                  Common
Data Warehouse Architectures

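[Figure: data warehouse architecture. Data sources (operational systems, flat files) feed a staging area. The warehouse holds metadata, materialized views, and raw data. Sales, purchasing, and inventory data serve users doing analysis, reporting, and data mining.]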
Data Warehouse Design
• Key data warehouse design considerations:
  – Identify the specific data content.
  – Recognize the critical relationships within and
    between groups of data.
  – Define the system environment
    supporting your data warehouse.
  – Identify the required data
    transformations.
  – Calculate the frequency at which
    the data must be refreshed.
Logical Design
– A logical design is conceptual and
  abstract.
– Entity-relationship (ER) modeling
  is useful in identifying logical
  information requirements.
   • An entity represents a chunk of data.
   • The properties of entities are known as attributes.
   • The links between entities are known as
     relationships.
– Dimensional modeling is a specialized
  type of ER modeling useful in data warehouse
  design.
Oracle Warehouse Builder
– Oracle Database provides tools to implement
  the ETL process.
   • Oracle Warehouse Builder is a tool to help in this
     process.
– Oracle Warehouse Builder generates the
  following types of code:
   •   SQL data definition language (DDL) scripts
   •   PL/SQL programs
   •   SQL*Loader control files
   •   XML Process Definition Language (XPDL)
   •   ABAP code (used to extract data from SAP systems)
Data Warehousing Schemas
– Objects can be arranged in data warehousing
  schema models in a variety of ways:
   •   Star schema
   •   Snowflake schema
   •   Third normal form (3NF) schema
   •   Hybrid schemas
– The source data model and user
  requirements should steer the data
  warehouse schema.
– Implementation of the logical model may
  require changes to enable you to adapt it to
  your physical system.
Schema Characteristics
– Star schema
   • Characterized by one or more large fact tables and a
     number of much smaller dimension tables
   • Each dimension table joined to the fact table using a
     primary key to foreign key join
– Snowflake schema
   • Dimension data grouped into multiple tables instead
     of one large table
   • Increased number of dimension tables, requiring
     more foreign key joins
– Third normal form (3NF) schema
   • A classical relational-database model that minimizes
     data redundancy through normalization
Data Warehousing Objects
– Fact tables
   • Fact tables are the large tables that store business
     measurements.
– Dimension tables
   • A dimension is a structure composed of one or more
     hierarchies that categorizes data.
   • A unique identifier specifies one distinct
     record in a dimension table.
– Relationships
   • Relationships guarantee
     integrity of business
     information.
Fact Tables
– A fact table must be defined for each star schema.
– Fact tables are the large tables that store business
  measurements.
– A fact table contains either detail-level or
  aggregated facts.
– A fact table usually contains facts with the same
  level of aggregation.
– The primary key of the fact table is
  usually a composite key made up
  of all its foreign keys.
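
The fact-table points above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not Oracle syntax: the columns prod_name and amount_sold and the sample rows are assumptions made for the example. It shows a SALES fact table whose primary key is the composite of its foreign keys, joined to a dimension table through a primary key to foreign key join.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table, then load sample rows."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.executescript("""
        CREATE TABLE customers (
            cust_id        INTEGER PRIMARY KEY,
            cust_last_name TEXT,
            cust_city      TEXT
        );
        CREATE TABLE products (
            prod_id   INTEGER PRIMARY KEY,
            prod_name TEXT                      -- hypothetical column
        );
        CREATE TABLE sales (
            cust_id     INTEGER NOT NULL REFERENCES customers(cust_id),
            prod_id     INTEGER NOT NULL REFERENCES products(prod_id),
            amount_sold REAL NOT NULL,          -- hypothetical measure
            PRIMARY KEY (cust_id, prod_id)      -- composite of the foreign keys
        );
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Smith", "Boston"), (2, "Lopez", "Austin")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(10, "Widget"), (20, "Gadget")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(1, 10, 5.0), (1, 20, 7.5), (2, 10, 2.5)])


def sales_by_city(conn):
    """Join the fact table to the CUSTOMERS dimension and total sales per city."""
    return conn.execute("""
        SELECT c.cust_city, SUM(s.amount_sold)
        FROM sales s
        JOIN customers c ON c.cust_id = s.cust_id
        GROUP BY c.cust_city
        ORDER BY c.cust_city
    """).fetchall()
```

Calling build_star_schema on an in-memory connection and then sales_by_city returns one total per city. A real fact table usually has more foreign keys (time, channel, promotion), so the composite key would be wider than the two columns used here.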
Dimensions and Hierarchies
– A dimension is a structure composed of one or more
  hierarchies that categorizes data.
– Dimensional attributes help to describe the
  dimensional value.
– Dimension data is collected at the lowest level of
  detail and aggregated into higher level totals.
– Hierarchies are structures that use ordered levels
  to organize data.
– In a hierarchy, each level is connected to the
  levels above and below it.

CUSTOMERS dimension hierarchy (by level, highest to lowest):
REGION > SUBREGION > COUNTRY > STATE > CITY > CUSTOMER
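
Aggregating detail data into higher-level totals along the CUSTOMERS hierarchy might look like the following sketch. The level names come from the slide. The row layout (a dict with one key per level plus an "amount" measure) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

# Hierarchy levels from highest to lowest, as on the slide.
LEVELS = ["region", "subregion", "country", "state", "city", "customer"]


def roll_up(rows, level):
    """Sum 'amount' for each distinct path from REGION down to `level`."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for row in rows:
        # The key is the full path, so equal names under different parents stay apart.
        key = tuple(row[name] for name in LEVELS[:depth])
        totals[key] += row["amount"]
    return dict(totals)


def make_row(region, subregion, country, state, city, customer, amount):
    """Build one detail-level row (hypothetical layout)."""
    values = [region, subregion, country, state, city, customer]
    row = dict(zip(LEVELS, values))
    row["amount"] = amount
    return row


SAMPLE_ROWS = [
    make_row("Americas", "Northern America", "US", "CA", "San Francisco", "C1", 10.0),
    make_row("Americas", "Northern America", "US", "CA", "Los Angeles", "C2", 5.0),
    make_row("Americas", "Northern America", "US", "NY", "New York", "C3", 2.5),
    make_row("Europe", "Western Europe", "FR", "IDF", "Paris", "C4", 4.0),
]
```

roll_up(SAMPLE_ROWS, "country") totals the customer-level rows per country, and roll_up(SAMPLE_ROWS, "region") totals them per region. Because each level is connected to the one above it, any higher-level total can be derived from the lowest level of detail.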
Dimensions and Hierarchies

[Figure: star schema. The SALES fact table (columns cust_id, prod_id) sits at the center, related to five dimension tables: PRODUCTS (#prod_id), CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province), TIMES, CHANNELS, and PROMOTIONS. Callouts mark a unique identifier, a relationship, and a hierarchy.]
Physical Design
[Figure: logical design compared with physical design]
  Logical:                 Entities, Relationships, Attributes,
                           Unique identifiers
  Physical (tablespaces):  Tables, Columns, Indexes,
                           Integrity constraints (primary key,
                           foreign key, not null),
                           Materialized views, Dimensions
Data Warehouse Physical Structures

• Tables and partitioned tables
  – Partitioned tables enable you to split
    large data volumes into smaller,
    more manageable pieces.
  – Expect performance benefits from:
     • Partition pruning
     • Intelligent parallel processing
  – Compressed tables offer scaleup opportunities
    for read-only operations.
  – Table compression saves disk space.
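
Partition pruning can be illustrated with a toy in-memory table partitioned by month. This is a sketch of the idea only, not how Oracle implements it. The class name, method names, and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


class MonthPartitionedTable:
    """Toy table that stores rows in one partition per (year, month)."""

    def __init__(self):
        self.partitions = defaultdict(list)  # (year, month) -> [(sale_date, amount)]

    def insert(self, sale_date, amount):
        self.partitions[(sale_date.year, sale_date.month)].append((sale_date, amount))

    def total_between(self, start, end):
        """Sum amounts with start <= sale_date <= end.

        Returns (total, partitions_scanned) so the effect of pruning is visible.
        """
        low = (start.year, start.month)
        high = (end.year, end.month)
        total = 0.0
        scanned = 0
        for key, rows in self.partitions.items():
            if key < low or key > high:
                continue  # pruned: this partition cannot contain matching rows
            scanned += 1
            total += sum(amount for day, amount in rows if start <= day <= end)
        return total, scanned


def sample_table():
    """Load four rows spread over three monthly partitions (made-up data)."""
    table = MonthPartitionedTable()
    table.insert(date(2024, 1, 15), 10.0)
    table.insert(date(2024, 2, 3), 20.0)
    table.insert(date(2024, 2, 20), 5.0)
    table.insert(date(2024, 3, 1), 7.0)
    return table
```

A query limited to February touches one of the three partitions. The other two are skipped without reading any of their rows, which is the source of the performance benefit.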
Data Warehouse Physical Structures

  – Views:
     • Are tailored presentations of data contained in one
       or more tables or views
     • Do not require any space in the database
  – Materialized views:
     • Are query results that have been stored in advance
     • (Like indexes) are used transparently and improve
       performance
  – Integrity constraints:
     • Are used in data warehouses for query rewrite
  – Dimensions:
     • Are containers of logical relationships and do not
       require any space in the database
Managing Large Volumes of Data
• Work smarter in your data warehouse:
  –   Partitioning
  –   Bitmap indexes/Star transformation
  –   Data compression
  –   Query rewrite
• Work harder in your data warehouse:
  – Parallelism for all operations
       • DBA tasks, such as loading, index creation, table
         creation, data modification, backup and recovery
       • End-user operations, such as queries
       • Unbounded scalability: Real Application Clusters
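
As a rough illustration of why bitmap indexes suit warehouse queries, the sketch below stores one bitmap per distinct column value in a Python integer and combines conditions with bitwise operators. It shows the concept only and does not reflect Oracle's storage format. The column values are made up.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def row_ids(bitmap):
    """Return the row numbers whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low-cardinality columns of a five-row fact table.
CHANNEL = ["web", "store", "web", "phone", "web"]
REGION = ["east", "east", "west", "west", "east"]

channel_index = build_bitmap_index(CHANNEL)
region_index = build_bitmap_index(REGION)

# WHERE channel = 'web' AND region = 'east' becomes a single bitwise AND.
web_and_east = row_ids(channel_index["web"] & region_index["east"])
```

Combining several such bitmaps before touching the fact table is the kind of operation that star transformation relies on when filtering on multiple dimensions.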
I/O Performance in Data Warehouses

  – I/O is typically the primary determinant of data
    warehouse performance.
  – Data warehouse storage configurations should be
    chosen by I/O bandwidth, not storage capacity.
  – Every component of the I/O
    subsystem should provide
    enough bandwidth:
     • Disks
     • I/O channels
     • I/O adapters
  – In data warehouses, maximizing
    sequential I/O throughput is critical.
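
A back-of-the-envelope calculation shows why every component must provide enough bandwidth: the slowest stage caps the whole path. The function names and all figures below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def effective_bandwidth_mb_s(disks, disk_mb_s, channels, channel_mb_s,
                             adapters, adapter_mb_s):
    """Return the throughput of the I/O path, limited by its slowest stage."""
    return min(disks * disk_mb_s, channels * channel_mb_s, adapters * adapter_mb_s)


def scan_seconds(table_gb, bandwidth_mb_s):
    """Time for one sequential full scan, taking 1 GB = 1024 MB."""
    return table_gb * 1024 / bandwidth_mb_s


# Assumed configuration: 20 disks at 100 MB/s, 2 channels at 400 MB/s,
# 2 adapters at 800 MB/s. The channels are the bottleneck.
bandwidth = effective_bandwidth_mb_s(20, 100, 2, 400, 2, 800)
seconds = scan_seconds(500, bandwidth)
```

With these assumed numbers the disks could deliver 2000 MB/s, but the channels limit the path to 800 MB/s, so a 500 GB scan takes 640 seconds. Adding disks for capacity would not shorten it.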
I/O Scalability
Parallel execution:
     – Reduces response time for data-intensive operations on large
       databases
     – Benefits systems with the following characteristics:
          • Multiprocessors, clusters, or massively parallel systems
          • Sufficient I/O bandwidth
          • Sufficient memory to support memory-intensive processes such
            as sorts, hashing, and I/O buffers

[Figure: parallel execution. A coordinator dispatches work to query servers. Four scanners read data on disk and feed four sorters (aggregators), labeled Sort Q1 through Sort Q4.]
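
The coordinator and query-server pattern can be sketched with Python's concurrent.futures. This is a simplification: each worker both scans and sorts its own chunk, and the coordinator merges the sorted parts, whereas the slide shows separate scanner and sorter sets. Threads are used to keep the example self-contained, so it shows the structure rather than a real speedup. The function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scan_and_sort(chunk, predicate):
    """Query server: filter one chunk of rows, then sort the survivors."""
    return sorted(row for row in chunk if predicate(row))


def parallel_query(rows, predicate, workers=4):
    """Coordinator: split the rows, dispatch the work, and merge the sorted parts."""
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: scan_and_sort(chunk, predicate), chunks))
    # heapq.merge combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*parts))
```

For example, parallel_query(list(range(100, 0, -1)), lambda r: r % 2 == 0) returns the even numbers from 2 to 100 in ascending order, the same result a serial scan and sort would give.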
I/O Scalability

• Automatic Storage Management (ASM)
  – Configuring storage for a DB depends on many
    variables:
     •   Which data to put on which disk
     •   Logical unit number (LUN) configurations
     •   DB types and workloads: data warehouse, OLTP, DSS
     •   Trade-offs between available options
  – ASM provides solutions to storage issues
    encountered in data warehouses.
I/O Scalability

• Automatic Storage Management: Overview
  – Portable and high-performance cluster file system
  – Manages Oracle database files
  – Data spread across disks to balance load
  – Integrated mirroring across disks
  – Solves many storage management challenges

[Figure: software stack, top to bottom. Application, Database, ASM (occupying the layer otherwise filled by a file system and volume manager), Operating system.]
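
Spreading data across disks and mirroring it can be pictured with a toy placement function. This is not ASM's actual allocation algorithm. It is a round-robin sketch under the assumption that each extent gets one primary copy and one mirror copy on a different disk. The function and disk names are hypothetical.

```python
def place_extents(num_extents, disks):
    """Assign each extent a (primary_disk, mirror_disk) pair, round-robin."""
    if len(disks) < 2:
        raise ValueError("mirroring needs at least two disks")
    placement = []
    for extent in range(num_extents):
        primary = disks[extent % len(disks)]
        mirror = disks[(extent + 1) % len(disks)]  # always a different disk
        placement.append((primary, mirror))
    return placement


def primaries_per_disk(placement):
    """Count primary copies per disk to check how evenly the load is spread."""
    counts = {}
    for primary, _mirror in placement:
        counts[primary] = counts.get(primary, 0) + 1
    return counts
```

With six extents on three disks, each disk holds two primary copies, and no extent has both copies on the same disk, so losing one disk leaves every extent readable.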
Visit more self help tutorials

• Pick a tutorial of your choice and browse
  through it at your own pace.
• The tutorials section is free, self-guiding and
  will not involve any additional support.
• Visit us at www.dataminingtools.net
