Data Warehousing
Information Architects
Data Warehousing
• What is a data warehouse?
• A multi-dimensional data model
• Data warehouse architecture
• Data warehouse implementation
Data Warehousing
Data warehousing is a process, not a product, for assembling
and managing data from various sources for the purpose of
gaining a single detailed view of part or all of a business. This
single view is the data warehouse.
On-line Analytical Processing (OLAP) is a technique used for
providing management decision support using historical and
summarized data that is consolidated in the data warehouse.
Data Warehousing
Most database systems continue to grow, but a data warehouse typically grows faster, because it keeps the historical data that operational systems eventually purge.
User updates to a data warehouse are usually forbidden; updates
must come from the underlying databases to maintain
consistency.
Data Warehousing
To speed up OLAP queries, a warehouse contains summarized
and consolidated information representing materialized
aggregate views of the enterprise data from a number of
databases.
A warehouse stores data while OLAP derives strategic
information from it.
A data warehouse may be used to provide an enterprise memory,
i.e. a long-term history, which operational data does not provide.
Data Warehousing
A warehouse usually contains information over time, which helps
in analyzing trends.
A data warehouse repackages information to support
business decision making.
The aim of data warehousing may be to generate new revenue
by selling the repackaged information.
A definition
A data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data in
support of management’s decision making process.
(W.H. Inmon)
Subject-oriented
• Organized around major subjects such as students, degrees and
countries
• Focusing on the modeling and analysis of data for decision
makers, not on daily operations
• Providing a simple and concise view around particular subject
issues by excluding data that are not useful in the decision
support process
Integrated
• May be constructed by integrating multiple data sources e.g.
multiple databases
• Data cleaning and data integration techniques are applied to
ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
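A minimal Python sketch of what consistent naming conventions, encoding structures and attribute measures can mean in practice. The two sources, their field names and the country lookup table are hypothetical, invented for illustration: source A stores amounts in cents with short country codes, source B stores dollars with full country names, and both are mapped onto one convention.

```python
# Hypothetical records from two operational sources that describe the same
# kind of entity with different naming conventions, encodings and units.
source_a = [{"cust_name": "Ann Lee", "country": "US", "amount_cents": 1250}]
source_b = [{"CustomerName": "Bo Chen", "Country": "United States", "Amount": 7.5}]

# Lookup table mapping each source's country encoding to one standard code.
COUNTRY_CODES = {"US": "US", "United States": "US", "UK": "GB", "United Kingdom": "GB"}


def from_source_a(rec):
    # Source A already uses short codes but stores amounts in cents.
    return {
        "customer_name": rec["cust_name"],
        "country_code": COUNTRY_CODES[rec["country"]],
        "amount_usd": rec["amount_cents"] / 100,
    }


def from_source_b(rec):
    # Source B spells out country names and stores amounts in dollars.
    return {
        "customer_name": rec["CustomerName"],
        "country_code": COUNTRY_CODES[rec["Country"]],
        "amount_usd": float(rec["Amount"]),
    }


integrated = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
for row in integrated:
    print(row)
```

After integration, every row uses the same attribute names, the same country encoding and the same currency unit, so downstream queries no longer need to know which source a row came from.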
Time Variant
• The time horizon of a data warehouse is significantly longer
than that of operational systems
• Operational database: current value data
• Data warehouse data: provide information from a historical
perspective
• Every key structure in the data warehouse contains an
element of time, explicitly or implicitly
• Operational data may or may not contain a time element
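The contrast between current-value operational data and time-variant warehouse data can be sketched in a few lines of Python. The customer id, cities and dates are hypothetical; the point is that the warehouse rows carry an explicit time element (a valid-from date) while the operational store simply overwrites.

```python
from datetime import date

# Hypothetical customer/city data. The operational store keeps one current
# value per customer and overwrites it; the warehouse keeps every version
# together with the date on which that value became valid.
operational = {}
warehouse = []


def record_change(customer_id, city, valid_from):
    operational[customer_id] = city                     # current value only
    warehouse.append((customer_id, valid_from, city))   # history is retained


def city_as_of(customer_id, when):
    """Return the city valid on a given date, or None if nothing was known yet."""
    rows = [r for r in warehouse if r[0] == customer_id and r[1] <= when]
    return max(rows, key=lambda r: r[1])[2] if rows else None


record_change("C1", "Boston", date(2023, 1, 1))
record_change("C1", "Denver", date(2024, 6, 1))

print(operational["C1"])                      # only the latest value survives
print(city_as_of("C1", date(2023, 12, 31)))   # history answers "as of" questions
```

In a real warehouse the history would arrive through periodic loads rather than direct calls, but the key structure is the same: the time element is part of every stored row.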
Non-volatile
• A physically separate store of data transformed from the
operational environment
• Data is not updated in place
• Does not require transaction processing, recovery and
concurrency control mechanisms
• Requires only two operations in data accessing: initial loading
of data and access of data
Data Warehouse Process
• Define the architecture, do capacity planning, select hardware
and software
• Design the warehouse schema and the views
• Design the physical data structures
• Design data extraction, cleaning, transformation, load and
refresh software
• Populate the repository with data and software
• Design and implement end-user applications
Star Schema
• A star schema consists of one central fact table and several
denormalized dimension tables.
• The measures of interest for OLAP are stored in the fact table
(e.g. Dollar Amount, Units in the table SALES).
• For each dimension of the multidimensional model there is
a dimension table (e.g. Geography, Product, Time, Account)
with all the levels of aggregation and the extra properties of
these levels.
Star Schema
SALES (fact table): Geography Code, Time Code, Account Code, Product Code, Dollar Amount, Units
Geography: Geography Code, Region Code, Region Manager, State Code, City Code, .....
Product: Product Code, Product Name, Brand Code, Brand Name, Prod. Line Code, Prod. Line Name
Time: Time Code, Quarter Code, Quarter Name, Month Code, Month Name, Date
Account: Account Code, KeyAccount Code, KeyAccountName, Account Name, Account Type, Account Market
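A runnable sketch of the star schema above using Python's built-in `sqlite3` module. The table and column names loosely follow the diagram but are simplified assumptions (the Account dimension is omitted, and `_dim`/`_fact` suffixes are added), and the sample rows are made up. It shows the characteristic star join: every dimension is one join away from the fact table.

```python
import sqlite3

# In-memory SQLite database; a simplified, hypothetical version of the
# SALES star (Account dimension omitted) with made-up sample rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE geography_dim (geography_code INTEGER PRIMARY KEY,
                            region_code TEXT, state_code TEXT, city_code TEXT);
CREATE TABLE product_dim   (product_code INTEGER PRIMARY KEY,
                            product_name TEXT, brand_name TEXT);
CREATE TABLE time_dim      (time_code INTEGER PRIMARY KEY,
                            quarter_name TEXT, month_name TEXT);
-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    geography_code INTEGER REFERENCES geography_dim(geography_code),
    time_code      INTEGER REFERENCES time_dim(time_code),
    product_code   INTEGER REFERENCES product_dim(product_code),
    dollar_amount  REAL,
    units          INTEGER);
INSERT INTO geography_dim VALUES (1, 'EAST', 'NY', 'NYC'), (2, 'WEST', 'CA', 'SF');
INSERT INTO product_dim   VALUES (10, 'Cola 1L', 'Fizz'), (11, 'Chips', 'Crunch');
INSERT INTO time_dim      VALUES (100, 'Q1', 'Jan'), (101, 'Q2', 'Apr');
INSERT INTO sales_fact VALUES (1, 100, 10, 20.0, 10), (1, 101, 11, 15.0, 5),
                              (2, 100, 10, 30.0, 12), (2, 100, 11, 5.0, 2);
""")

# Star join: every dimension is exactly one join away from the fact table.
QUERY = """
SELECT g.region_code, t.quarter_name,
       SUM(f.dollar_amount) AS dollars, SUM(f.units) AS units
FROM sales_fact f
JOIN geography_dim g ON g.geography_code = f.geography_code
JOIN time_dim t      ON t.time_code = f.time_code
GROUP BY g.region_code, t.quarter_name
ORDER BY g.region_code, t.quarter_name
"""
for row in con.execute(QUERY):
    print(row)
```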
Snowflake Schema
• The normalized version of the star schema
• Explicit treatment of dimension hierarchies (each level has its own
table)
• Easier to maintain, but slower in query answering because queries need more joins
Snowflake Schema
SALES (fact table): Postal Code, Time Code, Account Code, Product Code, Dollar Amount, Units
Time: Time Code, Quarter Code, Month Code
  Quarter: Quarter Code, QuarterName
  Month: Month Code, Month Name
Account: Account Code, KeyAccount Code
  Account (attributes): Account Code, AccountName
  KeyAccount: KeyAcc Code, KeyAcc Name
Geography: Postal Code, Region Code, State Code, City Code
  Region: Region Code, Region Mgr
  State: State Code, State Name
  City: City Code, City Name
Product: Product Code, Prod Line Code, Brand Code
  Product (attributes): Product Code, ProductName
  Brand: Brand Code, Brand Name
  ProdLine: ProdLineCode, ProdLineName
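The trade-off named in the snowflake bullets (easier maintenance, slower queries) can be sketched with `sqlite3`. This is a hypothetical, simplified fragment of the snowflaked Geography dimension with made-up rows: reaching a region attribute now needs two joins, while changing a region manager touches a single row.

```python
import sqlite3

# Hypothetical, simplified snowflaked Geography dimension: the fact table
# references the lowest level (postal code) and region attributes live in
# their own normalized table. Sample rows are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE region     (region_code TEXT PRIMARY KEY, region_mgr TEXT);
CREATE TABLE geography  (postal_code TEXT PRIMARY KEY,
                         region_code TEXT REFERENCES region(region_code));
CREATE TABLE sales_fact (postal_code TEXT REFERENCES geography(postal_code),
                         dollar_amount REAL, units INTEGER);
INSERT INTO region     VALUES ('EAST', 'Smith'), ('WEST', 'Jones');
INSERT INTO geography  VALUES ('10001', 'EAST'), ('10002', 'EAST'), ('94105', 'WEST');
INSERT INTO sales_fact VALUES ('10001', 20.0, 10), ('10002', 15.0, 5), ('94105', 40.0, 14);
""")

# Reaching a region attribute takes two joins (fact -> geography -> region)
# instead of the single join a denormalized star dimension would need.
QUERY = """
SELECT r.region_mgr, SUM(f.dollar_amount) AS dollars
FROM sales_fact f
JOIN geography g ON g.postal_code = f.postal_code
JOIN region r    ON r.region_code = g.region_code
GROUP BY r.region_mgr
ORDER BY r.region_mgr
"""
print(con.execute(QUERY).fetchall())

# Maintenance side: a new region manager is a one-row update.
con.execute("UPDATE region SET region_mgr = 'Lopez' WHERE region_code = 'EAST'")
print(con.execute(QUERY).fetchall())
```

In a denormalized star, the same manager change would have to be repeated on every Geography row of that region.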
Data Warehouse Design
The E-R modeling approach, which consists of entities and
relationships, is not suitable for designing a warehouse schema.
Instead, a four-step dimensional design process is used:
• Business Process
• Grain
• Dimensions
• Facts
Business Process
• First step is to decide what business process(es) to model
• A process is a natural business activity performed in the
organization
• Business Processes include raw materials purchasing, orders,
shipments, invoicing, inventory and general ledgers
Grain
• Granularity defines the level of data detail made
available in the dimensional model
• Preferably, develop the model for the most atomic information
captured by a business process
• Atomic data is the most detailed information collected; such
data cannot be subdivided further
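Why the atomic grain is preferred can be shown with a small Python sketch: rows stored at the most detailed grain can be rolled up to any coarser grain later, whereas a table stored only at daily totals could never answer a per-line question. The rows and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical atomic rows: one per transaction line.
# Columns: (date, product, quantity, amount)
atomic = [
    ("2024-03-01", "P1", 2, 5.00),
    ("2024-03-01", "P1", 1, 2.50),
    ("2024-03-01", "P2", 4, 8.00),
    ("2024-03-02", "P1", 3, 7.50),
]


def roll_up(rows, key_fn):
    """Aggregate quantity and amount to the coarser grain chosen by key_fn."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = key_fn(row)
        totals[key][0] += row[2]
        totals[key][1] += row[3]
    return {k: tuple(v) for k, v in totals.items()}


daily_by_product = roll_up(atomic, lambda r: (r[0], r[1]))  # day x product grain
daily_total = roll_up(atomic, lambda r: r[0])               # day grain
print(daily_by_product)
print(daily_total)
```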
Dimensions
• Dimension tables are the entry points into the fact tables
• Robust dimension attributes deliver robust analytic slicing and dicing
capabilities
• A carefully defined grain statement determines the primary
dimensionality of the fact table
• More dimensions can be added at a later stage if they do not affect the
granularity
• Levels can be defined in dimensions
• Examples of dimensions: date, product, store, etc.
• Surrogate keys are used to join dimensions and fact tables
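A minimal in-memory sketch of surrogate key assignment. Surrogate keys are meaningless integers assigned by the warehouse instead of reusing natural keys from source systems, which also keeps colliding natural keys from different sources apart. The class name and the (system, SKU) keys are hypothetical; a real ETL job would persist this mapping in a key table.

```python
class SurrogateKeyGenerator:
    """Assigns meaningless integer surrogate keys to natural (source) keys."""

    def __init__(self):
        self._keys = {}
        self._next = 1

    def key_for(self, natural_key):
        # Reuse the surrogate if this natural key was seen before,
        # otherwise hand out the next integer.
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]


products = SurrogateKeyGenerator()
# The same SKU text from two hypothetical source systems gets two different
# surrogate keys; a repeated natural key gets its original surrogate back.
print(products.key_for(("POS", "SKU-0042")))
print(products.key_for(("ERP", "SKU-0042")))
print(products.key_for(("POS", "SKU-0042")))
```

The fact table then stores only the small integer, so joins stay cheap and the warehouse is insulated from changes to source-system keys.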
Facts
• A row in a fact table corresponds to a measurement
• All the measurements in a fact table must be at the same
grain
• A fact table contains two types of attributes:
– numeric measures that can be aggregated, e.g. sales quantity, sales
amount
– foreign keys to the dimensions, which together represent the many-to-many relationships between dimensions
• Fact tables contain historical data
Examples
• Retail Sales
• Inventory
• Procurement
Retail Sales
Date Dimension: Date key (PK), Date, Day of week, Calendar week ending date, Calendar Month, Calendar year-month, Calendar quarter, … and more
Store Dimension: Store key (PK), Store Name, Store Number, Store District, Store Region, … and more
Product Dimension: Product key (PK), Product Description, SKU Number, Brand Description, Subcategory Description, Department Description, Package Type, … and more
Promotion Dimension: Promotion key (PK), Promotion Name, Promotion Media Type, Promotion Begin Date, Promotion End Date, … and more
Retail Sales Transaction Fact: Date key (FK), Product key (FK), Store key (FK), Promotion key (FK), Transaction number (DD), Sales quantity, Sales dollar amount, Cost dollar amount, Gross profit dollar amount
(PK = primary key, FK = foreign key, DD = degenerate dimension)
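A small Python sketch of how the Retail Sales Transaction Fact is typically used. It assumes a grain of one row per product per POS transaction (an assumption; the slide does not state the grain) and uses made-up rows. It shows that gross profit is derived from two additive measures in the same row, and that the degenerate dimension (transaction number) groups lines into baskets without a dimension table of its own.

```python
from collections import defaultdict

# Hypothetical fact rows, assuming one row per product per POS transaction.
# Columns: (date_key, product_key, store_key, promotion_key,
#           transaction_number, sales_quantity, sales_dollars, cost_dollars)
fact_rows = [
    (20240301, 1, 7, 0, "T100", 2, 6.00, 4.00),
    (20240301, 2, 7, 0, "T100", 1, 3.50, 2.00),
    (20240301, 1, 7, 5, "T101", 4, 10.00, 8.00),
]

profit_by_promotion = defaultdict(float)
lines_per_transaction = defaultdict(int)
for _date, _prod, _store, promo, txn, _qty, sales, cost in fact_rows:
    # Gross profit = sales dollars - cost dollars; both inputs are additive,
    # so the derived measure can be summed along any dimension as well.
    profit_by_promotion[promo] += sales - cost
    # The degenerate dimension groups lines into baskets directly.
    lines_per_transaction[txn] += 1

print(dict(profit_by_promotion))
print(dict(lines_per_transaction))
```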
Inventory
Date Dimension
Warehouse Dimension: Warehouse key (PK), Warehouse Name, Warehouse Address, Warehouse City, Warehouse State, Warehouse Zip, Warehouse Zone, … and more
Product Dimension
Inventory Transaction Type Dimension: Inventory Transaction Type key (PK), Inventory Transaction Type Description, Inventory Transaction Type Group
Inventory Transaction Fact: Date key (FK), Product key (FK), Warehouse key (FK), Vendor key (FK), Inventory Transaction Type key (FK), Inventory Transaction Dollar Amount
Vendor Dimension
Procurement (Single Transaction)
Date Dimension
Vendor Dimension: Vendor key (PK), Vendor Name, Vendor Street Address, Vendor City, Vendor Zip, Vendor State/Province, Vendor Country, Vendor Status, … and more
Product Dimension
Procurement Transaction Type Dimension: Procurement Transaction Type key (PK), Procurement Transaction Description, Procurement Transaction Category
Procurement Transaction Fact: Procurement Transaction Date key (FK), Product key (FK), Vendor key (FK), Contract Terms key (FK), Procurement Transaction Type key (FK), Contract Number (DD), Procurement Transaction Quantity, Procurement Transaction Dollar Amount
Procurement (Multiple Transaction)
Date Dimension
Vendor Dimension
Product Dimension
Received Condition Dimension
Purchase Requisition Fact: Requisition Date key (FK), Requested Date key (FK), Product key (FK), Vendor key (FK), Contract Terms key (FK), Requested By key (FK), Contract Number (DD), Purchase Requisition Number (DD), Purchase Requisition Quantity, Purchase Requisition Amount
Contract Terms Dimension
Employee Dimension
Discount Taken Dimension
Purchase Order Fact: Requisition Date key (FK), Requested Date key (FK), Purchase Order Date key (FK), Product key (FK), Vendor key (FK), Contract Terms key (FK), Requested By key (FK), Purchasing Agent key (FK), Contract Number (DD), Purchase Requisition Number (DD), Purchase Order Number (DD), Purchase Order Quantity, Purchase Order Amount
Shipping Notice Fact
Warehouse Receipts Fact
Vendor Payment Fact
Data Warehouse Architecture
• Generic Two-Level Architecture
• Independent Data Mart
• Dependent Data Mart and Operational Data Store
• Logical Data Mart and @ctive Warehouse
Generic Two Level Architecture
Independent Data Marts
Dependent Data Mart and Operational Data Store
Logical Data Mart and @ctive Warehouse
Data Warehouses vs. Data Marts
Property              Data Warehouse       Data Mart
Scope                 Enterprise           Department
Subjects              Multiple             Single-subject
Data Source           Many                 Few
Size (typical)        100 GB to > 1 TB     < 100 GB
Implementation time   Months to years      Months
The ETL Process
• Capture
• Scrub or data cleansing
• Transform
• Load and Index
Extraction
Capture = extract… obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
Static extract = capturing a snapshot of the source data at a point in time
Incremental extract = capturing changes that have occurred since the last static extract
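A minimal Python sketch contrasting a static extract with an incremental extract. It assumes the source rows carry a last-modified timestamp and uses timestamp (watermark) based change detection, which is only one common way to implement incremental capture; log-based capture is another. The table and its rows are hypothetical.

```python
from datetime import datetime

# Hypothetical source table rows: (order_id, amount, last_modified).
source = [
    (1, 10.0, datetime(2024, 3, 1, 9, 0)),
    (2, 25.0, datetime(2024, 3, 1, 17, 30)),
]


def static_extract(rows):
    """Snapshot of the whole chosen subset at this point in time."""
    return list(rows)


def incremental_extract(rows, since):
    """Only rows modified after the previous extract's watermark."""
    return [r for r in rows if r[2] > since]


snapshot = static_extract(source)
watermark = max(r[2] for r in snapshot)  # newest change already captured

# Later the source changes: order 2 is corrected and order 3 is added.
source[1] = (2, 27.0, datetime(2024, 3, 2, 8, 15))
source.append((3, 12.5, datetime(2024, 3, 2, 11, 0)))

changes = incremental_extract(source, watermark)
print([r[0] for r in changes])
```

One caveat of the timestamp approach: rows deleted from the source leave no timestamp behind, so deletes need a separate mechanism.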
Cleansing
Scrub = cleanse… uses pattern recognition and AI techniques to upgrade data quality
Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
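A rule-based Python sketch of a few of the cleansing tasks listed above: reformatting names, normalizing mixed date formats, detecting invalid or missing values, logging the rejects and removing duplicates. It uses simple hand-written rules rather than the pattern-recognition or AI techniques the slide mentions, and the raw rows and accepted date formats are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical raw customer rows with typical defects: inconsistent
# spacing/case, mixed date formats, a missing value, an impossible date
# and a duplicate.
raw = [
    {"name": " ann lee ", "state": "ca", "signup": "2024-03-01"},
    {"name": "Ann Lee", "state": "CA", "signup": "03/01/2024"},
    {"name": "Bo Chen", "state": "", "signup": "2024-02-30"},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    # Try each accepted format; an impossible date such as 30 February fails all.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    return None


def cleanse(rows):
    clean, errors, seen = [], [], set()
    for row in rows:
        name = re.sub(r"\s+", " ", row["name"]).strip().title()  # reformatting
        state = row["state"].strip().upper() or None              # missing -> None
        signup = parse_date(row["signup"])
        if signup is None or state is None:
            errors.append(row)          # error detection / logging
            continue
        key = (name, state, signup)
        if key in seen:                 # duplicate record
            continue
        seen.add(key)
        clean.append({"name": name, "state": state, "signup": signup})
    return clean, errors


clean, errors = cleanse(raw)
print(clean)
print(len(errors))
```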
Transformation
Transform = convert data from the format of the operational system to the format of the data warehouse
Record-level:
Selection – data partitioning
Joining – data combining
Aggregation – data summarization
Field-level:
single-field – from one field to one field
multi-field – from many fields to one, or one field to many
Single-Field Transformation
In general – some transformation function translates data from the old form to the new form
Algorithmic transformation uses a formula or logical expression
Table lookup – another approach
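The two single-field approaches can be sketched side by side in Python. The quarter formula and the small state-name table are illustrative assumptions: the first function is an algorithmic transformation (a formula), the second a table lookup.

```python
from datetime import date


def quarter_of(d: date) -> str:
    # Algorithmic transformation: a formula derives the target value
    # (calendar quarter) from the source field (a date).
    return f"Q{(d.month - 1) // 3 + 1}"


# Hypothetical code table for the lookup approach.
STATE_NAMES = {"AZ": "Arizona", "CA": "California", "OR": "Oregon"}


def state_name(code: str) -> str:
    # Table lookup: the code table maps each source value to its target value;
    # unknown codes are flagged instead of passed through silently.
    return STATE_NAMES.get(code, "Unknown")


print(quarter_of(date(2024, 8, 15)), state_name("CA"))
```

Algorithmic transformations suit values that follow a rule; lookups suit arbitrary codes whose meaning is defined only by a reference table.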
Multi-Field Transformation
M:1 – from many source fields to one target field
1:M – from one source field to many target fields
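Both multi-field cases as a short Python sketch. The name fields and the "BRAND-LINE-ITEM" product-code format are hypothetical examples chosen for illustration.

```python
def to_full_name(first, middle, last):
    # M:1 transformation: several source fields populate one target field.
    return " ".join(part for part in (first, middle, last) if part)


def split_product_code(code):
    # 1:M transformation: one source field (hypothetical "BRAND-LINE-ITEM"
    # format) populates several target fields.
    brand, line, item = code.split("-")
    return {"brand_code": brand, "prod_line_code": line, "item_code": item}


print(to_full_name("Ann", "", "Lee"))
print(split_product_code("FZ-SODA-0042"))
```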
Loading and Indexing
Load/Index = place transformed data into the warehouse and create indexes
Refresh mode: bulk rewriting of target data at periodic intervals
Update mode: only changes in source data are written to the data warehouse
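A sketch of the two load modes against an in-memory SQLite table. The table, index name and rows are hypothetical. Refresh mode drops the index, rewrites the whole table in bulk and rebuilds the index afterwards (a common practice for bulk loads); update mode upserts only the changed rows by key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical warehouse table keyed by a source row id.
con.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_key INTEGER, amount REAL)")


def refresh_load(con, all_rows):
    """Refresh mode: bulk-rewrite the whole target, then rebuild its index."""
    with con:  # one transaction for the whole rewrite
        con.execute("DROP INDEX IF EXISTS idx_sales_product")
        con.execute("DELETE FROM sales")
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", all_rows)
        con.execute("CREATE INDEX idx_sales_product ON sales(product_key)")


def update_load(con, changed_rows):
    """Update mode: write only rows that changed in the source (upsert by key)."""
    with con:
        con.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", changed_rows)


refresh_load(con, [(1, 10, 10.0), (2, 11, 25.0)])
update_load(con, [(2, 11, 27.0), (3, 10, 12.5)])  # row 2 corrected, row 3 new
print(con.execute("SELECT * FROM sales ORDER BY sale_id").fetchall())
```

Refresh mode is simpler and self-correcting but rewrites everything; update mode moves far less data but depends on reliable change capture.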
On-Line Analytical Processing (OLAP)
• The use of a set of graphical tools that provides users with
multidimensional views of their data and allows them to analyze the
data using simple windowing techniques
• Relational OLAP (ROLAP)
– Traditional relational representation
• Multidimensional OLAP (MOLAP)
– Cube structure
• OLAP Operations
– Cube slicing – produce a 2-D view of the data
– Drill-down – go from summary to more detailed views
Slicing a Data Cube
Summary report
Drill-down with
color added
Drill Down Example
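The two OLAP operations can be sketched in Python over a tiny in-memory cube. The dimension values and unit counts are made up; a real ROLAP or MOLAP engine would answer these with SQL GROUP BY queries or precomputed cube cells rather than Python loops.

```python
from collections import defaultdict

# A tiny hypothetical cube: (region, product, quarter, month, units).
facts = [
    ("East", "Cola", "Q1", "Jan", 10), ("East", "Cola", "Q1", "Feb", 5),
    ("East", "Chips", "Q1", "Mar", 4), ("West", "Cola", "Q1", "Jan", 7),
    ("West", "Chips", "Q2", "Apr", 6),
]


def slice_cube(rows, quarter):
    """Slice: fix one dimension (quarter) to get a 2-D region x product view."""
    view = defaultdict(int)
    for region, product, q, _month, units in rows:
        if q == quarter:
            view[(region, product)] += units
    return dict(view)


def drill_down(rows, region, quarter):
    """Drill-down: expand a (region, quarter) total into its monthly detail."""
    detail = defaultdict(int)
    for r, _product, q, month, units in rows:
        if r == region and q == quarter:
            detail[month] += units
    return dict(detail)


print(slice_cube(facts, "Q1"))
print(drill_down(facts, "East", "Q1"))
```

The monthly detail returned by the drill-down sums back to the East/Q1 cells of the slice, which is what makes moving between summary and detail consistent.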
Indexes
• B-Tree Indexes
• Bitmap Indexes
B-Tree Indexes
B-tree indexes
– branch blocks or upper level blocks point to the corresponding
lower-level blocks
– leaf blocks contain the Oracle ROWID that points at the location
of the actual row the leaf refers to
Why are B-tree indexes so popular in Oracle products?
– simplicity
– easy to maintain
– high retrieval speed of highly selective column values (high
cardinality)
– the size of the table has little or no impact on the speed with
which B-tree indexed data can be fetched
Where does it work best?
select ... where colA = 'ABC';
select ... where colA between 'A12' and 'R45';
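Why an ordered index serves both query shapes above can be sketched with Python's `bisect` module. This is deliberately not a B-tree (there are no branch blocks): a sorted list of (key, rowid) pairs stands in for the leaf level, which is enough to show that equality and BETWEEN predicates each reduce to a logarithmic search plus a contiguous scan. The rowids and values are hypothetical.

```python
import bisect

# Hypothetical table column: rowid -> colA value.
rows = {101: "ABC", 102: "A50", 103: "Z99", 104: "R10", 105: "ABC"}

# Sorted (key, rowid) pairs stand in for the B-tree leaf level.
index = sorted((value, rowid) for rowid, value in rows.items())
keys = [k for k, _ in index]


def lookup_equal(value):
    # where colA = value
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return [rowid for _, rowid in index[lo:hi]]


def lookup_between(low, high):
    # where colA between low and high (inclusive on both ends, like SQL BETWEEN)
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [rowid for _, rowid in index[lo:hi]]


print(lookup_equal("ABC"))
print(lookup_between("A12", "R45"))
```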
Bitmap index
• for columns with very few unique values (low cardinality)
• built for one column at a time
• a stream of bits: each bit relates to a column value in a single
row of the table
Bitmap index
create bitmap index person_region on person (region);
Row  Region  North bitmap  East bitmap  West bitmap  South bitmap
1    North   1             0            0            0
2    East    0             1            0            0
3    West    0             0            1            0
4    West    0             0            1            0
5    South   0             0            0            1
6    North   1             0            0            0
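The bitmaps in the table above can be reproduced with Python integers used as bit vectors. The `active` column is a hypothetical second low-cardinality column added to show the main strength of bitmap indexes: predicates on several columns combine with cheap bitwise AND/OR operations.

```python
# Region column for rows 1..6 from the table above, plus a second,
# hypothetical low-cardinality column (active flag).
regions = ["North", "East", "West", "West", "South", "North"]
active = ["Y", "N", "Y", "N", "Y", "Y"]


def build_bitmap_index(values):
    """One bitmap (a Python int) per distinct value; bit i marks row i + 1."""
    bitmaps = {}
    for i, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


def rows_from_bitmap(bitmap):
    """Translate set bits back into 1-based row numbers."""
    return [i + 1 for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


region_idx = build_bitmap_index(regions)
active_idx = build_bitmap_index(active)

print(rows_from_bitmap(region_idx["West"]))                          # region = 'West'
print(rows_from_bitmap(region_idx["North"] | region_idx["South"]))   # region IN ('North', 'South')
print(rows_from_bitmap(region_idx["West"] & active_idx["Y"]))        # region = 'West' AND active = 'Y'
```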
When to use it
• Tables that have little or no insert/update activity are good candidates (static
data in a warehouse)
• Columns with low cardinality are good candidates (a column whose
cardinality is <= 0.1% is an ideal candidate; also consider columns at
0.2%–1%)
Bitmap vs. the B-tree
Unique column values  Cardinality (%)  B-tree space  Bitmap space
500,000               50.00            15.29         12.35
100,000               10.00            15.21         5.25
10,000                1.00             14.34         2.99
100                   0.01             13.40         1.38
5                     <0.01            13.40         0.78
Partitions
The space allocated to tables can be divided into several parts
called partitions
– In this case, we mean horizontal partitioning: dividing up
the table by rows, not columns
– Examples of partitioning strategies include
• by week / month / year
• by the first few letters of a name
• by active / inactive status
• by commonly used WHERE criteria
Partition Example - By Range
We could create this customer table with a command looking
something like:
CREATE TABLE customer (
name VARCHAR2(30)
…)
PARTITION BY RANGE (name)
(PARTITION p1 VALUES LESS THAN ('I'),
PARTITION p2 VALUES LESS THAN ('S'),
PARTITION p3 VALUES LESS THAN
(MAXVALUE)) ;
Partition Example - By Range
• When we use the BY RANGE method for partitioning tables, we
specify only the upper bound of each partition (VALUES LESS THAN is exclusive).
– The lower bound of each partition is implied by the upper bound of the previous partition
• Note the use of MAXVALUE, which tells Oracle that all other
customers (names from 'S' upward) go into partition p3.
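The routing rule of the range-partitioned customer table can be sketched in Python with `bisect`. This mimics the semantics (exclusive upper bounds, MAXVALUE as the catch-all) and is not how Oracle implements it. It assumes plain code-point string comparison, so, for example, lowercase names sort after all the uppercase bounds.

```python
import bisect

# Exclusive upper bounds of p1 and p2 from the CREATE TABLE example;
# p3 is the MAXVALUE partition and catches everything else.
BOUNDS = ["I", "S"]
PARTITIONS = ["p1", "p2", "p3"]


def range_partition(name: str) -> str:
    # VALUES LESS THAN is exclusive, so a name equal to a bound belongs to the
    # next partition; bisect_right returns the index of the first bound > name.
    return PARTITIONS[bisect.bisect_right(BOUNDS, name)]


for customer in ["Adams", "Ivanov", "Ruiz", "Smith", "Young"]:
    print(customer, range_partition(customer))
```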
Partition Example - By Range
We can assign partitions to different tablespaces:
… PARTITION p1 VALUES LESS THAN ('I')
TABLESPACE tbsp1, …
This allows us to segregate the table across more than one disk
drive, for example
Partition Example - By Hash
We could also have partitioned the customer table by using
hashing
CREATE TABLE customer ( … )
PARTITION BY HASH (name)
(PARTITION p1, PARTITION p2, PARTITION p3) ;
Partition Example - By Hash
• In this case, we have told Oracle to create three partitions named
p1, p2, and p3
• Oracle will hash each customer's name to place the row in one of these three
partitions.
• One disadvantage of this approach is that it can be difficult to
cluster customers logically for searching / scanning
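A Python sketch of hash-partition routing. `zlib.crc32` is only a stand-in, since Oracle applies its own internal hash function, so real placements will differ. Python's built-in `hash()` is avoided on purpose: string hashing is salted per process, which would make placement unstable between runs.

```python
import zlib

PARTITIONS = ["p1", "p2", "p3"]


def hash_partition(name: str) -> str:
    # Stand-in hash (CRC-32), not Oracle's; the modulo picks one of the
    # three partitions deterministically for a given name.
    return PARTITIONS[zlib.crc32(name.encode("utf-8")) % len(PARTITIONS)]


# Alphabetically adjacent names generally land in unrelated partitions,
# which is why a query over a range of names cannot skip any partition.
for customer in ["Adams", "Adamson", "Adler"]:
    print(customer, hash_partition(customer))
```

This scattering is exactly the disadvantage described above: rows spread evenly, but logically related customers are not clustered together.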
Partition Example - By List
Finally, we can specify a list of values for each partition:
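CREATE TABLE customer ( … )
PARTITION BY LIST (state)
(PARTITION p1 VALUES ('AZ', 'CA'),
 PARTITION p2 VALUES ('WA', 'MO', 'OR', 'NV'),
 PARTITION p3 VALUES (DEFAULT)) ;
DEFAULT places every other state in p3; an empty string ('') would not work, because Oracle treats '' as NULL.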
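The list-partition routing rule as a Python sketch. It assumes p3 is meant as the catch-all partition for every state not listed for p1 or p2; the dictionary layout is an illustration, not Oracle's internal representation.

```python
# p1 and p2 follow the value lists in the example above; p3 is treated as
# the catch-all for every other state (an assumption of this sketch).
LIST_PARTITIONS = {
    "p1": {"AZ", "CA"},
    "p2": {"WA", "MO", "OR", "NV"},
}
CATCH_ALL = "p3"


def list_partition(state: str) -> str:
    """Return the partition whose value list contains the state, else the catch-all."""
    for partition, values in LIST_PARTITIONS.items():
        if state in values:
            return partition
    return CATCH_ALL


for state in ["CA", "NV", "TX"]:
    print(state, list_partition(state))
```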
Modifying Partitions
• The ALTER TABLE command can be used to manage individual
partitions:
– adding, modifying, or dropping
– exchanging
– moving
– renaming
– splitting or truncating
Querying Partitions
• The SELECT command can be used to query specific partitions
SELECT …
FROM customer PARTITION (p1)
… ;
• The cost-based optimizer (CBO) can also determine whether a query
needs to access only a subset of a table's partitions (partition pruning)
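The idea behind partition pruning can be sketched for the range-partitioned customer table and a predicate such as `WHERE name BETWEEN low AND high`. This Python function only illustrates which partitions could possibly hold matching rows; it is not how Oracle's CBO is implemented, and it assumes plain code-point string ordering.

```python
import bisect

BOUNDS = ["I", "S"]               # exclusive upper bounds of p1 and p2
PARTITIONS = ["p1", "p2", "p3"]   # p3 holds everything from 'S' up (MAXVALUE)


def partitions_for_range(low: str, high: str) -> list:
    """Partitions that can hold names in [low, high]; all others are pruned."""
    first = bisect.bisect_right(BOUNDS, low)
    last = bisect.bisect_right(BOUNDS, high)
    return PARTITIONS[first:last + 1]


print(partitions_for_range("A", "C"))   # only p1 needs to be scanned
print(partitions_for_range("J", "T"))   # p1 is skipped
```

With hash partitioning the same BETWEEN predicate could not be pruned this way, because the hash scatters adjacent names across all partitions.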
Questions & Answers
Concise Notes on tree and graph data structure
YekoyeTigabu2
 
Zoonosis, Types, Causes. A comprehensive pptx
Zoonosis, Types, Causes. A comprehensive pptxZoonosis, Types, Causes. A comprehensive pptx
Zoonosis, Types, Causes. A comprehensive pptx
Dr Showkat Ahmad Wani
 
amino compounds.pptx class 12_Govinda Pathak
amino compounds.pptx class 12_Govinda Pathakamino compounds.pptx class 12_Govinda Pathak
amino compounds.pptx class 12_Govinda Pathak
GovindaPathak6
 
Turkey Diseases and Disorders Volume 2 Infectious and Nutritional Diseases, D...
Turkey Diseases and Disorders Volume 2 Infectious and Nutritional Diseases, D...Turkey Diseases and Disorders Volume 2 Infectious and Nutritional Diseases, D...
Turkey Diseases and Disorders Volume 2 Infectious and Nutritional Diseases, D...
Ali Raei
 
06-Molecular basis of transformation.pptx
06-Molecular basis of transformation.pptx06-Molecular basis of transformation.pptx
06-Molecular basis of transformation.pptx
LanaQadumii
 
Ad

Intro to datawarehouse dev 1.0

  • 9. Integrated • May be constructed by integrating multiple data sources e.g. multiple databases • Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
  • 10. Time Variant • The time horizon of a data warehouse is significantly longer than that of an operational system • Operational database: current-value data • Data warehouse data: provides information from a historical perspective • Every key structure in the data warehouse contains an element of time, explicitly or implicitly • Operational data may or may not contain a time element
  • 11. Non-volatile • A physically separate store of data transformed from the operational environment • No update of data • Does not require transaction processing, recovery and concurrency control mechanisms • Requires only two operations in data accessing: initial loading of data and access of data
  • 12. Data Warehouse Process • Define the architecture, do capacity planning, select hardware and software • Design the warehouse schema and the views • Design the physical data structures • Design data extraction, cleaning, transformation, load and refresh software • Populate the repository with data and software • Design and implement end-user application
  • 13. Star Schema • A star schema consists of one central fact table and several denormalized dimension tables. • The measures of interest for OLAP are stored in the fact table (e.g. Dollar Amount and Units in the table SALES). • For each dimension of the multidimensional model there is a dimension table (e.g. Geography, Product, Time, Account) with all the levels of aggregation and the extra properties of these levels.
  • 14. Star Schema
    SALES (fact table): Geography Code, Time Code, Account Code, Product Code, Dollar Amount, Units
    Geography: Geography Code, Region Code, Region Manager, State Code, City Code, .....
    Product: Product Code, Product Name, Brand Code, Brand Name, Prod. Line Code, Prod. Line Name
    Time: Time Code, Quarter Code, Quarter Name, Month Code, Month Name, Date
    Account: Account Code, KeyAccount Code, KeyAccountName, Account Name, Account Type, Account Market
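As a rough illustration of why the star layout keeps queries simple, the Python sketch below answers "Dollar Amount by region" with a single lookup from each SALES row into the denormalized Geography table. The snake_case column names and all sample values are hypothetical; a real warehouse would run this as a SQL join plus GROUP BY.

```python
from collections import defaultdict

# Hypothetical Geography dimension rows (values invented for illustration).
geography = {
    "G1": {"region_code": "EAST", "state_code": "NY", "city_code": "NYC"},
    "G2": {"region_code": "EAST", "state_code": "MA", "city_code": "BOS"},
    "G3": {"region_code": "WEST", "state_code": "CA", "city_code": "SFO"},
}

# Hypothetical SALES fact rows: foreign keys plus the measures.
sales = [
    {"geography_code": "G1", "product_code": "P1", "dollar_amount": 100.0, "units": 2},
    {"geography_code": "G2", "product_code": "P1", "dollar_amount": 50.0, "units": 1},
    {"geography_code": "G3", "product_code": "P2", "dollar_amount": 75.0, "units": 3},
]

def dollars_by_region(facts, geo_dim):
    """One join (SALES -> Geography), then group by a denormalized attribute."""
    totals = defaultdict(float)
    for row in facts:
        region = geo_dim[row["geography_code"]]["region_code"]
        totals[region] += row["dollar_amount"]
    return dict(totals)

print(dollars_by_region(sales, geography))  # {'EAST': 150.0, 'WEST': 75.0}
```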
  • 15. Snowflake Schema • The normalized version of the star schema • Explicit treatment of dimension hierarchies (each level has its own table) • Easier to maintain, slower in query answering
  • 16. Snowflake Schema
    SALES (fact table): Postal Code, Time Code, Account Code, Product Code, Dollar Amount, Units
    Time: Time Code, Quarter Code, Month Code
    Quarter: Quarter Code, QuarterName
    Month: Month Code, Month Name
    Account: Account Code, KeyAccount Code
    Account attributes: Account Code, AccountName
    KeyAccount: KeyAcc Code, KeyAcc Name
    Geography: Postal Code, Region Code, State Code, City Code
    Region: Region Code, Region Mgr
    State: State Code, State Name
    City: City Code, City Name
    Product: Product Code, Prod Line Code, Brand Code
    Product (attributes): Product Code, ProductName
    Brand: Brand Code, Brand Name
    ProdLine: ProdLineCode, ProdLineName
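For contrast, a minimal sketch of the snowflake version of a similar query: to total Dollar Amount by Brand Name, each SALES row needs two hops (SALES to Product, then Product to Brand), which is where the slower query answering comes from. Table contents and names such as Acme and Globex are invented for illustration.

```python
from collections import defaultdict

# Hypothetical normalized tables: Product holds only a Brand Code,
# and Brand Name lives in its own Brand table.
product = {"P1": {"brand_code": "B1"}, "P2": {"brand_code": "B2"}, "P3": {"brand_code": "B1"}}
brand = {"B1": {"brand_name": "Acme"}, "B2": {"brand_name": "Globex"}}

sales = [
    {"product_code": "P1", "dollar_amount": 10.0},
    {"product_code": "P2", "dollar_amount": 20.0},
    {"product_code": "P3", "dollar_amount": 5.0},
]

def dollars_by_brand(facts):
    totals = defaultdict(float)
    for row in facts:
        brand_code = product[row["product_code"]]["brand_code"]  # join 1: SALES -> Product
        name = brand[brand_code]["brand_name"]                    # join 2: Product -> Brand
        totals[name] += row["dollar_amount"]
    return dict(totals)

print(dollars_by_brand(sales))  # {'Acme': 15.0, 'Globex': 20.0}
```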
  • 17. Data Warehouse Design The E-R model approach, which consists of entities and relationships, is not suitable for designing a warehouse schema. Four-step dimensional design process: • Business Process • Grain • Dimensions • Facts
  • 18. Business Process • The first step is to decide which business process(es) to model • A process is a natural business activity performed in the organization • Examples of business processes include raw materials purchasing, orders, shipments, invoicing, inventory and general ledgers
  • 19. Grain • Granularity defines the level of data detail made available in the dimensional model • Preferably, develop the model for the most atomic information captured by a business process • Atomic data is the most detailed information collected; such data cannot be subdivided further
  • 20. Dimensions • Dimension tables are the entry points into the fact tables • Robust dimension attributes deliver robust analytic slicing and dicing capabilities • A carefully stated grain determines the primary dimensionality of the fact table • More dimensions can be added at a later stage if they do not change the grain • Hierarchy levels can be defined within dimensions • Examples of dimensions: date, product, store, etc. • Surrogate keys are used to join dimension tables to fact tables
  • 21. Facts • A row in a fact table corresponds to a measurement • All the measurements in a fact table must be at the same grain • A fact table contains two types of attributes: • numeric values that can be aggregated, e.g. sales quantity, sales amount • keys to the dimensions, which express the many-to-many relationships between dimensions • A fact table holds historical data
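A small sketch of why the atomic grain is preferred: atomic rows can always be rolled up to a coarser grain (here, one row per day and product), but the daily totals cannot be split back into individual transactions. The tuple layout (date, product, quantity) is an assumption made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical atomic transaction lines: (date, product, quantity).
lines = [
    (date(2024, 1, 1), "P1", 2),
    (date(2024, 1, 1), "P1", 1),
    (date(2024, 1, 2), "P1", 4),
]

def roll_up_to_day(rows):
    """Aggregate atomic rows to a coarser (day, product) grain."""
    daily = defaultdict(int)
    for day, product, qty in rows:
        daily[(day, product)] += qty
    return dict(daily)

daily = roll_up_to_day(lines)
print(daily[(date(2024, 1, 1), "P1")])  # 3
```

The reverse direction is impossible: from the total of 3 alone there is no way to recover that it came from two transactions of 2 and 1.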
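One common way to hand out surrogate keys, sketched under the assumption that each dimension row arrives with a natural key from the source system (the SKU strings below are hypothetical): keep a map from natural key to a meaningless integer that the fact tables then store.

```python
import itertools

class SurrogateKeyMap:
    """Hands out meaningless integer keys for natural (source-system) keys."""

    def __init__(self):
        self._next = itertools.count(1)
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the existing key if this natural key has been seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]

product_keys = SurrogateKeyMap()
print(product_keys.key_for("SKU-1001"))  # 1
print(product_keys.key_for("SKU-2002"))  # 2
print(product_keys.key_for("SKU-1001"))  # 1 (same product, same key)
```

This sketch ignores concerns such as tracking changed attributes over time; it only shows the key assignment itself.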
  • 22. Examples • Retail Sales • Inventory • Procurement
  • 23. Retail Sales
    Date Dimension: Date key (PK), Date, Day of week, Calendar week ending date, Calendar Month, Calendar year-month, Calendar quarter, … and more
    Store Dimension: Store key (PK), Store Name, Store Number, Store District, Store Region, … and more
    Product Dimension: Product key (PK), Product Description, SKU Number, Brand Description, Subcategory Description, Department Description, Package Type, … and more
    Promotion Dimension: Promotion key (PK), Promotion Name, Promotion Media Type, Promotion Begin Date, Promotion End Date, … and more
    Retail Sales Transaction Fact: Date key (FK), Product key (FK), Store key (FK), Promotion key (FK), Transaction number (DD), Sales quantity, Sales dollar amount, Cost dollar amount, Gross profit dollar amount
    (PK = primary key, FK = foreign key, DD = degenerate dimension)
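The fact row above can be sketched as a Python dataclass. The field types and sample values are assumptions; the slide stores Gross profit dollar amount as a column, while this sketch derives it from the sales and cost amounts to make the relationship explicit. The transaction number sits on the fact row itself because it has no dimension table of its own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetailSalesFact:
    date_key: int            # FK -> Date Dimension
    product_key: int         # FK -> Product Dimension
    store_key: int           # FK -> Store Dimension
    promotion_key: int       # FK -> Promotion Dimension
    transaction_number: str  # degenerate dimension (no dimension table)
    sales_quantity: int
    sales_dollar_amount: float
    cost_dollar_amount: float

    @property
    def gross_profit_dollar_amount(self) -> float:
        # Derived here; the slide stores it as a column.
        return self.sales_dollar_amount - self.cost_dollar_amount

row = RetailSalesFact(20240101, 1, 7, 1, "T-0001", 3, 29.97, 18.00)
print(round(row.gross_profit_dollar_amount, 2))  # 11.97
```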
  • 24. Inventory
    Warehouse Dimension: Warehouse key (PK), Warehouse Name, Warehouse Address, Warehouse City, Warehouse State, Warehouse Zip, Warehouse Zone, … and more
    Inventory Transaction Type Dimension: Inventory Transaction Type key (PK), Inventory Transaction Type Description, Inventory Transaction Type Group
    Inventory Transaction Fact: Date key (FK), Product key (FK), Warehouse key (FK), Vendor key (FK), Inventory Transaction Type key (FK), Inventory Transaction Dollar Amount
    Other dimensions shown: Date Dimension, Product Dimension, Vendor Dimension
  • 25. Procurement (Single Transaction)
    Vendor Dimension: Vendor key (PK), Vendor Name, Vendor Street Address, Vendor City, Vendor Zip, Vendor State/Province, Vendor Country, Vendor Status, … and more
    Procurement Transaction Type Dimension: Procurement Transaction Type key (PK), Procurement Transaction Description, Procurement Transaction Category
    Procurement Transaction Fact: Procurement Transaction Date key (FK), Product key (FK), Vendor key (FK), Contract Terms key (FK), Procurement Transaction Type key (FK), Contract Number (DD), Procurement Transaction Quantity, Procurement Transaction Dollar Amount
    Other dimensions shown: Date Dimension, Product Dimension
  • 26. Procurement (Multiple Transaction)
    Dimensions: Date, Vendor, Product, Received Condition, Contract Terms, Employee, Discount Taken
    Purchase Requisition Fact: Requisition Date key (FK), Requested Date key (FK), Product key (FK), Vendor key (FK), Contract Terms key (FK), Requested By key (FK), Contract Number (DD), Purchase Requisition Number (DD), Purchase Requisition Quantity, Purchase Requisition Amount
    Purchase Order Fact: Requisition Date key (FK), Requested Date key (FK), Purchase Order Date key (FK), Product key (FK), Vendor key (FK), Contract Terms key (FK), Requested By key (FK), Purchase Agent key (FK), Contract Number (DD), Purchase Requisition Number (DD), Purchase Order Number (DD), Purchase Order Quantity, Purchase Order Amount
    Further facts in the chain (details not shown): Shipping Notice Fact, Warehouse Receipts Fact, Vendor Payment Fact
  • 27. Data Warehouse Architecture • Generic Two-Level Architecture • Independent Data Mart • Dependent Data Mart and Operational Data Store • Logical Data Mart and @ctive Warehouse
  • 28. Generic Two-Level Architecture
  • 30. Dependent Data Mart and Operational Data Store
  • 31. Logical Data Mart and @ctive Warehouse
  • 32. Data Warehouses Vs Data Marts
    Property            | Data Warehouse    | Data Mart
    Scope               | Enterprise        | Department
    Subjects            | Multiple          | Single-subject
    Data Source         | Many              | Few
    Size (typical)      | 100 GB to > 1 TB  | < 100 GB
    Implementation time | Months to years   | Months
  • 33. The ETL Process • Capture • Scrub or data cleansing • Transform • Load and Index
  • 34. Extraction • Capture = extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse • Static extract = capturing a snapshot of the source data at a point in time • Incremental extract = capturing changes that have occurred since the last static extract
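A minimal sketch of the two extract styles, assuming (hypothetically) that each source row carries a last_modified timestamp; a timestamp column is only one way to detect changes, and the field names are illustrative.

```python
from datetime import datetime

# Hypothetical source rows with an assumed last_modified column.
source = [
    {"id": 1, "name": "Ann", "last_modified": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "name": "Bob", "last_modified": datetime(2024, 1, 3, 12, 0)},
    {"id": 3, "name": "Cy", "last_modified": datetime(2024, 1, 5, 8, 30)},
]

def static_extract(rows):
    """Snapshot of the whole chosen subset at one point in time."""
    return [dict(r) for r in rows]

def incremental_extract(rows, since):
    """Only rows changed after the previous extract ran."""
    return [dict(r) for r in rows if r["last_modified"] > since]

snapshot = static_extract(source)
changes = incremental_extract(source, since=datetime(2024, 1, 2))
print(len(snapshot), [r["id"] for r in changes])  # 3 [2, 3]
```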
  • 35. Cleansing • Scrub = cleanse: uses pattern recognition and AI techniques to upgrade data quality • Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies • Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
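The rules below are a deliberately simple, rule-based stand-in for the cleansing step (no pattern recognition or AI): the sketch drops duplicate ids, normalizes name spacing and case, and flags impossible dates and missing names in an error log. The record layout is hypothetical.

```python
from datetime import date

raw = [
    {"id": "1", "name": " alice smith ", "birth": "1990-02-30"},  # impossible date
    {"id": "2", "name": "Bob Jones", "birth": "1985-07-14"},
    {"id": "2", "name": "Bob Jones", "birth": "1985-07-14"},      # duplicate
    {"id": "3", "name": "", "birth": "1979-11-02"},               # missing name
]

def parse_date(text):
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None  # erroneous date, flagged by the caller

def cleanse(rows):
    seen, clean, errors = set(), [], []
    for r in rows:
        if r["id"] in seen:
            errors.append((r["id"], "duplicate"))
            continue
        seen.add(r["id"])
        name = " ".join(r["name"].split()).title()  # reformat whitespace and case
        birth = parse_date(r["birth"])
        if not name:
            errors.append((r["id"], "missing name"))
        if birth is None:
            errors.append((r["id"], "bad date"))
        clean.append({"id": int(r["id"]), "name": name or None, "birth": birth})
    return clean, errors

clean, errors = cleanse(raw)
print(errors)  # [('1', 'bad date'), ('2', 'duplicate'), ('3', 'missing name')]
```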
  • 36. Transformation • Transform = convert data from the format of the operational system to the format of the data warehouse • Record-level: selection (data partitioning), joining (data combining), aggregation (data summarization) • Field-level: single-field (from one field to one field), multi-field (from many fields to one, or from one field to many)
  • 37. Single-Field Transformation • In general, some transformation function translates data from the old form to the new form • Algorithmic transformation uses a formula or logical expression • Table lookup is another approach
  • 38. Multi-Field Transformation • M:1: from many source fields to one target field • 1:M: from one source field to many target fields
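The four kinds of field-level transformation from the last two slides can be sketched as tiny Python functions. Every function name, the lookup table, and the "BRAND-LINE-ITEM" code layout are hypothetical.

```python
# Single-field, algorithmic: a formula maps one source field to one target field.
def cents_to_dollars(cents: int) -> float:
    return cents / 100

# Single-field, table lookup: a source code becomes a warehouse value.
STATE_NAMES = {"CA": "California", "AZ": "Arizona"}  # hypothetical lookup table

def state_name(code: str) -> str:
    return STATE_NAMES.get(code, "Unknown")

# Multi-field M:1: several source fields -> one target field.
def full_name(first: str, last: str) -> str:
    return f"{first} {last}"

# Multi-field 1:M: one source field -> several target fields.
def split_product_code(code: str) -> dict:
    # Assumes a hypothetical "BRAND-LINE-ITEM" layout, e.g. "AC-07-1234".
    brand, line, item = code.split("-")
    return {"brand_code": brand, "prod_line_code": line, "item": item}

print(cents_to_dollars(1999), state_name("AZ"), full_name("Ada", "Lovelace"))
print(split_product_code("AC-07-1234"))
```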
  • 39. Loading and Indexing • Load/Index = place transformed data into the warehouse and create indexes • Refresh mode: bulk rewriting of target data at periodic intervals • Update mode: only changes in source data are written to the data warehouse
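A minimal sketch contrasting the two load modes, with the target modeled as a dict keyed by a hypothetical id column. For brevity, update mode here overwrites rows by key; a warehouse that keeps history would typically append new time-stamped rows instead.

```python
def load_refresh(source_rows):
    """Refresh mode: discard the old target contents and bulk-rewrite them."""
    return {row["id"]: row for row in source_rows}

def load_update(target, changed_rows):
    """Update mode: write only the changed source rows into the target."""
    updated = dict(target)
    for row in changed_rows:
        updated[row["id"]] = row  # inserts new ids, overwrites changed ones
    return updated

warehouse = load_refresh([{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
warehouse = load_update(warehouse, [{"id": 2, "qty": 8}, {"id": 3, "qty": 1}])
print(sorted((k, v["qty"]) for k, v in warehouse.items()))  # [(1, 5), (2, 8), (3, 1)]
```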
  • 40. On-Line Analytical Processing (OLAP) • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques • Relational OLAP (ROLAP): traditional relational representation • Multidimensional OLAP (MOLAP): cube structure • OLAP operations: cube slicing (coming up with a 2-D view of the data) and drill-down (going from summary to more detailed views)
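A toy cube, assuming a dict that maps (region, product, quarter) to sales dollars with invented values, makes the two OLAP operations concrete: slicing fixes one dimension to leave a 2-D view, and drilling down regroups the same cells at a finer level of detail.

```python
from collections import defaultdict

# Hypothetical cube cells: (region, product, quarter) -> sales dollars.
cube = {
    ("East", "TV", "Q1"): 100, ("East", "TV", "Q2"): 120,
    ("East", "Radio", "Q1"): 30, ("West", "TV", "Q1"): 80,
    ("West", "Radio", "Q2"): 40,
}

def slice_cube(cells, product):
    """Fix the product dimension, leaving a 2-D region x quarter view."""
    return {(r, q): v for (r, p, q), v in cells.items() if p == product}

def roll_up(cells, keep):
    """Sum over every dimension whose index is not listed in `keep`."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

summary = roll_up(cube, keep=[0])    # summary: by region only
detail = roll_up(cube, keep=[0, 2])  # drill-down: region x quarter
print(summary)                       # {('East',): 250, ('West',): 120}
print(slice_cube(cube, "TV"))
```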
  • 42. Drill-Down Example • Summary report • Drill-down with color added
  • 44. B-Tree Indexes B-tree indexes – branch blocks or upper level blocks point to the corresponding lower-level blocks – leaf blocks contain the Oracle ROWID that points at the location of the actual row the leaf refers to Why is it so popular in Oracle products? – simplicity – easy to maintain – high retrieval speed of highly selective column values (high cardinality) – the size of the table has little or no impact on the speed with which B-tree indexed data can be fetched
  • 45. Where does it work best? • Equality predicates: select ... where colA = 'ABC'; • Range predicates: select ... where colA between 'A12' and 'R45';
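To see why a B-tree serves both predicates, the sketch below uses a sorted list of (key, rowid) pairs searched with Python's bisect module as a stand-in for the sorted leaf level. This is a simplification: a real Oracle B-tree adds branch blocks above the leaves so that a lookup reads only a few blocks, and the rowids here are plain integers rather than Oracle ROWIDs.

```python
import bisect

# Sorted (key, rowid) pairs: a stand-in for the sorted leaf level of a B-tree.
leaf = sorted([("ABC", 7), ("A12", 1), ("K00", 4), ("R45", 9), ("Z99", 3), ("ABC", 8)])
keys = [k for k, _ in leaf]

def lookup_eq(value):
    """where colA = value"""
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return [rowid for _, rowid in leaf[lo:hi]]

def lookup_between(low, high):
    """where colA between low and high (both ends inclusive, like SQL BETWEEN)"""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [rowid for _, rowid in leaf[lo:hi]]

print(lookup_eq("ABC"))                # [7, 8]
print(lookup_between("A12", "R45"))    # [1, 7, 8, 4, 9]
```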
  • 46. Bitmap index • For columns with very few unique values (low cardinality) • Built for one column at a time • A stream of bits: each bit relates to a column value in a single row of the table
  • 47. Bitmap index
    create bitmap index person_region on person (region);
    Row | Region | North bitmap | East bitmap | West bitmap | South bitmap
    1   | North  | 1 | 0 | 0 | 0
    2   | East   | 0 | 1 | 0 | 0
    3   | West   | 0 | 0 | 1 | 0
    4   | West   | 0 | 0 | 1 | 0
    5   | South  | 0 | 0 | 0 | 1
    6   | North  | 1 | 0 | 0 | 0
    When to use it: • Tables that have no or little insert/update activity are good candidates (static data in the warehouse) • Columns that have low cardinality are good candidates (if the cardinality of a column is <= 0.1%, the column is an ideal candidate; consider also 0.2% to 1%)
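The bitmaps from the Region example can be reproduced with Python integers used as uncompressed bitsets (bit i stands for row i+1). Oracle stores bitmaps compressed, so this shows only the idea, including how an OR of two bitmaps answers a query such as region in ('West', 'South').

```python
rows = ["North", "East", "West", "West", "South", "North"]  # Region column, rows 1..6

def build_bitmap_index(values):
    """One bitmap per distinct value; bit i is set when row i+1 has that value."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def rows_from_bitmap(bits, n):
    """Translate set bits back into 1-based row numbers."""
    return [i + 1 for i in range(n) if (bits >> i) & 1]

idx = build_bitmap_index(rows)
print(rows_from_bitmap(idx["North"], len(rows)))                # [1, 6]
print(rows_from_bitmap(idx["West"] | idx["South"], len(rows)))  # [3, 4, 5]
```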
  • 48. Bitmap vs. the B-tree
    Unique col. values | Cardinality (%) | B-tree space | Bitmap space
    500,000            | 50.00           | 15.29        | 12.35
    100,000            | 10.00           | 15.21        | 5.25
    10,000             | 1.00            | 14.34        | 2.99
    100                | 0.01            | 13.40        | 1.38
    5                  | <0.01           | 13.40        | 0.78
  • 49. Partitions The space allocated to tables can be divided into several parts called partitions • Here we mean horizontal partitioning, i.e. dividing up the table by rows, not columns • Examples of partitioning strategies include: by week / month / year; by the first few letters of a name; by active / inactive status; by commonly used WHERE criteria
  • 50. Partition Example - By Range We could create this customer table with a command looking something like: CREATE TABLE customer ( name VARCHAR2(30) …) PARTITION BY RANGE (name) (PARTITION p1 VALUES LESS THAN ('I'), PARTITION p2 VALUES LESS THAN ('S'), PARTITION p3 VALUES LESS THAN (MAXVALUE)) ;
  • 51. Partition Example - By Range • When we use the BY RANGE method for partitioning tables, we specify only the upper bound of each partition: a row goes into the first partition whose bound is strictly greater than its key value • The lower bound of each partition is therefore implied by the previous partition's upper bound, so Oracle works it out • Note the use of MAXVALUE, which tells Oracle that all other customers go into partition p3.
  • 52. Partition Example - By Range We can assign partitions to different tablespaces: … PARTITION p1 VALUES LESS THAN ('I') TABLESPACE tbsp1, … This allows us, for example, to spread the table across more than one disk drive
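The VALUES LESS THAN routing for the customer example can be sketched with bisect. Python compares strings by code point, so uppercase sorts before lowercase; treat the exact ordering as an assumption rather than a statement about Oracle.

```python
import bisect

# Bounds from the slide: p1 < 'I', p2 < 'S', p3 < MAXVALUE.
BOUNDS = ["I", "S"]              # exclusive upper bounds of p1 and p2
PARTITIONS = ["p1", "p2", "p3"]  # p3 takes everything else (MAXVALUE)

def range_partition(name: str) -> str:
    # bisect_right sends a value equal to a bound into the next partition,
    # matching VALUES LESS THAN semantics.
    return PARTITIONS[bisect.bisect_right(BOUNDS, name)]

for n in ["Adams", "I", "Miller", "Smith", "Zhou"]:
    print(n, range_partition(n))
```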
  • 53. Partition Example - By Hash We could also have partitioned the customer table by hashing: CREATE TABLE customer ( … ) PARTITION BY HASH (name) (PARTITION p1, PARTITION p2, PARTITION p3) ;
  • 54. Partition Example - By Hash • In this case, we have told Oracle to create three partitions named p1, p2 and p3 • Oracle hashes each customer's name to place the row in one of these three partitions • One disadvantage of this approach is that it can be difficult to cluster customers logically for searching / scanning, since similar names end up scattered across partitions
  • 55. Partition Example - By List Finally, we can specify a list of values for each partition: CREATE TABLE customer ( … ) PARTITION BY LIST (state) (PARTITION p1 VALUES ('AZ', 'CA'), PARTITION p2 VALUES ('WA', 'MO', 'OR', 'NV'), PARTITION p3 VALUES (DEFAULT)) ; • DEFAULT makes p3 collect every state not listed in p1 or p2
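A sketch of hash routing into three partitions. zlib.crc32 is used only because, unlike Python's built-in hash() for strings, it gives the same result on every run; Oracle uses its own internal hash function, so the actual placement of any given name will differ.

```python
import zlib
from collections import Counter

PARTITIONS = ["p1", "p2", "p3"]

def hash_partition(name: str) -> str:
    # crc32 is a stand-in for Oracle's internal hash function.
    return PARTITIONS[zlib.crc32(name.encode("utf-8")) % len(PARTITIONS)]

names = ["Adams", "Baker", "Clark", "Davis", "Evans", "Frank"]
print(Counter(hash_partition(n) for n in names))
```

Because placement depends only on the hash, alphabetically adjacent names can land in different partitions, which is the clustering disadvantage noted above.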
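List routing is a plain membership test. In this sketch, a state that appears in neither list is sent to p3 as a catch-all; treat that default as an assumption about the intended design.

```python
# Value lists from the slide; unlisted states fall through to a catch-all.
LIST_PARTITIONS = {
    "p1": {"AZ", "CA"},
    "p2": {"WA", "MO", "OR", "NV"},
}
DEFAULT_PARTITION = "p3"  # assumed catch-all partition

def list_partition(state: str) -> str:
    for partition, states in LIST_PARTITIONS.items():
        if state in states:
            return partition
    return DEFAULT_PARTITION

print(list_partition("CA"), list_partition("OR"), list_partition("TX"))  # p1 p2 p3
```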
  • 56. Modifying Partitions • The ALTER TABLE command can be used to manage individual partitions: adding, modifying or dropping; exchanging; moving; renaming; splitting or truncating
  • 57. Querying Partitions • The SELECT command can be used to query specific partitions: SELECT … FROM customer PARTITION (p1) … ; • The cost-based optimizer (CBO) can also determine whether a query needs to touch only a subset of the partitions in a table (partition pruning)
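A rough sketch of the pruning idea for the range-partitioned customer table: given a predicate name between low and high, keep only the partitions whose bounds can overlap that interval. The bounds come from the range example; the function is hypothetical and far simpler than what the optimizer actually does.

```python
# (partition, lower bound inclusive, upper bound exclusive); None means unbounded
# (no minimum for p1, MAXVALUE for p3).
PARTITIONS = [("p1", None, "I"), ("p2", "I", "S"), ("p3", "S", None)]

def prune(low: str, high: str) -> list:
    """Partitions that could hold rows with low <= name <= high."""
    hits = []
    for name, lower, upper in PARTITIONS:
        below = upper is not None and upper <= low  # every row in it is < low
        above = lower is not None and lower > high  # every row in it is > high
        if not (below or above):
            hits.append(name)
    return hits

print(prune("Adams", "Baker"))  # ['p1']
print(prune("Hill", "Jones"))   # ['p1', 'p2']
```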