CHAPTER 6

Partitioning strategy

6.1 INTRODUCTION

The objective of this chapter is to explain how to determine the appropriate database-partitioning strategy. Partitioning is performed for a number of performance-related and manageability reasons, and the strategy as a whole must balance all the various requirements.
This chapter assumes that you have read Part Two: Data Warehouse Architecture, and Chapter 5 on database design. The design guidelines within this chapter elaborate on the material covered previously.

6.2 HORIZONTAL PARTITIONING

In section 4.5.4 we discussed how, in many organizations, not all the information in the fact table may be actively used at a single point in time. We also explained how horizontally partitioning the fact table was a good way to speed up queries, by minimizing the set of data to be scanned (without using an index).

In practice, if we were to partition a fact table into time segments, we should not expect each segment to be the same size as all the others. This is because the number of transactions within the business at a given point in the year may not be the same as the number of transactions at a different point in the year.
For example, high street retailers would expect much higher transaction volumes at peak periods, such as Christmas and Easter, compared with the rest of the year. This implies that a sales fact table that is partitioned monthly could require a number of partitions that are four to five times as large as the others.
In order to address this possible discrepancy, we need to consider the various ways in which fact data could be partitioned, before deciding on the optimum solution. We must remember that the determination of horizontal partitioning will also have to consider the requirements for manageability of the data warehouse.

6.2.1 PARTITIONING BY TIME INTO EQUAL SEGMENTS

This is the standard form of partitioning discussed earlier. The fact table is partitioned on a time period basis, where each time period represents a significant retention period within the business.

For example, if the majority of the user queries are likely to be querying on month-to-date values, it is probably appropriate to partition into monthly segments.
If the query period is fortnight-to-date, consider partitioning into fortnightly segments as long as the total number of tables does not exceed something in the order of 500.
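The equal-segment scheme above can be sketched in Python. This is a minimal illustration under stated assumptions, not a warehouse manager implementation: the partition names (`sales_1996_06`, `sales_f0001`) and the fortnight epoch are hypothetical, and the check simply compares the number of segments a retention period needs with the rough ceiling of 500 tables.

```python
import math
from datetime import date

# Hypothetical naming scheme: one physical table per calendar month.
def monthly_partition(d: date) -> str:
    return f"sales_{d.year}_{d.month:02d}"

# Assumption: fortnights are counted in 14-day blocks from an arbitrary epoch.
EPOCH = date(1996, 1, 1)

def fortnightly_partition(d: date) -> str:
    return f"sales_f{(d - EPOCH).days // 14:04d}"

def tables_needed(retention_days: int, segment_days: int) -> int:
    """Number of equal-length segments needed to cover the retention period."""
    return math.ceil(retention_days / segment_days)

MAX_TABLES = 500  # rough ceiling suggested in the text

def fits_table_limit(retention_days: int, segment_days: int) -> bool:
    return tables_needed(retention_days, segment_days) <= MAX_TABLES
```

For five years of history (1,825 days) held in fortnightly segments, `tables_needed(1825, 14)` gives 131 tables, well under the ceiling; daily segments would need 1,825 tables and fail the check.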
Table partitions can be reused, by removing all the data in them. However, we have to take into account that a number of the partitions will store transactions over a busy period in the business, and that the rest may be substantially smaller.
As will be discussed in section 6.2.6, database tables that represent fact table partitions are reused on a round-robin basis by the warehouse manager. This means that we have to create a number of tables that are sized to contain the expected number of transactions for the period that they represent.

6.2.2 PARTITIONING BY TIME INTO DIFFERENT-SIZED SEGMENTS

In situations where aged data is accessed infrequently, it may be appropriate to partition the fact table into different-sized segments. Typically, this would be implemented as a set of small partitions for relatively current data, larger partitions for less active data, and even larger partitions for inactive data.
For example, in a shrinkage analysis data warehouse where the active analysis period is month-to-date, we could consider (working backwards from the current date):
three monthly partitions for the last three months (including current month),
one quarterly partition for the previous quarter,
one half-yearly partition for the remainder of the year.
These partitions would be created for each year (or part thereof) of history retention
(Figure 6.1).
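One possible way to map a month of history onto the different-sized profile described above is sketched below. It assumes months are counted backwards from the current month (0 = current month), that `month1` denotes the most recent month of each year, and that the segment names are hypothetical.

```python
def partition_for(months_back: int) -> str:
    """Map a month (0 = current month, 1 = last month, ...) to its partition."""
    if months_back < 0:
        raise ValueError("months_back must be non-negative")
    year = months_back // 12 + 1   # Year 1 = the most recent twelve months
    offset = months_back % 12      # position within that year, newest first
    if offset < 3:
        segment = f"month{offset + 1}"   # three monthly partitions
    elif offset < 6:
        segment = "quarter"              # one quarterly partition
    else:
        segment = "half_year"            # one half-yearly partition for the remainder
    return f"year{year}_{segment}"
```

The segments cover 1, 1, 1, 3 and 6 months respectively, so each year of retained history uses five physical tables.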
The advantages of using this technique are that all the detailed information
remains available online, without having to resort to using aggregations. Also, the
number of physical tables is kept relatively small, reducing operating costs.
[Figure: a sales fact table of 700 million records, partitioned for each of Year 1 and Year 2 into Month 1, Month 2, Month 3, Quarter and Half year segments.]
Figure 6.1 Partitioning fact tables into different-sized segments.

This technique may be particularly appropriate in environments that require a mix of data dipping recent history, and data mining through the entire history set.
The disadvantage of using this technique is that the partitioning profile will change on a regular basis. Using this method, a partitioning strategy that differentiates on a monthly basis implies that data must be physically repartitioned at the start of each new month, or at the very least at the start of each quarter. In effect, you end up moving large portions of the database on a regular basis, and this degree of repartitioning will increase the operational costs of the data warehouse. So, before you consider adopting this technique, check that the increase is offset by the overall performance improvements.

GUIDELINE 6.1 Consider the use of different-sized partitions where the business requires a mix of data dipping recent history and data mining aged history.

6.2.3 PARTITIONING ON A DIFFERENT DIMENSION

Time-based partitioning is the safest basis for partitioning fact tables, because the
grouping of calendar periods is highly unlikely to change within the life of the data
warehouse. For example, a month's worth of data is not going to represent more
than 31 days' worth of data.
This does not mean that fact tables cannot be partitioned on a different basis. There may be good reasons for partitioning by product group, region, supplier, or any other dimension.
For example, let us consider a marketing function that is structured into distinct regional departments: for example, on a state-by-state basis. If each region tends to query on information captured within its region, it is probably more effective to partition the fact table into regional partitions. This guarantees that all the queries for that region are speeded up by not having to scan information that is not relevant.
Clearly, the benefit of this style of partitioning is that it speeds up all queries
regarding a region, regardless of the time period it covers. This technique is
particularly appropriate where there is no definable active period within the
organization.
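As a rough illustration of why regional partitioning helps, the sketch below keeps one in-memory list per region as a stand-in for a physical partition, so a regional query never scans other regions' rows. The `region` column and the state codes in the usage are assumptions for illustration.

```python
from collections import defaultdict

def partition_by_region(rows):
    """Group fact rows into one partition per region (hypothetical 'region' column)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["region"]].append(row)
    return dict(partitions)

def query_region(partitions, region, predicate=lambda row: True):
    """Scan only the partition for `region`; rows of other regions are never read."""
    return [row for row in partitions.get(region, []) if predicate(row)]

rows = [
    {"region": "TX", "amount": 10},
    {"region": "CA", "amount": 5},
    {"region": "TX", "amount": 7},
]
partitions = partition_by_region(rows)
```

The speed-up holds for any time period a regional query covers, which is the benefit the text describes; the cost is that the partitions must be rebuilt if the regional grouping changes.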
When using a form of dimensional partitioning, it is essential that we first determine that the basis for partitioning is unlikely to change in the future. It is important to avoid a situation where the entire fact table has to be restructured to reflect a change in the grouping of the partitioning dimension.

GUIDELINE 6.2 Consider partitioning on a dimension other than time, if the user functions lend themselves to that.
To follow on our previous example, if the definition of what constitutes a region within the business changes at any time, the entire fact table would have to be repartitioned to represent the new regional groupings.
In order to avoid this substantial cost, we recommend that you partition only on the time dimension, unless you are certain that the suggested dimensional grouping will not change within the life of the data warehouse. This guideline applies equally well to the time dimension, which is why you should not partition on a time grouping that may change in the future. Examples of groupings to avoid are listed in Table 6.1.

GUIDELINE 6.3 Do not partition on a dimensional grouping that is likely to change within the life of the data warehouse.

Table 6.1 Examples of partitioning bases to avoid.

Dimension | Comment
--- | ---
Product | Apt to change product groupings over time.
Customer | Always changes on an ongoing basis. Consider partitioning at a high level, e.g. sub-business unit.
Location | Avoid if the organization is about to restructure geographically.
Time (financial or promotional year), e.g. first quarter FY97, summer sales 96, etc. | Could change over time.
6.2.4 PARTITIONING BY SIZE OF TABLE

In some data warehouses, there may not be a clear basis for partitioning the fact table on any dimension. In these instances, you should consider partitioning the fact table purely on a size basis: that is, when the table is about to exceed a predetermined size, a new table partition is created.
If we consider a customer event data warehouse in the retail banking area, we could find that the business operates a 7 x 24 operation: that is, there is no operational concept of the end of the business day, because transactions can occur at any point in time. It may be inappropriate to split customer transactions on a daily/weekly/monthly basis.
If no other dimension is appropriate for partitioning, we may well have to partition by size of table. What this means is that, as transactions are loaded into the data warehouse, we create a new table partition when a predetermined size is reached. This partitioning scheme is complex to manage, and requires metadata to identify what data is stored in each partition.
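Size-based partitioning and the metadata it depends on might be sketched as follows. This assumes the predetermined size is a row count and that each partition's metadata records the earliest and latest transaction timestamps it holds; the class and partition names (`SizePartitionedTable`, `txn_part_0001`) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, List, Optional, Tuple

@dataclass
class PartitionMeta:
    """Metadata describing what one size-bounded partition holds."""
    name: str
    first_ts: Optional[datetime] = None
    last_ts: Optional[datetime] = None
    row_count: int = 0

@dataclass
class SizePartitionedTable:
    max_rows: int  # assumed form of the "predetermined size": a row-count limit
    partitions: List[List[Tuple[datetime, Any]]] = field(default_factory=list)
    metadata: List[PartitionMeta] = field(default_factory=list)

    def load(self, ts: datetime, row: Any) -> None:
        # Start a new partition if none exists yet or the current one has reached the limit.
        if not self.partitions or self.metadata[-1].row_count >= self.max_rows:
            self.partitions.append([])
            self.metadata.append(PartitionMeta(name=f"txn_part_{len(self.partitions):04d}"))
        meta = self.metadata[-1]
        self.partitions[-1].append((ts, row))
        meta.row_count += 1
        # Record the time range actually held, since partitions do not align with calendar periods.
        if meta.first_ts is None or ts < meta.first_ts:
            meta.first_ts = ts
        if meta.last_ts is None or ts > meta.last_ts:
            meta.last_ts = ts

    def partitions_covering(self, start: datetime, end: datetime) -> List[str]:
        """Consult the metadata to find the partitions that may hold rows between start and end."""
        return [m.name for m in self.metadata
                if m.first_ts is not None and m.first_ts <= end and m.last_ts >= start]
```

Because a 7 x 24 feed never closes a business day, the timestamp range kept in the metadata is what tells a date-bounded query which partitions it must read.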

6.2.5 PARTITIONING DIMENSIONS

In some cases, the dimension may contain such a large number of entries that it may need to be partitioned in the same way as a fact table. Put another way, we need to check the size of the dimension over the lifetime of the data warehouse.
For example, let us consider a large
dimension that varies over time. If the
requirement exists to store all the variations in order to apply comparisons, that
dimension may become quite large. A large dimension table
can substantially affect
query response times.
The basis for partitioning a dimension table is unlikely to be time. In order to reflect a partitioning basis that fits the business profile, we suggest that you consider partitioning on a grouping of the dimension being partitioned.
For example, if the product dimension contains a large product catalog of, say, half a million to a million records, which vary substantially over time, consider partitioning the table into product groupings. Specifically, select the level in the product hierarchy that contains the number of instances that approximates to the number of partitions you desire. If there are 50 departments, create a partition for each department.
In practice, situations where partitioning dimension tables is appropriate are unusual. If the dimension table appears to be very large, always check that it does not contain embedded facts that are causing unnecessary rows to be added.
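Choosing the hierarchy level can be reduced to a small calculation: pick the level whose number of members is closest to the number of partitions wanted, then group the dimension rows by that level. In the sketch below only the figure of 50 departments comes from the text; the other level names and member counts are hypothetical.

```python
def choose_partition_level(level_counts, desired_partitions):
    """Return the hierarchy level whose member count is closest to the desired number of partitions."""
    return min(level_counts, key=lambda level: abs(level_counts[level] - desired_partitions))

def partition_dimension(rows, level):
    """Group dimension rows into one partition per member of the chosen level."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[level], []).append(row)
    return partitions

# Hypothetical member counts for a product hierarchy (only the 50 departments come from the text).
PRODUCT_LEVELS = {"division": 5, "department": 50, "product_group": 400, "product": 750_000}
```

With 50 partitions desired, `choose_partition_level(PRODUCT_LEVELS, 50)` selects `department`, matching the one-partition-per-department example.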

6.2.6 USING ROUND-ROBIN PARTITIONS

Once the data warehouse is holding the full complement of information, as a new historical partition is required, the oldest partition will be archived. It is possible to archive the oldest partition prior to creating a new one, and reuse the old partition for the latest data.
Metadata is used in order to allow user access tools to refer to the correct table partition. The warehouse manager creates a meaningful table name, such as sales_month_to_date or sales_last_week, which represents the data content of a physical partition.
This technique also makes it simpler to automate many of the table management facilities within the data warehouse, by allowing the system to refer to the same physical table partitions. The information period they cover will change, but this can be managed by using appropriate metadata.

GUIDELINE 6.4 Structure horizontal partitions to round robin. Remember that the size of each partition may vary.
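The round-robin reuse, and the metadata that hides it from access tools, could look roughly like this. It is a sketch that tracks only metadata (which period each physical table currently holds); the table names and method names are hypothetical, and the actual archiving and deletion of rows is left out.

```python
class RoundRobinPartitions:
    """A fixed pool of physical tables reused in rotation, with metadata for access tools."""

    def __init__(self, physical_tables):
        self.physical = list(physical_tables)   # hypothetical names, e.g. "sales_p1"
        self.next_slot = 0                      # index of the table to reuse next
        self.period_of = {}                     # metadata: physical table -> period it holds
        self.archived = []                      # periods archived before their table was reused

    def start_period(self, period):
        """Claim the next table in rotation for a new period, archiving its old contents first."""
        table = self.physical[self.next_slot]
        if table in self.period_of:
            # In a real warehouse the rows would be archived and deleted here;
            # this sketch records only the metadata change.
            self.archived.append(self.period_of[table])
        self.period_of[table] = period
        self.next_slot = (self.next_slot + 1) % len(self.physical)
        return table

    def table_for(self, period):
        """Metadata lookup: which physical table currently holds this period?"""
        for table, held in self.period_of.items():
            if held == period:
                return table
        raise KeyError(period)

    def latest_table(self):
        """Physical table behind a logical name such as sales_month_to_date."""
        return self.physical[(self.next_slot - 1) % len(self.physical)]
```

With three tables and four monthly periods, the fourth month reuses the first table after its original month has been archived, while `latest_table()` keeps pointing access tools at whichever table holds the current month.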

6.3 VERTICAL PARTITIONING

In vertical partitioning, as the name suggests, data is split vertically. This process is
shown in Figure 6.2.
This process can take two forms: normalization and row splitting. Normalization is a standard relational method of database organization. It allows common fields to be collapsed into single rows, thereby reducing space usage. For example, Tables 6.2 and 6.3 show a normalization process. In the data warehouse arena the approach tends to be the other way. Large tables are often denormalized, even though this can lead to a lot of extra space being used, to avoid the overheads of joining during queries. This is particularly true of the fact data.

[Figure: a wide employee table (Name, Empno, ..., Dept, Deptno, ..., Grade, Title, ...) holding the rows Smith/1027/Sales/20/5/Senior, Jones/429/Training/52/5/Senior and Murphy/1136/Sales/20/5/Senior is split vertically into three tables: Name, Empno (Smith 1027, Jones 429, Murphy 1136); Dept, Deptno (Sales 20, Training 52); Grade, Title (5 Senior).]
Figure 6.2 Vertical partitioning.

Table 6.2 Tables before normalization.

Product_id | Quantity | Value | Sales_date | Store_id | Store_name | Location | Region
--- | --- | --- | --- | --- | --- | --- | ---
27 | | 4.25 | 21-FEB-96 | 16 | Cheapo | London | SE
32 | 4 | 12.00 | 21-JUN-96 | 16 | Cheapo | London | SE
24 | | 1.05 | 21-JUN-96 | 64 | Tatty | York | N
17 | 4 | 2.47 | 22-JUN-96 | 16 | Cheapo | London | SE
128 | 6 | 3.99 | 21-JUN-96 | 16 | Cheapo | London | SE

Table 6.3 Tables after normalization.

Store_id | Store_name | Location | Region
--- | --- | --- | ---
16 | Cheapo | London | SE
64 | Tatty | York | N

Product_id | Quantity | Value | Sales_date | Store_id
--- | --- | --- | --- | ---
27 | | 4.25 | 21-FEB-96 | 16
32 | 4 | 12.00 | 21-JUN-96 | 16
24 | | 1.05 | 21-JUN-96 | 64
17 | 4 | 2.47 | 22-JUN-96 | 16
128 | 6 | 3.99 | 21-JUN-96 | 16

GUIDELINE 6.5 Normalizing data in a data warehouse can lead to large, inefficient join operations. Such operations should be avoided.

Vertical partitioning can sometimes be used in a data warehouse to split less-used column information out from a frequently accessed fact table. We distinguish row splitting from the normalization process, because it is performed for a different purpose. Row splitting also tends to leave a one-to-one map between the partitions, whereas normalization will leave a one-to-many mapping (Figure 6.3).
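The normalization shown in Tables 6.2 and 6.3 can be expressed as a short Python sketch. It uses the rows from Table 6.2 but omits the Quantity column, because some of its values are illegible in the source; `WideSale` and `normalize` are illustrative names, not part of any product.

```python
from collections import Counter
from typing import NamedTuple

class WideSale(NamedTuple):
    """One row of the denormalized table (Quantity omitted in this sketch)."""
    product_id: int
    value: float
    sales_date: str
    store_id: int
    store_name: str
    location: str
    region: str

def normalize(rows):
    """Split wide sales rows into a store table and a slimmer sales table keyed by store_id."""
    stores = {}   # store_id -> (store_name, location, region), stored once per store
    sales = []    # (product_id, value, sales_date, store_id)
    for r in rows:
        stores[r.store_id] = (r.store_name, r.location, r.region)
        sales.append((r.product_id, r.value, r.sales_date, r.store_id))
    return stores, sales

WIDE = [
    WideSale(27, 4.25, "21-FEB-96", 16, "Cheapo", "London", "SE"),
    WideSale(32, 12.00, "21-JUN-96", 16, "Cheapo", "London", "SE"),
    WideSale(24, 1.05, "21-JUN-96", 64, "Tatty", "York", "N"),
    WideSale(17, 2.47, "22-JUN-96", 16, "Cheapo", "London", "SE"),
    WideSale(128, 3.99, "21-JUN-96", 16, "Cheapo", "London", "SE"),
]

stores, sales = normalize(WIDE)
# One store row maps to many sales rows: the one-to-many mapping of normalization.
rows_per_store = Counter(s[3] for s in sales)
```

The store attributes now appear once per store rather than once per sale, but every query that needs the store name must join the two tables again, which is the overhead the guideline warns about.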

The aim of row splitting is to speed access to the large table by reducing its size. The other data is still maintained and can be accessed separately. Before using a vertical partition you need to be very sure that there will be no requirements to perform major join operations between the two partitions. This sort of partitioning can be useful, for example, in situations where the split-out data is accessed only by drill-down operations.

GUIDELINE 6.6 Consider row splitting a fact table if some columns are accessed infrequently.

[Figure: two panels, labelled Row splitting and Normalization.]
Figure 6.3 Normalization versus row splitting.
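Row splitting, by contrast, keeps a one-to-one link between the two halves. The sketch below is illustrative only: it assumes each fact row carries a unique `row_id`, and the column names used in the example rows (`till_no`, `operator`) are hypothetical stand-ins for infrequently accessed columns.

```python
def row_split(rows, hot_columns, key="row_id"):
    """Split each row into a frequently accessed part and a rarely accessed part.

    Both parts keep `key`, so every hot row has exactly one matching cold row
    (the one-to-one mapping that distinguishes row splitting from normalization).
    """
    hot, cold = [], []
    for row in rows:
        hot.append({c: row[c] for c in (key, *hot_columns)})
        cold.append({c: v for c, v in row.items() if c == key or c not in hot_columns})
    return hot, cold

def drill_down(cold, row_id, key="row_id"):
    """Fetch the split-out columns for a single row, as a drill-down operation would."""
    for row in cold:
        if row[key] == row_id:
            return row
    raise KeyError(row_id)

rows = [
    {"row_id": 1, "product_id": 27, "value": 4.25, "till_no": 3, "operator": "A12"},
    {"row_id": 2, "product_id": 32, "value": 12.00, "till_no": 1, "operator": "B07"},
]
hot, cold = row_split(rows, ("product_id", "value"))
```

Routine queries scan only the narrower `hot` rows; the `cold` rows are touched one at a time by drill-down, so no large join between the two partitions is ever needed.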

6.4 HARDWARE PARTITIONING

As discussed in section 4.5.4, part of the design process is to determine how to maximize the hardware performance by designing the database to fit specific hardware architectures. The precise mechanism used varies depending on the specific hardware platform, but in essence it must address the following areas:
maximizing the processing power available;
maximizing disk and I/O performance;
avoiding bottlenecking on a single CPU;
avoiding bottlenecking on I/O throughput.

Different hardware-partitioning techniques are used to address each area; solutions will use a mix of the required techniques in order to provide the best overall result. This section assumes that the reader is familiar with the contents of Chapter 11: specifically, hardware architectures for SMP, clustered SMP, MPP, MPP hybrid, and NUMA machines.


Partitioning

Uploaded by

Partitioning

Uploaded by

CHAPTER 6

The objective of this chapter is to eaplain how to determine the appropriate

6.2 HORIZONTAL PARTITIONING

In section 4 54 we discussod how, in many organizations, not all theinformation in

6.2.. PARTITIONING BY TIME INTO EQUAL SEGMENTS

retention period within the business.

month-to-date values, it is probably appropriate to partition into monthly segments.

6.2.2 PARTITIONING BY TIME INTO DIFFERENT-SIZED SEGMENTS

In situations where aged data is accessed infrequently, it may be appropriate to

Half year Half year

Figure 6.1 Partitioning fact tables into different-sized segments.

This technique may be particularly appropriate in environments that require a

6.2.3 PARTITIONING ON A DIFFERENT DIMENSION

partitioning, it is essential that we first

Table 6.1 Examples of partitioning bases to avoid.

6.2.5 PARTITIONING DIMENSIONS

In some cases, the dimension may contain such

6.2.6 USING ROUND-ROBIN PARTITIONS

6.3 VERTICAL PARTITIONING

| Name, Empno, ... Dept, Deptno, ... Grade, Title, .

Figure 6.2 Vertical partitioning.

Product_id Quantity Value Sales_ date Store_id Store_name Location Region

Table 6.3 Tables after normalization.

Store id Store name Location Region

Product_id Quantity Value Sales_date Store_id

though this can lead to a lot of extra space

one-to-many mapping (Figure 6.3)

Normalization versus row splitting.

6.4 HARDWARE PARTITIONING

As discussed in section 4.5.4, part of the

You might also like