
Partitioning


CHAPTER 6

Partitioning strategy

6.1 INTRODUCTION

The objective of this chapter is to explain how to determine the appropriate
database-partitioning strategy. Partitioning is performed for a number of
performance-related and manageability reasons, and the strategy as a whole must
balance all the various requirements.
This chapter assumes that you have read Part Two: Data Warehouse
Architecture, and Chapter 5 on database design. The design guidelines within this
chapter elaborate on the material covered previously.

6.2 HORIZONTAL PARTITIONING

In section 4.5.4 we discussed how, in many organizations, not all the information in
the fact table may be actively used at a single point in time. We also explained how
horizontally partitioning the fact table was a good way to speed up queries, by
minimizing the set of data to be scanned (without using an index).
In practice, if we were to partition a fact table into time segments, we should not
expect each segment to be the same size as all the others. This is because the number
of transactions within the business at a given point in the year may not be the same
as the number of transactions at a different point in the year.
For example, high street retailers would expect much higher transaction
volumes at peak periods, such as Christmas and Easter, compared with the rest of
the year. This implies that a sales fact table that is partitioned monthly could require
a number of partitions that are four to five times as large as the others.
In order to address this possible discrepancy, we need to consider the various
ways in which fact data could be partitioned, before deciding on the optimum
solution. We must remember that the determination of horizontal partitioning will
also have to consider the requirements for manageability of the data warehouse.

6.2.1 PARTITIONING BY TIME INTO EQUAL SEGMENTS

This is the standard form of partitioning discussed earlier. The fact table is
partitioned on a time period basis, where each time period represents a significant
retention period within the business.
For example, if the majority of the user queries are likely to be querying on
month-to-date values, it is probably appropriate to partition into monthly segments.
If the query period is fortnight-to-date, consider partitioning into fortnightly
segments as long as the total number of tables does not exceed something in the
order of 500.
Table partitions can be reused, by removing all the data in them. However, we
have to take into account that a number of the partitions will store transactions over
a busy period in the business, and that the rest may be substantially smaller.
As will be discussed in section 6.2.6, database tables that represent fact table
partitions are reused in a round-robin fashion by the warehouse manager. This means
that we have to create a number of tables that are sized to contain the expected
number of transactions for the period that they represent.
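
The sketch below illustrates equal time segments. It maps a transaction date to a monthly partition and checks a proposed scheme against the rough ceiling of 500 tables mentioned above. The naming pattern `sales_YYYY_MM`, the function names, and the retention figures are assumptions made for illustration only.

```python
from datetime import date

MAX_PARTITIONS = 500  # rough ceiling quoted in the text; not a hard product limit


def monthly_partition(d: date, prefix: str = "sales") -> str:
    """Return the (hypothetical) name of the monthly partition that holds date d."""
    return f"{prefix}_{d.year:04d}_{d.month:02d}"


def partitions_needed(retention_years: int, segments_per_year: int) -> int:
    """Number of equal time segments needed to cover the retention period."""
    return retention_years * segments_per_year


def within_limit(retention_years: int, segments_per_year: int) -> bool:
    """True if the scheme stays at or below the rough table-count ceiling."""
    return partitions_needed(retention_years, segments_per_year) <= MAX_PARTITIONS


print(monthly_partition(date(1996, 6, 21)))            # sales_1996_06
print(partitions_needed(5, 12), within_limit(5, 12))   # 60 True   (monthly, 5 years)
print(partitions_needed(5, 26), within_limit(5, 26))   # 130 True  (fortnightly, 5 years)
```

Note that this only counts tables. It says nothing about how large each table must be, which is the sizing question raised in the paragraph above.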

6.2.2 PARTITIONING BY TIME INTO DIFFERENT-SIZED SEGMENTS

In situations where aged data is accessed infrequently, it may be appropriate to
partition the fact table into different-sized segments. Typically, this would be
implemented as a set of small partitions for relatively current data, larger partitions
for less active data, and even larger partitions for inactive data.
For example, in a shrinkage analysis data warehouse where the active analysis
period is month-to-date, we could consider (working backwards from the current
date):
three monthly partitions for the last three months (including current month),
one quarterly partition for the previous quarter,
one half-yearly partition for the remainder of the year.
These partitions would be created for each year (or part thereof) of history retention
(Figure 6.1).
The advantages of using this technique are that all the detailed information
remains available online, without having to resort to using aggregations. Also, the
number of physical tables is kept relatively small, reducing operating costs.
[Figure: a sales fact table of 700 million records, shown for Year 1 and Year 2.
Each year is divided into Month 1, Month 2, Month 3, Quarter, and Half year
partitions.]

Figure 6.1 Partitioning fact tables into different-sized segments.
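
As a minimal sketch of the three-monthly, one-quarterly, one-half-yearly profile described above, the function below maps a transaction date to a partition label by its age in whole calendar months. It assumes the profile is counted backwards from the current month and repeats for every twelve months of history. That reading, and the label format, are illustrative assumptions rather than something the text specifies.

```python
from datetime import date


def months_back(today: date, d: date) -> int:
    """Whole calendar months between d and today (0 = current month)."""
    return (today.year - d.year) * 12 + (today.month - d.month)


def segment_for(today: date, d: date) -> str:
    """Map a transaction date to a (hypothetical) label in the 3 + 1 + 1 profile."""
    age = months_back(today, d)
    if age < 0:
        raise ValueError("transaction date is in the future")
    year_back, offset = divmod(age, 12)  # assumed: profile repeats per year of history
    if offset < 3:
        part = f"month_{offset + 1}"     # three monthly partitions, month_1 = current
    elif offset < 6:
        part = "quarter"                 # one quarterly partition
    else:
        part = "half_year"               # one half-yearly partition
    return f"year_{year_back + 1}_{part}"


today = date(1996, 6, 30)
print(segment_for(today, date(1996, 6, 21)))  # year_1_month_1
print(segment_for(today, date(1996, 2, 10)))  # year_1_quarter
print(segment_for(today, date(1995, 9, 1)))   # year_1_half_year
print(segment_for(today, date(1995, 5, 1)))   # year_2_month_2
```

Because the label depends on `today`, the same row maps to a different partition as the calendar advances.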

This technique may be particularly appropriate in environments that require a
mix of data dipping recent history, and data mining through the entire history set.
The disadvantage of using this technique is that the partitioning profile will
change on a regular basis. Using this method, a partitioning strategy that
differentiates on a monthly basis implies that data must be physically repartitioned
at the start of the new month, or at the very least at the start of each quarter.
In effect, you end up moving large portions of the database on a regular
basis, and this degree of repartitioning will increase the operational costs of the
data warehouse, so before you consider adopting this technique check that the
increase is offset by the overall performance improvements.

GUIDELINE 6.1 Consider the use of different-sized partitions where the business
requires a mix of data dipping recent history and data mining aged history.

6.2.3 PARTITIONING ON A DIFFERENT DIMENSION

Time-based partitioning is the safest basis for partitioning fact tables, because the
grouping of calendar periods is highly unlikely to change within the life of the data
warehouse. For example, a month's worth of data is not going to represent more
than 31 days' worth of data.
This does not mean that fact tables cannot be partitioned on a different basis.
There may be good reasons for partitioning by product group, region, supplier, or
any other dimension.
For example, let us consider a marketing function that is structured into distinct
regional departments: for example, on a state-by-state basis. If each region tends to
query on information captured within its region, it is probably more effective to
partition the fact table into regional partitions. This guarantees that all the queries
for that region are speeded up by not having to scan information that is not relevant.
Clearly, the benefit of this style of partitioning is that it speeds up all queries
regarding a region, regardless of the time period it covers. This technique is
particularly appropriate where there is no definable active period within the
organization.
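
To make the regional example concrete, the following sketch keeps one in-memory partition per region and answers a regional query by scanning only that partition. The rows, state codes, and function name are hypothetical. A real warehouse would rely on the database's own partitioning mechanism rather than on Python dictionaries.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, sales_date, value)
rows = [
    ("CA", "1996-06-21", 4.25),
    ("NY", "1996-06-21", 12.00),
    ("CA", "1996-06-22", 2.47),
    ("TX", "1996-06-21", 1.05),
]

# One partition per region, keyed by the region code.
partitions = defaultdict(list)
for row in rows:
    partitions[row[0]].append(row)


def region_total(region: str) -> tuple[float, int]:
    """Sum sales for one region; also return how many rows were scanned."""
    scanned = partitions.get(region, [])
    return sum(value for _, _, value in scanned), len(scanned)


total, scanned = region_total("CA")
print(round(total, 2), scanned)  # 6.72 2  (2 of the 4 rows were read)
```

The number of rows scanned does not depend on the time period of the query, which is the benefit described above.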
When using a form of dimensional partitioning, it is essential that we first
determine that the basis for partitioning is unlikely to change in the future. It is
important to avoid a situation where the entire fact table has to be restructured to
reflect a change in the grouping of the partitioning dimension.

GUIDELINE 6.2 Consider partitioning on a dimension other than time, if the user
functions lend themselves to that.

To follow on our previous example, if the definition of what constitutes a region
within the business changes at any time, the entire fact table would have to be
repartitioned to represent the new regional groupings.
In order to avoid this substantial cost, we recommend that you partition only on
the time dimension, unless you are certain that the suggested dimensional grouping
will not change within the life of the data warehouse. This guideline applies equally
well to the time dimension, which is why you should not partition on a time
grouping that may change in the future. Examples of groupings to avoid are listed
in Table 6.1.

GUIDELINE 6.3 Do not partition on a dimensional grouping that is likely to
change within the life of the data warehouse.

Table 6.1 Examples of partitioning bases to avoid.

Dimension                           Comment
Product                             Apt to change product groupings over time.
                                    Consider partitioning at a high level,
                                    e.g. sub-business unit
Customer                            Always changes on an ongoing basis
Location                            Avoid if organization is about to
                                    restructure geographically
Time - financial or promotional     Could change over time
year, e.g. first quarter FY97,
summer sales 96, etc.

6.2.4 PARTITIONING BY SIZE OF TABLE

In some data warehouses, there may not be a clear basis for partitioning the fact
table on any dimension. In these instances, you should consider partitioning the fact
table purely on a size basis: that is, when the table is about to exceed a predetermined
size, a new table partition is created.
If we consider a customer event data warehouse in the retail banking area, we could
find that the business operates a 7 x 24 operation: that is, there is no operational concept
of the end of the business day, because transactions can occur at any point in time. It may
be inappropriate to split customer transactions on a daily/weekly/monthly basis.
If no other dimension is appropriate for partitioning, we may well have to
partition by size of table. What this means is that, as transactions are loaded into the
data warehouse, we create a new table partition when a predetermined size is
reached. This partitioning scheme is complex to manage, and requires metadata to
identify what data is stored in each partition.
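
A minimal sketch of size-based partitioning follows. It opens a new partition whenever the current one reaches a predetermined row count, and records per-partition metadata so that a transaction can later be located. The class name, the use of a transaction id as the locating key, and the assumption that ids arrive in ascending order are all illustrative choices, not part of the text.

```python
from dataclasses import dataclass, field


@dataclass
class SizePartitioner:
    max_rows: int                                   # predetermined partition size
    partitions: list = field(default_factory=list)  # each partition is a list of rows
    metadata: list = field(default_factory=list)    # one dict per partition

    def load(self, txn_id: int, row: dict) -> None:
        """Append a row, opening a new partition when the current one is full."""
        if not self.partitions or len(self.partitions[-1]) >= self.max_rows:
            self.partitions.append([])
            self.metadata.append({"first_id": txn_id, "last_id": txn_id})
        self.partitions[-1].append(row)
        self.metadata[-1]["last_id"] = txn_id       # assumes ids load in ascending order

    def partition_of(self, txn_id: int) -> int:
        """Use the metadata to find which partition holds a transaction id."""
        for index, meta in enumerate(self.metadata):
            if meta["first_id"] <= txn_id <= meta["last_id"]:
                return index
        raise KeyError(txn_id)


p = SizePartitioner(max_rows=3)
for txn_id in range(1, 8):
    p.load(txn_id, {"txn_id": txn_id})

print(len(p.partitions))   # 3
print(p.metadata[1])       # {'first_id': 4, 'last_id': 6}
print(p.partition_of(7))   # 2
```

Without the `metadata` list there would be no way to tell which partition to scan for a given transaction, which is why the text calls this scheme complex to manage.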

6.2.5 PARTITIONING DIMENSIONS

In some cases, the dimension may contain such a large number of entries that it
may need to be partitioned in the same way as a fact table. Put another way, we need
to check the size of the dimension over the lifetime of the data warehouse.
For example, let us consider a large dimension that varies over time. If the
requirement exists to store all the variations in order to apply comparisons, that
dimension may become quite large. A large dimension table can substantially affect
query response times.
The basis for partitioning a dimension table is unlikely to be time. In order to
reflect a partitioning basis that fits the business profile, we suggest that you consider
partitioning on a grouping of the dimension being partitioned.
For example, if the product dimension contains a large product catalog, of say half a
million to a million records, which vary substantially over time, consider
partitioning the table into product groupings. Specifically, select the level in the
product hierarchy that contains the number of instances that approximates to the
number of partitions you desire. If there are 50 departments, create a partition for
each department.
In practice, situations where partitioning dimension tables is appropriate are
unusual. If the dimension table appears to be very large, always check that it does
not contain embedded facts that are causing unnecessary rows to be added.
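
The level-selection rule above can be sketched as a simple closest-match search. The hierarchy levels and their counts below are hypothetical, apart from the 50 departments used in the text's own example.

```python
def pick_level(level_counts: dict[str, int], desired_partitions: int) -> str:
    """Return the hierarchy level whose instance count is closest to the target."""
    return min(
        level_counts,
        key=lambda level: abs(level_counts[level] - desired_partitions),
    )


# Hypothetical product hierarchy: level name -> number of distinct instances
hierarchy = {"division": 5, "department": 50, "category": 400, "product": 750_000}

print(pick_level(hierarchy, 50))    # department
print(pick_level(hierarchy, 300))   # category
```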

6.2.6 USING ROUND-ROBIN PARTITIONS


Once the data warehouse is holding the full complement of information, as
a new historical partition is required, the oldest partition will be archived. It is
possible to archive the oldest partition prior to creating a new one, and reuse the
old partition for the latest data.
Metadata is used in order to allow user access tools to refer to the correct table
partition. The warehouse manager creates a meaningful table name, such as
sales_month_to_date or sales_last_week, which represents the data
content of a physical partition.
This technique also makes it simpler to automate many of the table management
facilities within the data warehouse, by allowing the system to refer to the same
physical table partitions. The information period they cover will change, but this
can be managed by using appropriate metadata.

GUIDELINE 6.4 Structure horizontal partitions to round robin. Remember that the
size of each partition may vary.
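
The round-robin reuse described above can be sketched with a fixed pool of physical tables and a small metadata map. The class, the physical table names `sales_p1` to `sales_p3`, and the period strings are hypothetical. Only the logical name `sales_month_to_date` comes from the text.

```python
from collections import deque


class RoundRobinPool:
    def __init__(self, physical_tables):
        self.order = deque(physical_tables)  # oldest first, newest last
        self.periods = {}                    # metadata: physical table -> period held
        self.archived = []                   # (table, period) pairs already archived

    def start_period(self, period: str) -> str:
        """Archive the oldest table's contents, then reuse it for the new period."""
        table = self.order.popleft()
        if table in self.periods:
            self.archived.append((table, self.periods[table]))
        self.periods[table] = period         # the table would be emptied and reloaded
        self.order.append(table)
        return table

    def resolve(self, logical_name: str) -> str:
        """Map a meaningful name to the physical table that currently holds it."""
        if logical_name == "sales_month_to_date":
            return self.order[-1]
        raise KeyError(logical_name)


pool = RoundRobinPool(["sales_p1", "sales_p2", "sales_p3"])
for period in ["1996-04", "1996-05", "1996-06", "1996-07"]:
    pool.start_period(period)

print(pool.resolve("sales_month_to_date"))  # sales_p1
print(pool.archived)                        # [('sales_p1', '1996-04')]
```

User access tools only ever see the logical name. The mapping to a physical table changes each period, while the set of physical tables stays the same.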

6.3 VERTICAL PARTITIONING

In vertical partitioning, as the name suggests, data is split vertically. This process is
shown in Figure 6.2.
This process can take two forms: normalization and row splitting. Normalization
is a standard relational method of database organization. It allows common
fields to be collapsed into single rows, thereby reducing space usage. For example,
Tables 6.2 and 6.3 show a normalization process. In the data warehouse arena the
approach tends to be the other way. Large tables are often denormalized, even

Name, Empno, ...     Dept, Deptno, ...     Grade, Title, ...
Smith, 1027, ...     Sales, 20, ...        5, Senior, ...
Jones, 429, ...      Training, 52, ...     5, Senior, ...
Murphy, 1136, ...    Sales, 20, ...        5, Senior, ...

Name, Empno, ...
Smith, 1027, ...
Jones, 429, ...
Murphy, 1136, ...

Dept, Deptno, ...
Sales, 20, ...
Training, 52, ...

Grade, Title, ...
5, Senior, ...

Figure 6.2 Vertical partitioning.


Table 6.2 Tables before normalization.

Product_id  Quantity  Value  Sales_date  Store_id  Store_name  Location  Region

27                     4.25  21-FEB-96   16        Cheapo      London    SE
32          4         12.00  21-JUN-96   16        Cheapo      London    SE
24                     1.05  21-JUN-96   64        Tatty       York      N
17          4          2.47  22-JUN-96   16        Cheapo      London    SE
128         6          3.99  21-JUN-96   16        Cheapo      London    SE

Table 6.3 Tables after normalization.

Store_id  Store_name  Location  Region

16        Cheapo      London    SE
64        Tatty       York      N

Product_id  Quantity  Value  Sales_date  Store_id

27                     4.25  21-FEB-96   16
32          4         12.00  21-JUN-96   16
24                     1.05  21-JUN-96   64
17          4          2.47  22-JUN-96   16
128         6          3.99  21-JUN-96   16

though this can lead to a lot of extra space being used, to avoid the overheads of
joining the data during queries. This is particularly true of the fact data.

GUIDELINE 6.5 Normalizing data in a data warehouse can lead to large, inefficient
join operations. Such operations should be avoided.

Vertical partitioning can sometimes be used in a data warehouse to split less-
used column information out from a frequently accessed fact table. We distinguish
row splitting from the normalization process, because it is performed for a different
purpose. Row splitting also tends to leave a one-to-one map between the partitions,
whereas normalization will leave a one-to-many mapping (Figure 6.3).

Figure 6.3 Normalization versus row splitting.

The aim of row splitting is to speed access to the large table by reducing its
size. The other data is still maintained and can be accessed separately. Before using a
vertical partition you need to be very sure that there will be no requirements to
perform major join operations between the two partitions. This sort of partitioning
can be useful, for example, in situations where the split-out data is accessed only by
drill-down operations.

GUIDELINE 6.6 Consider row splitting a fact table if some columns are accessed
infrequently.
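
The difference between the two forms of vertical partitioning can be sketched as follows. The rows are hypothetical and only loosely modelled on Table 6.2. The `sale_id` key and the rarely used `comment` column are invented for the illustration.

```python
# Hypothetical denormalized fact rows; sale_id and comment are invented columns.
facts = [
    {"sale_id": 1, "product_id": 32, "value": 12.00, "store_id": 16,
     "store_name": "Cheapo", "comment": "gift wrap"},
    {"sale_id": 2, "product_id": 17, "value": 2.47, "store_id": 16,
     "store_name": "Cheapo", "comment": ""},
    {"sale_id": 3, "product_id": 24, "value": 1.05, "store_id": 64,
     "store_name": "Tatty", "comment": "damaged box"},
]


def normalize(rows):
    """Move repeated store attributes into their own table (one store, many sales)."""
    stores = {r["store_id"]: {"store_id": r["store_id"], "store_name": r["store_name"]}
              for r in rows}
    sales = [{k: v for k, v in r.items() if k != "store_name"} for r in rows]
    return list(stores.values()), sales


def row_split(rows, key, cold_columns):
    """Move rarely used columns into a second table with the same key (one-to-one)."""
    hot = [{k: v for k, v in r.items() if k not in cold_columns} for r in rows]
    cold = [{key: r[key], **{c: r[c] for c in cold_columns}} for r in rows]
    return hot, cold


stores, sales = normalize(facts)
hot, cold = row_split(facts, "sale_id", ["comment"])

print(len(facts), len(stores), len(sales))  # 3 2 3  -> one-to-many
print(len(hot), len(cold))                  # 3 3    -> one-to-one
```

Normalization shrinks the store data to one row per store, whereas row splitting keeps one row per fact in both tables and simply narrows the frequently accessed one.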

6.4 HARDWARE PARTITIONING

As discussed in section 4.5.4, part of the design process is to determine how to
maximize the hardware performance by designing the database to fit specific
hardware architectures. The precise mechanism used varies depending on the specific
hardware platform, but in essence, must address the following areas:
maximizing the processing power available;
maximizing disk and I/O performance;
avoiding bottlenecking on a single CPU;
avoiding bottlenecking on I/O throughput.

Different hardware-partitioning techniques are used to address each area; solutions
will use a mix of the required techniques in order to provide the best overall result.
This section assumes that the reader is familiar with the contents of Chapter 11:
specifically, hardware architectures for SMP, clustered SMP, MPP, MPP hybrid,
and NUMA machines.
