Partitioning
Partitioning strategy
6. INTRODUCTION
In practice, if we were to partition a fact table into time segments, we should not
expect each segment to be the same size as all the others. This is because the number
of transactions within the business at a given point in the year may not be the same
as the number of transactions at a different point in the year.
For example, high street retailers would expect much higher transaction
volumes at peak periods, such as Christmas and Easter, compared with the rest of
the year. This implies that a sales fact table that is partitioned monthly could
require a number of partitions that are four to five times as large as the others.
In order to address this possible discrepancy, we need to consider the various
ways in which fact data could be partitioned, before deciding on the optimum
solution. We must remember that the determination of horizontal partitioning will
also have to consider the requirements for manageability of the data warehouse.
This is the standard form of partitioning discussed earlier. The fact table is
partitioned on a time period basis, where each time period represents a significant
[...] order of 500.
Table partitions can be reused, by removing all the data in them. However, we
have to take into account that a number of the partitions will store transactions over
a busy period in the business, and that the rest may be substantially smaller.
As will be discussed in section 6.2.6, database tables that represent fact table
partitions are reused in a round robin by the warehouse manager. This means that
we have to create a number of tables that are sized to contain the expected number of
transactions for the period that they represent.
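The round-robin reuse and sizing described above can be sketched as follows. This is a minimal illustration only: the names PartitionPool, slot_for, load_period and required_capacity are hypothetical, an in-memory dictionary stands in for database tables, and the emptying of a slot stands in for whatever mechanism the warehouse manager really uses to remove data.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionPool:
    """Hypothetical fixed pool of partition tables reused in round-robin order."""
    n_tables: int
    # slot number -> (period label, rows); a slot is absent until first used
    slots: dict = field(default_factory=dict)

    def slot_for(self, period_index: int) -> int:
        # Period 0 lands in slot 0; period n_tables wraps back to slot 0.
        return period_index % self.n_tables

    def load_period(self, period_index: int, label: str, rows: list) -> int:
        slot = self.slot_for(period_index)
        # Reuse: the old contents of the slot are discarded before loading.
        self.slots[slot] = (label, list(rows))
        return slot


def required_capacity(expected_rows: list, n_tables: int) -> list:
    """Size each slot for the largest period that will ever land in it."""
    capacity = [0] * n_tables
    for period_index, rows in enumerate(expected_rows):
        slot = period_index % n_tables
        capacity[slot] = max(capacity[slot], rows)
    return capacity
```

Under these assumptions, a slot that ever receives a peak period has to be sized for that peak, even though it holds much less data in other cycles.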
[Figure: a sales fact table of 700 million records partitioned into monthly segments (Month 1, Month 2, Month 3) within each quarter]
Time-based partitioning is the safest basis for partitioning fact tables, because the
grouping of calendar periods is highly unlikely to change within the life of the data
warehouse. For example, a month's worth of data is not going to represent more
than 31 days' worth of data.
This does not mean that fact tables cannot be partitioned on a different basis.
There may be good reasons for partitioning by product group, region, supplier, or
any other dimension.
For example, let us consider a marketing function that is structured into distinct
regional departments: for example, on a state-by-state basis. If each region tends to
query on information captured within its region, it is probably more effective to
partition the fact table into regional partitions. This guarantees that all the queries
for that region are speeded up by not having to scan information that is not relevant.
Clearly, the benefit of this style of partitioning is that it speeds up all queries
regarding a region, regardless of the time period it covers. This technique is
particularly appropriate where there is no definable active period within the
organization.
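As a rough illustration of why regional partitioning helps regional queries, the sketch below routes fact rows into one partition per region and answers a regional query by scanning only that partition. The function names and the "region" and "value" field names are assumptions made for this example, not part of any particular product.

```python
from collections import defaultdict


def partition_by_region(fact_rows):
    """Group fact rows into one partition (a list) per region."""
    partitions = defaultdict(list)
    for row in fact_rows:
        partitions[row["region"]].append(row)
    return dict(partitions)


def regional_sales(partitions, region):
    """Total sales for one region, reading only that region's partition."""
    # Rows for every other region are never touched, whatever time period
    # the query covers.
    return sum(row["value"] for row in partitions.get(region, []))
```

A query that spans several regions gains nothing from this layout, which is why the choice depends on how the regional departments actually query.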
When using a form of dimensional [...]
In some data warehouses, there may not be a clear basis for partitioning the fact
table on any dimension. In these instances, you should consider partitioning the fact
table purely on a size basis: that is, when the table is about to exceed a predetermined
size, a new table partition is created.
If we consider a customer event data warehouse in the retail banking area, we could
find that the business operates a 7 x 24 operation: that is, there is no operational concept
of the end of the business day, because transactions can occur at any point in time. It may
be inappropriate to split customer transactions on a daily/weekly/monthly basis.
If no other dimension is appropriate for partitioning, we may well have to
partition by size of table. What this means is that, as transactions are loaded into the
data warehouse, we create a new table partition when a predetermined size is
reached. This partitioning scheme is complex to manage, and requires metadata to
identify what data is stored in each partition.
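The size-based scheme and its metadata requirement can be sketched as below. This is an assumption-laden illustration: SizePartitionedTable, load and partitions_for are hypothetical names, size is measured in rows, and the metadata kept per partition is simply the lowest and highest transaction timestamp it contains.

```python
class SizePartitionedTable:
    """Hypothetical fact table that opens a new partition at a size limit."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.partitions = [[]]
        # One metadata entry per partition: the timestamp range it holds.
        self.metadata = [{"min_ts": None, "max_ts": None}]

    def load(self, ts, payload):
        # Create a new partition when the current one has reached its limit.
        if len(self.partitions[-1]) >= self.max_rows:
            self.partitions.append([])
            self.metadata.append({"min_ts": None, "max_ts": None})
        self.partitions[-1].append((ts, payload))
        meta = self.metadata[-1]
        meta["min_ts"] = ts if meta["min_ts"] is None else min(meta["min_ts"], ts)
        meta["max_ts"] = ts if meta["max_ts"] is None else max(meta["max_ts"], ts)

    def partitions_for(self, start, end):
        """Use the metadata to find partitions overlapping [start, end]."""
        return [
            i for i, m in enumerate(self.metadata)
            if m["min_ts"] is not None and m["min_ts"] <= end and m["max_ts"] >= start
        ]
```

Without the metadata lookup, every query would have to scan every partition, because nothing in a partition's name or position says which transactions it contains.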
In vertical partitioning, as the name suggests, data is split vertically. This process is
shown in Figure 6.2.
This process can take two forms: normalization and row splitting. Normalization
is a standard relational method of database organization. It allows common
fields to be collapsed into single rows, thereby reducing space usage. For example,
Tables 6.2 and 6.3 show a normalization process. In the data warehouse arena the
approach tends to be the other way. Large tables are often denormalized, even
[Figure 6.3 - Normalization. Sample data only partly recoverable: lookup rows such as Dept/Deptno "Sales, 20" and "Training, 52", and sales rows that repeat the store details "16 Cheapo London SE" and "64 Tatty York N"]
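As a hedged sketch of the normalization step, the function below collapses the store details that repeat on every sales row into a single row per store. The column names (product_id, quantity, value, sales_date, store_id, store_name, location, region) are assumptions chosen for illustration; only the repeated store details echo the sample data.

```python
SALE_COLUMNS = ("product_id", "quantity", "value", "sales_date", "store_id")
STORE_COLUMNS = ("store_name", "location", "region")


def normalize_sales(rows):
    """Split denormalized sales rows into a sales table and a store table."""
    stores = {}
    sales = []
    for row in rows:
        # Repeated store details collapse into one entry per store_id.
        stores[row["store_id"]] = {c: row[c] for c in STORE_COLUMNS}
        # The sales row keeps only the store_id as a reference.
        sales.append({c: row[c] for c in SALE_COLUMNS})
    return sales, stores
```

Denormalizing, the direction the text says is more common in data warehouses, is the reverse operation: copying the store columns back onto every sales row.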
that there will be no requirements to perform major join operations between the two
partitions. This sort of partitioning can be useful, for example, in situations where
the split-out data is accessed only by drill-down operations.
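A minimal sketch of row splitting, under the assumption that both partitions share the same key and that the split-out columns are needed only on drill-down. The names split_rows and drill_down, and the idea of passing a list of frequently used columns, are hypothetical choices for this illustration.

```python
def split_rows(rows, key, main_columns):
    """Split each row into a main partition and a split-out partition."""
    main, split_out = {}, {}
    for row in rows:
        k = row[key]
        # Frequently queried columns stay in the main partition.
        main[k] = {c: row[c] for c in main_columns}
        # Everything else moves to the split-out partition, under the same key.
        split_out[k] = {
            c: v for c, v in row.items() if c != key and c not in main_columns
        }
    return main, split_out


def drill_down(split_out, k):
    """Fetch the split-out detail for one key, with no large join involved."""
    return split_out[k]
```

Routine queries read only the narrower main partition, and the split-out data is fetched one key at a time when a user drills down.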
Different hardware-partitioning techniques are used to address each area; solutions
will use a mix of the required techniques in order to provide the best overall result.
This section assumes that the reader is familiar with the contents of Chapter 11:
specifically, hardware architectures for SMP, MPP, clustered SMP, MPP hybrid,
and NUMA machines.