Mapping Datawarehouse Architecture
MULTIPROCESSOR ARCHITECTURE
The functions of a data warehouse are based on relational database technology, which is
implemented in a parallel manner.
There are two advantages of using parallel relational database technology for a data
warehouse:
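Linear speed-up: refers to the ability to reduce response time in proportion to the number of
processors added (for example, doubling the processors halves the response time).
Linear scale-up: refers to the ability to provide the same performance on the same requests as
the database size increases, provided processors are added in proportion.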
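The two properties can be expressed as simple ratios of measured response times. The sketch below assumes the usual definitions (speed-up = time on 1 processor / time on N processors; scale-up = time for the small job on the small system / time for an N-times larger job on an N-times larger system). The function names and the timings are hypothetical and only illustrate what "linear" means.

```python
def speedup(time_one_proc: float, time_n_procs: float) -> float:
    """Speed-up: response time on 1 processor divided by time on N processors."""
    return time_one_proc / time_n_procs


def scaleup(time_small: float, time_large: float) -> float:
    """Scale-up: time for the small job on the small system divided by the
    time for an N-times larger job on an N-times larger system."""
    return time_small / time_large


# Hypothetical timings in seconds, purely illustrative
n = 4
s = speedup(120.0, 30.0)   # linear speed-up: the ratio equals n
u = scaleup(120.0, 120.0)  # linear scale-up: the ratio stays at 1.0
print(f"speed-up on {n} processors: {s:.1f} (linear if {n})")
print(f"scale-up: {u:.1f} (linear if 1.0)")
```

A scale-up below 1.0 (for example 120 s growing to 150 s) means the larger system falls short of linear scale-up.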
Types of parallelism
There are two types of parallelism:
Inter-query parallelism: Different server threads or processes handle multiple requests at the
same time.
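Inter-query parallelism can be sketched with a pool of worker threads, where each independent request is handed to its own worker. The table `SALES` and the two query functions are hypothetical stand-ins for real SQL statements. Python threads are used only to show the structure: in CPython the GIL prevents CPU-bound Python code from running truly in parallel, so a real engine (or a `ProcessPoolExecutor`) would be needed for actual multi-processor execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sales" table: (region, amount)
SALES = [("north", 120), ("south", 80), ("north", 45), ("east", 60), ("south", 30)]


def total_for_region(region):
    """One complete query: SELECT SUM(amount) FROM sales WHERE region = ?"""
    return sum(amount for r, amount in SALES if r == region)


def count_rows():
    """Another, unrelated query: SELECT COUNT(*) FROM sales"""
    return len(SALES)


# Each submitted query is handled by its own worker thread, so independent
# requests are serviced at the same time (inter-query parallelism).
with ThreadPoolExecutor(max_workers=3) as pool:
    f1 = pool.submit(total_for_region, "north")
    f2 = pool.submit(total_for_region, "south")
    f3 = pool.submit(count_rows)
    results = (f1.result(), f2.result(), f3.result())

print(results)  # (165, 110, 5)
```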
Intra-query parallelism: This form of parallelism decomposes a serial SQL query into
lower-level operations such as scan, join and sort. These lower-level operations are then
executed concurrently.
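As a minimal sketch of intra-query parallelism, a single join query can be split into scan, join and sort operators, with the two independent scans running concurrently. The tables, operator functions and the choice of a hash join are assumptions made for illustration; a real optimizer chooses its own operators and plan.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tables
CUSTOMERS = [(1, "Ann"), (2, "Bob"), (3, "Cy")]    # (id, name)
ORDERS = [(10, 2, 250), (11, 1, 75), (12, 2, 40)]  # (order_id, cust_id, amount)


def scan(table):
    """Scan operator: read every row of a table."""
    return list(table)


def hash_join(customers, orders):
    """Join operator: build a hash table on customers, probe it with orders."""
    names = {cid: name for cid, name in customers}
    return [(names[cust], amount) for _, cust, amount in orders if cust in names]


def sort_by_amount(rows):
    """Sort operator: ORDER BY amount."""
    return sorted(rows, key=lambda row: row[1])


# SELECT c.name, o.amount FROM orders o JOIN customers c ON o.cust_id = c.id
# ORDER BY o.amount, decomposed into scan, join and sort operators.
# The two scans do not depend on each other, so they run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    cust_future = pool.submit(scan, CUSTOMERS)
    order_future = pool.submit(scan, ORDERS)
    joined = hash_join(cust_future.result(), order_future.result())

result = sort_by_amount(joined)
print(result)  # [('Bob', 40), ('Ann', 75), ('Bob', 250)]
```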
Intra-query parallelism can be achieved in either of two ways:
Horizontal parallelism: The database is partitioned across multiple disks, and parallel
processing occurs within a specific task that is performed concurrently on different processors
against different sets of data.
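Horizontal parallelism can be sketched by running the identical task (filter plus partial sum) on every partition at once and then merging the partial results. The three-way split in `PARTITIONS` stands in for data spread across three disks and is hypothetical; the same GIL caveat as for any CPU-bound Python threading applies.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split into three partitions, one per "disk"
PARTITIONS = [
    [("north", 120), ("south", 80)],
    [("north", 45), ("east", 60)],
    [("south", 30), ("north", 10)],
]


def partial_sum(partition, region):
    """The same task (filter + sum) applied to one partition's rows."""
    return sum(amount for r, amount in partition if r == region)


def parallel_sum(partitions, region):
    # Run the identical task on every partition concurrently, then merge
    # the partial results into the final answer.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions, [region] * len(partitions)))
    return sum(partials)


print(parallel_sum(PARTITIONS, "north"))  # 175
```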
Vertical parallelism: This occurs among different tasks. All query components such as scan,
join and sort are executed in parallel in a pipelined fashion; in other words, the output of one
task becomes the input of the next.
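Vertical (pipelined) parallelism can be sketched as a chain of stages connected by queues, where each stage runs in its own thread and starts consuming rows before the previous stage has finished. The three stages (scan, filter, aggregate), the `ROWS` data and the end-of-stream sentinel are hypothetical choices for illustration. The code requires Python 3.8+ for the `:=` operator.

```python
import queue
import threading

# Hypothetical rows: (region, amount)
ROWS = [("north", 120), ("south", 80), ("north", 45), ("east", 60)]
DONE = object()  # sentinel marking the end of the stream


def scan(out_q):
    """Stage 1: emit rows one at a time."""
    for row in ROWS:
        out_q.put(row)
    out_q.put(DONE)


def filter_north(in_q, out_q):
    """Stage 2: pass on only 'north' rows as soon as they arrive."""
    while (row := in_q.get()) is not DONE:
        if row[0] == "north":
            out_q.put(row)
    out_q.put(DONE)


def aggregate(in_q, result):
    """Stage 3: sum amounts while the upstream stages are still running."""
    total = 0
    while (row := in_q.get()) is not DONE:
        total += row[1]
    result.append(total)


q1, q2, result = queue.Queue(), queue.Queue(), []
stages = [
    threading.Thread(target=scan, args=(q1,)),
    threading.Thread(target=filter_north, args=(q1, q2)),
    threading.Thread(target=aggregate, args=(q2, result)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()
print(result[0])  # 165
```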
DATA PARTITIONING:
Intelligent partitioning assumes that the DBMS knows where a specific record is located and
does not waste time searching for it across all disks. The various intelligent partitioning
schemes include:
Hash partitioning: A hash algorithm is used to calculate the partition number from the value
of the partitioning key for each row.
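A minimal sketch of hash partitioning, assuming four partitions and string keys. `zlib.crc32` is used as the hash function here because Python's built-in `hash()` is randomized per process for strings and so would not place a key in the same partition across runs. The partition count and key values are hypothetical.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical number of partitions (disks)


def hash_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Return the partition number computed from a row's partitioning key."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = {p: [] for p in range(NUM_PARTITIONS)}
for cust_id in ["C001", "C002", "C003", "C004", "C005"]:
    partitions[hash_partition(cust_id)].append(cust_id)

# A lookup goes straight to one partition instead of searching all of them.
target = hash_partition("C003")
print("C003" in partitions[target])  # True
```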
Key range partitioning: Rows are placed and located in partitions according to the value of
the partitioning key. That is, all rows with key values from A to K are in partition 1, L to T in
partition 2, and so on.
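Key range partitioning can be sketched with a binary search over the inclusive upper bound of each range, keyed on the first letter of the key. The A to K and L to T ranges come from the text above; the third range (U to Z) is an assumption that completes "and so on".

```python
import bisect

# Inclusive upper bounds of each range on the key's first letter:
# A-K -> partition 1, L-T -> partition 2, U-Z -> partition 3 (assumed).
UPPER_BOUNDS = ["K", "T", "Z"]


def range_partition(key: str) -> int:
    """Return the 1-based partition number for a key."""
    first = key[0].upper()
    if not first.isalpha():
        raise ValueError(f"key {key!r} does not start with a letter")
    idx = bisect.bisect_left(UPPER_BOUNDS, first)
    if idx == len(UPPER_BOUNDS):
        raise ValueError(f"key {key!r} falls outside every range")
    return idx + 1


for name in ["Adams", "Kelly", "Lopez", "Turner", "Young"]:
    print(name, range_partition(name))
# Adams 1, Kelly 1, Lopez 2, Turner 2, Young 3
```

`bisect_left` is used so that a key equal to an upper bound (such as "Kelly") lands in that bound's partition, matching the inclusive A to K range.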
Schema partitioning: An entire table is placed on one disk, another table on a different disk,
and so on. This is useful for small reference tables.
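Under schema partitioning the table name alone identifies the disk, so locating data is a single lookup. The placement map below, with its table and disk names, is hypothetical.

```python
# Hypothetical placement map: each whole table lives on exactly one disk.
SCHEMA_PLACEMENT = {
    "region": "disk1",    # small reference table
    "currency": "disk2",  # small reference table
    "product": "disk3",
}


def disk_for_table(table: str) -> str:
    """Locate a table's disk without searching: the table name is the key."""
    return SCHEMA_PLACEMENT[table]


print(disk_for_table("currency"))  # disk2
```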
User-defined partitioning: This allows a table to be partitioned on the basis of a user-defined
expression.
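User-defined partitioning can be sketched by passing the partitioning expression in as a function. The row layout and the example expression (grouping orders by calendar quarter of the order date) are hypothetical; any function that maps a row to a partition name would work.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, date, float]  # (order_id, order_date, amount), hypothetical layout


def partition_table(rows: List[Row], expr: Callable[[Row], str]) -> Dict[str, List[Row]]:
    """Place each row in the partition named by a user-supplied expression."""
    parts: Dict[str, List[Row]] = {}
    for row in rows:
        parts.setdefault(expr(row), []).append(row)
    return parts


def by_quarter(row: Row) -> str:
    """User-defined expression: calendar quarter of the order date."""
    d = row[1]
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


orders = [
    ("o1", date(2024, 1, 15), 99.0),
    ("o2", date(2024, 5, 2), 45.5),
    ("o3", date(2024, 2, 28), 12.0),
]
parts = partition_table(orders, by_quarter)
print(sorted(parts))  # ['2024-Q1', '2024-Q2']
```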