Parallel Databases
Introduction
Parallel machines are becoming quite common and affordable
    Prices of microprocessors, memory and disks have dropped sharply
    Recent desktop computers feature multiple processors and this trend is projected to accelerate
Databases are growing increasingly large
    Large volumes of transaction data are collected and stored for later analysis
    Multimedia objects like images are increasingly stored in databases
Large-scale parallel database systems are increasingly used for:
    storing large volumes of data
    processing time-consuming decision-support queries
    providing high throughput for transaction processing
Parallelism in Databases
Data can be partitioned across multiple disks for parallel I/O.
Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel.
Queries are expressed in a high-level language (SQL, translated to relational algebra), which makes parallelization easier.
Different queries can be run in parallel with each other.
Concurrency control takes care of conflicts.
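As an illustration of executing one relational operation in parallel over partitioned data, here is a minimal Python sketch of a parallel SUM aggregate. The function names (round_robin_partition, parallel_sum) and the round-robin placement are assumptions made for the example, not something the slides prescribe, and the threads only stand in for separate processors and disks.

```python
from concurrent.futures import ThreadPoolExecutor


def round_robin_partition(rows, n):
    """Send the i-th row to partition i mod n (one partition per 'disk')."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def parallel_sum(rows, n_workers=4):
    """Aggregate each partition independently, then combine the partial sums."""
    parts = round_robin_partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum, parts))  # one partial sum per partition
    return sum(partials)


total = parallel_sum(list(range(1, 101)))
```

Each worker touches only its own partition, so no coordination is needed until the final combine step.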
Partitioning
Types of partitioning
Pipelined parallelism
Consider a join of four relations
    r1 ⋈ r2 ⋈ r3 ⋈ r4
Set up a pipeline that computes the three joins in parallel
    Let P1 be assigned the computation of temp1 = r1 ⋈ r2
    And P2 be assigned the computation of temp2 = temp1 ⋈ r3
    And P3 be assigned the computation of temp2 ⋈ r4
Each of these operations can execute in parallel, sending the result tuples it computes to the next operation even as it is computing further results
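The pipeline can be sketched in Python with one thread per processor and queues carrying tuples between stages. This is only an illustrative sketch: the schemas r1(a, b), r2(b, c), r3(c, d), r4(d, e), the dictionary representation of tuples, and the names scan, join_stage and pipelined_join are all assumptions made for the example.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a tuple stream


def scan(relation, out_q):
    """Stream the tuples of a stored relation into the pipeline."""
    for t in relation:
        out_q.put(t)
    out_q.put(DONE)


def join_stage(in_q, relation, key, out_q):
    """Join each incoming tuple with `relation` on `key`, emitting results at once."""
    index = {}
    for row in relation:  # build a hash index on the stored relation
        index.setdefault(row[key], []).append(row)
    while True:
        t = in_q.get()
        if t is DONE:
            break
        for match in index.get(t[key], []):
            out_q.put({**t, **match})  # pass the result on while still consuming input
    out_q.put(DONE)


def pipelined_join(r1, r2, r3, r4):
    q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
    workers = [
        threading.Thread(target=scan, args=(r1, q0)),
        threading.Thread(target=join_stage, args=(q0, r2, "b", q1)),  # P1: temp1
        threading.Thread(target=join_stage, args=(q1, r3, "c", q2)),  # P2: temp2
        threading.Thread(target=join_stage, args=(q2, r4, "d", q3)),  # P3: result
    ]
    for w in workers:
        w.start()
    result = []
    while True:
        t = q3.get()
        if t is DONE:
            break
        result.append(t)
    for w in workers:
        w.join()
    return result


r1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
r2 = [{"b": 10, "c": 100}, {"b": 20, "c": 200}]
r3 = [{"c": 100, "d": 1000}]
r4 = [{"d": 1000, "e": "x"}]
rows = pipelined_join(r1, r2, r3, r4)
```

No stage waits for its predecessor to finish: P2 can start joining as soon as P1 emits its first temp1 tuple, and temp1 and temp2 are never materialized in full.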
Independent Parallelism
Consider a join of four relations
    r1 ⋈ r2 ⋈ r3 ⋈ r4
    Let P1 be assigned the computation of temp1 = r1 ⋈ r2
    And P2 be assigned the computation of temp2 = r3 ⋈ r4
    And P3 be assigned the computation of temp1 ⋈ temp2
    P1 and P2 can work independently in parallel
    P3 has to wait for input from P1 and P2
Can pipeline output of P1 and P2 to P3, combining independent parallelism and pipelined parallelism
Independent parallelism does not provide a high degree of parallelism
    It is useful with a lower degree of parallelism
    It is less useful in a highly parallel system
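The independent plan can be sketched in Python as follows. Again this is an assumption-laden illustration rather than a prescribed implementation: the schemas r1(a, b), r2(b, c), r3(c, d), r4(d, e) and the helpers hash_join and independent_join are hypothetical, and the two pool threads stand in for processors P1 and P2.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(left, right, key):
    """Join two lists of dict tuples on a shared attribute `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def independent_join(r1, r2, r3, r4):
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(hash_join, r1, r2, "b")  # P1: temp1 = r1 join r2
        f2 = pool.submit(hash_join, r3, r4, "d")  # P2: temp2 = r3 join r4
        temp1, temp2 = f1.result(), f2.result()   # P3 waits for both inputs
    return hash_join(temp1, temp2, "c")           # P3: temp1 join temp2


r1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
r2 = [{"b": 10, "c": 100}, {"b": 20, "c": 200}]
r3 = [{"c": 100, "d": 1000}]
r4 = [{"d": 1000, "e": "x"}]
rows = independent_join(r1, r2, r3, r4)
```

Only two of the three joins ever run at the same time here, which reflects why this form of parallelism does not scale to many processors on its own.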
Design of Parallel Systems