
Parallel Databases

Parallel databases are increasingly common as the cost of hardware has decreased. Large databases require parallelism for storage, queries, and throughput. There are different types of parallelism including interquery, intraquery, interoperation, and intraoperation parallelism. Data can be partitioned horizontally or vertically across multiple disks for parallel input/output and queries can utilize various parallelization techniques. Issues in parallel database design include parallel data loading, resilience to failures, and redundancy.

Uploaded by

Madara Uchiha
Copyright
© All Rights Reserved
Parallel Databases

Introduction
- Parallel machines are becoming quite common and affordable.
  - Prices of microprocessors, memory and disks have dropped sharply.
  - Recent desktop computers feature multiple processors, and this trend is projected to accelerate.
- Databases are growing increasingly large:
  - large volumes of transaction data are collected and stored for later analysis;
  - multimedia objects such as images are increasingly stored in databases.
- Large-scale parallel database systems are increasingly used for:
  - storing large volumes of data;
  - processing time-consuming decision-support queries;
  - providing high throughput for transaction processing.
Parallelism in Databases
- Data can be partitioned across multiple disks for parallel I/O.
- Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel.
- Queries are expressed in a high-level language (SQL, translated to relational algebra).
  - This makes parallelization easier.
- Different queries can be run in parallel with each other.
  - Concurrency control takes care of conflicts.
Partitioning
- Types of partitioning:
  - Horizontal partitioning – the tuples of a relation are divided among many disks such that each tuple resides on one disk.
  - Vertical partitioning – the schema of a relation is divided among many disks such that the data fields of each tuple are split and stored on multiple disks.
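A minimal Python sketch of the two partitioning types, assuming tuples are represented as dictionaries and n in-memory lists stand in for disks. Horizontal placement uses round-robin only for concreteness. For the vertical case it additionally assumes that every fragment keeps a copy of a key attribute (the hypothetical `id` below) so full tuples can be rebuilt; the slide itself does not say how reassembly is done.

```python
from typing import Dict, List

Row = Dict[str, object]


def horizontal_partition(rows: List[Row], n: int) -> List[List[Row]]:
    """Horizontal: each whole tuple is stored on exactly one of n disks (round-robin here)."""
    disks: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        disks[i % n].append(row)
    return disks


def vertical_partition(rows: List[Row], key: str, groups: List[List[str]]) -> List[List[Row]]:
    """Vertical: disk j stores only the attributes in groups[j], for every tuple.
    Assumption: the key attribute is repeated in every fragment so tuples can be rebuilt."""
    return [[{key: row[key], **{c: row[c] for c in group}} for row in rows] for group in groups]


def reassemble(fragments: List[List[Row]], key: str) -> List[Row]:
    """Rebuild full tuples by matching the fragments on the key attribute."""
    merged: Dict[object, Row] = {}
    for fragment in fragments:
        for piece in fragment:
            merged.setdefault(piece[key], {}).update(piece)
    return list(merged.values())


rows = [{"id": 1, "name": "Ann", "salary": 50},
        {"id": 2, "name": "Bob", "salary": 60},
        {"id": 3, "name": "Cy", "salary": 70}]
h = horizontal_partition(rows, 2)                     # disk 0: ids 1, 3; disk 1: id 2
v = vertical_partition(rows, "id", [["name"], ["salary"]])
```

Reading one attribute under vertical partitioning touches only one fragment, while reading a whole tuple requires a join on the key across all fragments.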
Partitioning
- Partitioning techniques (number of disks = $n$):
  - Round-robin:
    - Send the $i$th tuple inserted in the relation to disk $i \bmod n$.
  - Hash partitioning:
    - Choose one or more attributes as the partitioning attributes.
    - Choose a hash function $h$ with range $0, \ldots, n - 1$.
    - Let $i$ denote the result of applying $h$ to the partitioning attribute value of a tuple. Send the tuple to disk $i$.
  - Range partitioning:
    - Choose an attribute as the partitioning attribute.
    - Choose a partitioning vector $[v_0, v_1, \ldots, v_{n-2}]$.
    - Let $v$ be the partitioning attribute value of a tuple. Tuples such that $v_i \le v < v_{i+1}$ go to disk $i + 1$. Tuples with $v < v_0$ go to disk 0, and tuples with $v \ge v_{n-2}$ go to disk $n - 1$.
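The three placement rules can be sketched as functions that return a disk number between 0 and n − 1. This is an illustration, not a reference implementation: `stable_hash` is a hypothetical choice of h (CRC-32 of the value's `repr`), used because Python's built-in `hash` of strings is salted per process and would send the same value to different disks across runs. Tuple positions are 0-based, so the first inserted tuple goes to disk 0.

```python
import bisect
import zlib
from typing import Sequence


def round_robin_disk(i: int, n: int) -> int:
    """Round-robin: the i-th inserted tuple (0-based) goes to disk i mod n."""
    return i % n


def stable_hash(value: object) -> int:
    """Hypothetical hash function h: CRC-32 of the value's repr (deterministic across runs)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def hash_disk(value: object, n: int) -> int:
    """Hash partitioning: h(partitioning attribute value) reduced to the range 0 .. n-1."""
    return stable_hash(value) % n


def range_disk(v, vector: Sequence) -> int:
    """Range partitioning with vector [v0, ..., v_{n-2}]:
    v < v0 -> disk 0; v_i <= v < v_{i+1} -> disk i+1; v >= v_{n-2} -> disk n-1.
    bisect_right counts the vector entries <= v, which is exactly that disk number."""
    return bisect.bisect_right(vector, v)


# Place (id, name) tuples on n = 3 disks, partitioning on id.
tuples = [(5, "a"), (12, "b"), (20, "c"), (27, "d")]
by_round_robin = [round_robin_disk(i, 3) for i, _ in enumerate(tuples)]  # [0, 1, 2, 0]
by_hash = [hash_disk(t[0], 3) for t in tuples]
by_range = [range_disk(t[0], [10, 20]) for t in tuples]                  # [0, 1, 2, 2]
```

Note that a value equal to a vector entry (20 above) falls into the higher range, matching the rule v_i ≤ v < v_{i+1}.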
Interquery Parallelism
- Queries/transactions execute in parallel with one another.
- Increases transaction throughput; used primarily to scale up a transaction-processing system to support a larger number of transactions per second.
- Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
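As a rough illustration of interquery parallelism, the sketch below submits many small, independent transactions to a thread pool. `TinyStore`, `transfer` and `run_concurrently` are hypothetical names, and a single lock stands in for concurrency control (a real DBMS would use a locking or multiversion protocol). In CPython the threads interleave rather than run Python code truly in parallel, so this shows the structure, not the speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple


class TinyStore:
    """Hypothetical in-memory store; one lock stands in for concurrency control."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._lock = threading.Lock()

    def transfer(self, src: str, dst: str, amount: int) -> None:
        # One short transaction: the lock keeps both updates atomic
        # with respect to other transactions touching the same keys.
        with self._lock:
            self._data[src] = self._data.get(src, 0) - amount
            self._data[dst] = self._data.get(dst, 0) + amount

    def balance(self, key: str) -> int:
        with self._lock:
            return self._data.get(key, 0)


def run_concurrently(store: TinyStore, txns: Iterable[Tuple[str, str, int]],
                     workers: int = 4) -> None:
    """Interquery parallelism: independent transactions are dispatched to a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(store.transfer, *t) for t in txns]
        for f in futures:
            f.result()  # surface any exception raised inside a worker


store = TinyStore()
run_concurrently(store, [("a", "b", 1), ("b", "c", 2)] * 500)
```

Throughput, not the latency of any single transaction, is what improves here: each transfer is as slow as before, but many of them are in flight at once.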
Intraquery Parallelism
- Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.
- Two complementary forms of intraquery parallelism:
  - Intraoperation parallelism – parallelize the execution of each individual operation in the query.
  - Interoperation parallelism – execute the different operations in a query expression in parallel.
- The first form (intraoperation parallelism) scales better with increasing parallelism, because the number of tuples processed by each operation is typically larger than the number of operations in a query.
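Intraoperation parallelism can be illustrated with a parallel sort of a single relation: range-partition the input, sort each partition on its own worker, then concatenate the sorted partitions in range order. This is a sketch under assumptions: integer values, a caller-supplied partitioning vector, and a thread pool standing in for separate processors (in CPython a `ProcessPoolExecutor` would be needed for real CPU parallelism).

```python
import bisect
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def parallel_range_sort(values: Sequence[int], vector: Sequence[int]) -> List[int]:
    """Sort one relation using intraoperation parallelism (range-partitioning sort)."""
    n = len(vector) + 1
    # Step 1: range-partition; every value in partition k is < every value in partition k+1.
    partitions: List[List[int]] = [[] for _ in range(n)]
    for v in values:
        partitions[bisect.bisect_right(vector, v)].append(v)
    # Step 2: each worker sorts its own partition independently.
    with ThreadPoolExecutor(max_workers=n) as pool:
        sorted_parts = list(pool.map(sorted, partitions))
    # Step 3: concatenating partitions in range order yields a fully sorted result.
    result: List[int] = []
    for part in sorted_parts:
        result.extend(part)
    return result


data = [25, 3, 17, 12, 1, 20, 10, 8]
print(parallel_range_sort(data, [10, 20]))  # [1, 3, 8, 10, 12, 17, 20, 25]
```

No final merge is needed because range partitioning already orders the partitions; the speedup depends on the vector splitting the data into partitions of similar size.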
Interoperator Parallelism
- Pipelined parallelism
  - Consider a join of four relations: $r_1 \bowtie r_2 \bowtie r_3 \bowtie r_4$.
  - Set up a pipeline that computes the three joins in parallel:
    - let P1 be assigned the computation of $temp_1 = r_1 \bowtie r_2$,
    - P2 be assigned the computation of $temp_2 = temp_1 \bowtie r_3$,
    - and P3 be assigned the computation of $temp_2 \bowtie r_4$.
  - Each of these operations can execute in parallel, sending the result tuples it computes to the next operation even as it is computing further results.
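A sketch of this pipeline, assuming relations are lists of Python tuples with the hypothetical schemas r1(a, b), r2(b, c), r3(c, d) and r4(d, e), chosen so each join is on one attribute. P1, P2 and P3 are threads; each builds an in-memory hash index on its stored relation and forwards every matching tuple to the next stage through a queue as soon as it is produced. Using a hash join inside each stage is an assumption; the slide does not say which join algorithm the processors run.

```python
import queue
import threading
from typing import Dict, List, Sequence, Tuple

END = object()  # sentinel marking the end of a tuple stream


def join_stage(inq: queue.Queue, outq: queue.Queue, right: Sequence[Tuple],
               left_key: int, right_key: int) -> None:
    """One pipeline stage: hash-join the incoming stream against a stored relation."""
    index: Dict[object, List[Tuple]] = {}
    for t in right:  # build phase on the stored (right) relation
        index.setdefault(t[right_key], []).append(t)
    while True:  # probe phase: handle tuples as soon as they arrive
        t = inq.get()
        if t is END:
            outq.put(END)  # propagate end-of-stream downstream
            return
        for m in index.get(t[left_key], []):
            # Forward each result immediately, dropping the duplicate join column.
            outq.put(t + m[:right_key] + m[right_key + 1:])


def pipelined_join(r1, r2, r3, r4) -> List[Tuple]:
    """Compute r1 ⋈ r2 ⋈ r3 ⋈ r4 with three concurrently running join stages."""
    q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
    stages = [
        threading.Thread(target=join_stage, args=(q0, q1, r2, 1, 0)),  # P1: temp1 = r1 ⋈ r2 on b
        threading.Thread(target=join_stage, args=(q1, q2, r3, 2, 0)),  # P2: temp2 = temp1 ⋈ r3 on c
        threading.Thread(target=join_stage, args=(q2, q3, r4, 3, 0)),  # P3: temp2 ⋈ r4 on d
    ]
    for s in stages:
        s.start()
    for t in r1:  # feed r1 into the head of the pipeline
        q0.put(t)
    q0.put(END)
    results: List[Tuple] = []
    while True:
        t = q3.get()
        if t is END:
            break
        results.append(t)
    for s in stages:
        s.join()
    return results


r1 = [(1, "x"), (2, "y")]                # r1(a, b)
r2 = [("x", 10), ("y", 20), ("y", 21)]  # r2(b, c)
r3 = [(10, "p"), (21, "q")]             # r3(c, d)
r4 = [("p", True), ("q", False)]        # r4(d, e)
print(pipelined_join(r1, r2, r3, r4))   # [(1, 'x', 10, 'p', True), (2, 'y', 21, 'q', False)]
```

No stage waits for its predecessor to finish: P3 can emit its first result while P1 is still reading r1, which is the defining property of pipelined parallelism.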
Independent Parallelism
- Independent parallelism
  - Consider a join of four relations: $r_1 \bowtie r_2 \bowtie r_3 \bowtie r_4$.
    - Let P1 be assigned the computation of $temp_1 = r_1 \bowtie r_2$,
    - P2 be assigned the computation of $temp_2 = r_3 \bowtie r_4$,
    - and P3 be assigned the computation of $temp_1 \bowtie temp_2$.
    - P1 and P2 can work independently in parallel.
    - P3 has to wait for input from P1 and P2.
      - The output of P1 and P2 can be pipelined to P3, combining independent parallelism and pipelined parallelism.
  - Does not provide a high degree of parallelism:
    - useful with a lower degree of parallelism;
    - less useful in a highly parallel system.
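The independent plan can be sketched with the same hypothetical schemas r1(a, b), r2(b, c), r3(c, d), r4(d, e) and a simple in-memory hash join (an assumed algorithm; the slide does not specify one). P1 and P2 run as two independent futures, and P3 waits for both complete results before joining them, so this version does not pipeline their output to P3.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple


def hash_join(left: Sequence[Tuple], right: Sequence[Tuple],
              left_key: int, right_key: int) -> List[Tuple]:
    """In-memory hash join; the duplicate join column of `right` is dropped."""
    index: Dict[object, List[Tuple]] = {}
    for t in right:
        index.setdefault(t[right_key], []).append(t)
    return [t + m[:right_key] + m[right_key + 1:]
            for t in left for m in index.get(t[left_key], [])]


def independent_join(r1, r2, r3, r4) -> List[Tuple]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        p1 = pool.submit(hash_join, r1, r2, 1, 0)  # P1: temp1 = r1 ⋈ r2 -> (a, b, c)
        p2 = pool.submit(hash_join, r3, r4, 1, 0)  # P2: temp2 = r3 ⋈ r4 -> (c, d, e)
        temp1, temp2 = p1.result(), p2.result()    # P3 blocks until both inputs are ready
    return hash_join(temp1, temp2, 2, 0)           # P3: temp1 ⋈ temp2 on c


r1 = [(1, "x"), (2, "y")]
r2 = [("x", 10), ("y", 20), ("y", 21)]
r3 = [(10, "p"), (21, "q")]
r4 = [("p", True), ("q", False)]
print(independent_join(r1, r2, r3, r4))  # [(1, 'x', 10, 'p', True), (2, 'y', 21, 'q', False)]
```

Only two workers are ever busy at once in this plan, which illustrates why independent parallelism alone offers a low degree of parallelism.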
Design of Parallel Systems
Some issues in the design of parallel systems:
- Parallel loading of data from external sources is needed in order to handle large volumes of incoming data.
- Resilience to failure of some processors or disks.
  - The probability of some disk or processor failing is higher in a parallel system.
  - Operation (perhaps with degraded performance) should be possible in spite of failure.
  - Redundancy is achieved by storing an extra copy of every data item at another processor.
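One way to realize this redundancy is sketched below, under assumptions: n ≥ 2 processors, with partition i stored as a primary copy on processor i and as an extra copy on processor (i + 1) mod n. That backup placement is only one possible choice (it resembles chained declustering); the slide requires only that the copy live on another processor. `place_with_backup` and `scan_all` are hypothetical names.

```python
from typing import Dict, List, Set


def place_with_backup(partitions: List[list]) -> Dict[int, Dict[int, list]]:
    """Store partition i on processor i (primary) and on processor (i + 1) mod n (extra copy)."""
    n = len(partitions)
    if n < 2:
        raise ValueError("need at least two processors to keep a copy elsewhere")
    nodes: Dict[int, Dict[int, list]] = {p: {} for p in range(n)}
    for i, part in enumerate(partitions):
        nodes[i][i] = list(part)            # primary copy
        nodes[(i + 1) % n][i] = list(part)  # extra copy on another processor
    return nodes


def scan_all(nodes: Dict[int, Dict[int, list]], failed: Set[int]) -> list:
    """Read every partition, using the extra copy when the primary's processor has failed."""
    n = len(nodes)
    out: list = []
    for i in range(n):
        for holder in (i, (i + 1) % n):
            if holder not in failed:
                out.extend(nodes[holder][i])
                break
        else:  # both copies are unavailable
            raise RuntimeError(f"partition {i} is lost: processors {i} and {(i + 1) % n} failed")
    return out


nodes = place_with_backup([[1, 2], [3], [4, 5]])
print(sorted(scan_all(nodes, failed={1})))  # [1, 2, 3, 4, 5]
```

With processor 1 down the scan still returns all the data, but processor 2 now serves two partitions; that extra load is the degraded performance the slide allows for. Losing two adjacent processors loses a partition under this placement.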
End of Chapter
