Unit IV Implementation Techniques
RAID – File Organization – Organization of Records in Files – Indexing and Hashing – Ordered Indices –
B+ tree Index Files – B tree Index Files – Static Hashing – Dynamic Hashing – Query Processing Overview –
Algorithms for SELECT and JOIN operations – Query optimization using Heuristics and Cost Estimation.
INDEXING
Ordered Index Or Primary Indexing
Example:
• Suppose a company has several employees in each
department, and we build a clustering index on Dept_ID.
All employees who belong to the same Dept_ID form a
single cluster, and each index pointer points to the cluster
as a whole. Here Dept_ID is a non-unique key.
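The clustering idea above can be sketched in a few lines of Python. This is only an illustration: the employee records and the function name build_clustering_index are made up for the example, and an in-memory dictionary stands in for index pointers to disk blocks.

```python
from collections import defaultdict

# Hypothetical employee records: (emp_id, name, dept_id)
employees = [
    (101, "Asha", 10),
    (102, "Ravi", 20),
    (103, "Meena", 10),
    (104, "John", 30),
    (105, "Kiran", 20),
]

def build_clustering_index(records):
    """Group records by dept_id so there is one index entry per distinct dept_id."""
    clusters = defaultdict(list)
    # Sorting on the clustering key keeps records of one department together.
    for rec in sorted(records, key=lambda r: r[2]):
        clusters[rec[2]].append(rec)
    return dict(clusters)

index = build_clustering_index(employees)

# One lookup on the non-unique key returns the whole cluster.
print(index[10])
```

Because Dept_ID is non-unique, the index holds one entry per department rather than one entry per employee.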
B-Tree
• A B-tree is a self-balancing search tree; an inorder
traversal visits its keys in sorted order.
• In a B-tree, a node can have more than two children.
• A B-tree has a height of O(log_M N), where M is the order
of the tree and N is the number of keys stored.
• The height is adjusted automatically at each update.
• Within a B-tree node the keys are kept in sorted order, with
the lowest value on the left and the highest value on the right.
• Inserting a key into a B-tree is more complicated than
inserting into a binary tree.
• A B-tree must satisfy the following conditions:
– All the leaf nodes of the B-tree must be at
the same level.
– Above the leaf nodes of the B-tree, there
should be no empty sub-trees.
– The height of the B-tree should be kept as low as possible.
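A minimal sketch of searching a B-tree, assuming a small hand-built tree of order 4 whose leaves are all at the same level. The class BTreeNode and the function btree_search are illustrative names, not part of any library, and insertion with node splitting is left out.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys stored in this node
        self.children = children or []  # empty list means the node is a leaf

def btree_search(node, key):
    """Return True if key is stored in the subtree rooted at node."""
    while True:
        i = bisect_left(node.keys, key)      # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True                      # found in this node
        if not node.children:
            return False                     # reached a leaf without a match
        node = node.children[i]              # descend into the i-th subtree

# Hand-built example: one root with three leaf children at the same level.
root = BTreeNode([20, 40], [
    BTreeNode([5, 10]),
    BTreeNode([25, 30]),
    BTreeNode([45, 50, 60]),
])

print(btree_search(root, 30), btree_search(root, 35))
```

Each step of the loop moves one level down, so the number of nodes visited is bounded by the height of the tree.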
B+ Tree
• The B+ tree addresses a drawback of using a B-tree for
indexing (data pointers stored in every node reduce the
number of keys that fit in a node) by storing data pointers
only at the leaf nodes of the tree.
• Thus, the structure of the leaf nodes of a B+ tree is quite
different from the structure of its internal nodes.
• Since data pointers are present only at the leaf nodes,
the leaf nodes must store all the key values along with
their corresponding data pointers to the disk file blocks,
in order to access them.
• Moreover, the leaf nodes are linked together to
provide ordered access to the records.
• The leaf nodes therefore form the first level
of the index, with the internal nodes forming
the other levels of a multilevel index.
• Some of the key values of the leaf nodes also
appear in the internal nodes, simply to guide
the search for a record.
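The linked leaves are what make range queries cheap. The sketch below assumes a tiny hand-built B+ tree with one internal node; the names Leaf, Internal, find_leaf and range_scan are hypothetical, and the strings in pointers are stand-ins for real disk block addresses.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys          # every search-key value appears in some leaf
        self.pointers = pointers  # stand-ins for pointers to disk blocks
        self.next = None          # link to the next leaf in key order

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # copies of leaf keys, used only for routing
        self.children = children

def find_leaf(node, key):
    """Follow routing keys down to the leaf that could contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Return (key, pointer) pairs with lo <= key <= hi by walking the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, p in zip(leaf.keys, leaf.pointers):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, p))
        leaf = leaf.next
    return out

# Hand-built tree: keys 20 and 40 are repeated in the internal node.
l1 = Leaf([5, 10], ["blk-5", "blk-10"])
l2 = Leaf([20, 25], ["blk-20", "blk-25"])
l3 = Leaf([40, 45], ["blk-40", "blk-45"])
l1.next, l2.next = l2, l3
root = Internal([20, 40], [l1, l2, l3])

print(range_scan(root, 10, 40))
```

After one descent from the root, the scan never returns to the internal nodes; it only follows the next links between leaves.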
HASHING
• For a very large database, it can be inefficient to
search through every level of an index before
reaching the destination data block that holds the
desired data.
• Hashing is an effective technique to calculate the
direct location of a data record on the disk without
using an index structure.
• Hashing uses hash functions with search keys as
parameters to generate the address of a data record.
Hash Organization
• Bucket − A hash file stores data in buckets. A
bucket is the unit of storage. A bucket typically
corresponds to one complete disk block, which in
turn can store one or more records.
• Hash Function − A hash function, h, is a mapping
from the set of all search keys K to the addresses
where the actual records are placed. It is a
function from search keys to bucket addresses.
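As a rough illustration of buckets and a hash function, the sketch below assumes 8 buckets and a toy hash function that sums character codes. NUM_BUCKETS, h, insert and lookup are names chosen for the example, and Python lists stand in for disk blocks.

```python
NUM_BUCKETS = 8  # assumed number of buckets for this example

def h(search_key: str) -> int:
    """Toy hash function: map a search key to a bucket address in 0..NUM_BUCKETS-1."""
    return sum(ord(c) for c in search_key) % NUM_BUCKETS

# Each bucket holds zero or more records, like a disk block.
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    buckets[h(record["name"])].append(record)

def lookup(name):
    # Only the single bucket chosen by h is examined; no index is traversed.
    return [r for r in buckets[h(name)] if r["name"] == name]

insert({"name": "Asha", "dept": 10})
insert({"name": "Ravi", "dept": 20})
insert({"name": "Meena", "dept": 10})

print(lookup("Ravi"))
```

A real system would use a hash function that spreads keys more uniformly; the character-sum function is used here only because it is easy to follow by hand.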
TYPES OF HASHING METHODS
• There are two types of hashing methods:
– 1) static hashing
– 2) dynamic hashing
• In static hashing, the resultant data bucket
address for a given key always remains the same.
• Dynamic hashing offers a mechanism in which
data buckets are added and removed
dynamically and on demand.
Static Hashing
• In static hashing, when a search-key value is
provided, the hash function always computes the
same address.
• For example, if a mod-4 hash function is used, it
generates only four values (0 to 3).
• The output address is always the same for a given
key under that function.
• The number of buckets provided remains
unchanged at all times.
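A short sketch of the mod-4 example, assuming integer search keys. It shows that only four addresses are ever produced, so every bucket has to absorb more keys as the file grows.

```python
from collections import Counter

def h(key: int) -> int:
    """Static hash function from the example: key mod 4."""
    return key % 4

# The same key always maps to the same address.
print(h(10), h(10))

# However many keys are hashed, only the addresses 0..3 are produced.
addresses = {h(k) for k in range(1000)}
print(sorted(addresses))

# With a fixed number of buckets, the load per bucket grows with the data.
load = Counter(h(k) for k in range(1000))
print(load[0], load[1], load[2], load[3])
```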
Bucket Overflow
• Bucket overflow occurs when a record hashes to a bucket
that is already full; it is a consequence of collisions. This is
a serious problem for any static hash function. In this
case, overflow chaining can be used.
• Overflow Chaining − When a bucket is full, a new
bucket is allocated for the same hash result and is
linked after the previous one.
• This mechanism is called Closed Hashing.
Example: address = h(K) = K mod 5
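Overflow chaining with the hash function K mod 5 can be sketched as follows. The bucket capacity of 2 and the names Bucket, insert, search and chain are assumptions made for the example.

```python
NUM_BUCKETS = 5      # from the example: address = K mod 5
BUCKET_CAPACITY = 2  # assumed capacity, kept small to force overflow

class Bucket:
    def __init__(self):
        self.keys = []
        self.overflow = None  # link to the next overflow bucket, if any

def h(k: int) -> int:
    return k % NUM_BUCKETS

table = [Bucket() for _ in range(NUM_BUCKETS)]

def insert(k):
    b = table[h(k)]
    # Walk the chain until a bucket with free space is found.
    while len(b.keys) >= BUCKET_CAPACITY:
        if b.overflow is None:
            b.overflow = Bucket()  # allocate and link a new overflow bucket
        b = b.overflow
    b.keys.append(k)

def search(k):
    b = table[h(k)]
    while b is not None:
        if k in b.keys:
            return True
        b = b.overflow
    return False

def chain(addr):
    """Return the keys of the primary bucket and its overflow buckets, in order."""
    out, b = [], table[addr]
    while b is not None:
        out.append(list(b.keys))
        b = b.overflow
    return out

for key in [5, 10, 15, 20, 7]:
    insert(key)

print(chain(0), chain(2))
```

Keys 5, 10, 15 and 20 all hash to address 0, so the third and fourth go into an overflow bucket linked after the primary one.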
What is Collision?
• A hash collision occurs when the hash values of two or
more distinct keys in the data set map to the same place in
the hash table.
How to deal with Hashing Collision?
• There are two techniques which you can use to resolve a hash
collision:
– Rehashing: This method invokes a secondary hash function, which
is applied repeatedly until an empty slot is found where the record
can be placed.
– Chaining: This method builds a linked list of items whose keys
hash to the same value. It requires an extra link field for
each table position.
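The two techniques can be compared side by side. The sketch below uses double hashing as one possible form of rehashing; the table size 7 and the secondary function h2 are assumptions chosen so that the probe sequence visits every slot, and a Python list stands in for the linked list used by chaining.

```python
TABLE_SIZE = 7  # assumed prime table size

def h1(k: int) -> int:
    return k % TABLE_SIZE

def h2(k: int) -> int:
    # Assumed secondary hash function; never returns 0, so probing always advances.
    return 1 + (k % 5)

# Rehashing: apply the secondary function repeatedly until an empty slot is found.
slots = [None] * TABLE_SIZE

def insert_rehash(k):
    for i in range(TABLE_SIZE):
        pos = (h1(k) + i * h2(k)) % TABLE_SIZE
        if slots[pos] is None:
            slots[pos] = k
            return pos
    raise RuntimeError("hash table is full")

# Chaining: every table position keeps a list of the keys that hash to it.
chains = [[] for _ in range(TABLE_SIZE)]

def insert_chain(k):
    chains[h1(k)].append(k)

# 10, 17 and 24 all collide at position 3 under h1.
for key in [10, 17, 24]:
    insert_rehash(key)
    insert_chain(key)

print(slots)
print(chains[3])
```

With rehashing the colliding keys end up in different slots of the same table, while chaining keeps them together at position 3.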
Dynamic Hashing
• The problem with static hashing is that the number of
buckets does not expand or shrink as the size of the
database grows or shrinks.
• Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on
demand.
• A common form of dynamic hashing is extendible hashing
(sometimes called extended hashing).
• In dynamic hashing, the hash function is made to produce
a large number of values, of which only a few are used
initially.
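One way to realize this is extendible hashing, sketched below under simplifying assumptions: keys are non-negative integers, the hash value is the key itself, the directory is indexed by the low-order bits of the hash value, and each bucket holds at most 2 keys. The class and method names are illustrative, and deletion with bucket merging is omitted.

```python
BUCKET_CAPACITY = 2  # assumed, kept small so that splits happen quickly

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket depends on
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1           # number of hash bits used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Only the low-order global_depth bits of the hash value are used.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.keys:
                return
            if len(bucket.keys) < BUCKET_CAPACITY:
                bucket.keys.append(key)
                return
            self._split(bucket)         # make room, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; both halves initially share the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Entries whose newly used bit is 1 now point to the sibling bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the old keys between the bucket and its sibling.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)

table = ExtendibleHash()
for key in [1, 3, 5, 7, 9, 2, 4]:
    table.insert(key)

print(table.global_depth, len(table.directory))
```

Only the overflowing bucket is split on each overflow; the directory doubles only when that bucket already uses as many bits as the directory does.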
QUERY PROCESSING
STEPS IN QUERY PROCESSING
TRANSLATING SQL QUERIES TO RELATIONAL
ALGEBRA
ALGORITHMS FOR EXTERNAL SORTING
ALGORITHM FOR SELECT OPERATION
ALGORITHM FOR JOIN OPERATION