CS8492 DBMS Unit 4
RAID level 0 divides data into block-sized units and writes them across a number of disks. Because data is placed
across multiple disks, it is also called "data striping".
The advantage of distributing data over disks is that if different I/O requests are pending for two different
blocks of data, there is a good chance that the requested blocks are on different disks and can be serviced in parallel.
There is no parity checking of data, so if the data on one drive gets corrupted, all the data would be lost: RAID
0 does not support data recovery. Spanning is another term used with RAID level 0, because the logical disk
spans all the physical drives. A RAID 0 implementation requires a minimum of 2 disks.
Advantages
I/O performance is greatly improved by spreading the I/O load across many channels & drives.
Best performance is achieved when data is striped across multiple controllers with only one drive per
controller.
Disadvantages
It is not fault-tolerant; the failure of one drive results in all data in the array being lost.
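The striping idea above can be sketched as a simple address mapping (the disk count and the block numbering are illustrative assumptions):

```python
# Sketch: mapping a logical block number to (disk, stripe) under RAID 0
# striping. The 4-disk array is an assumption for illustration.

def raid0_locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk index, stripe index) for a logical block."""
    disk = logical_block % num_disks      # round-robin across the disks
    stripe = logical_block // num_disks   # position of the block on that disk
    return disk, stripe

# With 4 disks, consecutive logical blocks land on different disks, so
# pending requests for different blocks can often be served in parallel.
layout = [raid0_locate(b, 4) for b in range(8)]
```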
RAID Level 1: Mirroring (or shadowing)
Also known as disk mirroring, this configuration consists of at least two drives that duplicate the storage of data.
DBMS NOTES PREPARED BY MRS.S.JAYABHARATHI,AP/IT,APEC
RAID Level 2:
This configuration uses striping across disks, with some disks storing error checking and correcting (ECC)
information. It has no advantage over RAID 3 and is no longer used.
RAID Level 5:
RAID 5 uses striping as well as parity for redundancy. It is well suited to workloads with heavy reads and few writes.
Block-interleaved distributed parity: data and parity are partitioned among all N + 1 disks, rather than storing
data on N disks and parity on 1 disk.
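The parity idea behind RAID 5 can be sketched with XOR over block contents (toy byte strings and a 3-data-disk array are assumptions of this sketch):

```python
# Sketch: block-level XOR parity as used by RAID 5. The disk contents
# here are tiny illustrative byte strings, not real disk blocks.

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks(d0, d1, d2)

# If the disk holding d1 fails, its block is recovered by XOR-ing the
# surviving data blocks with the parity block:
recovered = xor_blocks(d0, d2, parity)
```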
RAID Level 6:
This technique is similar to RAID 5, but includes a second parity scheme that is distributed across the drives
in the array. The use of additional parity allows the array to continue to function even if two disks fail
simultaneously. However, this extra protection comes at a cost.
P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard
against multiple disk failures.
- Better reliability than Level 5 at a higher cost; not used as widely.
File Organization
The database is stored as a collection of files.
Each file is a sequence of records.
A record is a sequence of fields.
Classifications of records
– Fixed length record
– Variable length record
Fixed length record approach:
Assume the record size is fixed, each file has records of one particular type only, and different files are used
for different relations.
Simple approach
- Record access is simple
Example pseudo code
type account = record
account_number char(10);
branch_name char(22);
balance numeric(8);
end
Total: 40 bytes per record
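The 40-byte account record above can be sketched with Python's struct module; the field encodings (space-padded character fields, an 8-byte integer balance) are assumptions of this sketch:

```python
# Sketch: packing and unpacking the 40-byte fixed-length account record.
import struct

RECORD = struct.Struct("<10s22sq")  # char(10) + char(22) + 8-byte integer = 40 bytes

def pack_account(number: str, branch: str, balance: int) -> bytes:
    return RECORD.pack(number.encode().ljust(10),
                       branch.encode().ljust(22), balance)

def unpack_account(raw: bytes):
    number, branch, balance = RECORD.unpack(raw)
    return number.decode().rstrip(), branch.decode().rstrip(), balance

rec = pack_account("A-102", "Perryridge", 400)
# Because every record is exactly 40 bytes, record i starts at byte
# offset i * 40 -- this is why record access is simple.
```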
Two problems
- It is difficult to delete a record from this structure.
- Some records will cross block boundaries, that is, part of a record is stored in one block and
part in another; reading or writing such a record requires two block accesses.
Alternatives for reusing the free space:
– move records i + 1, . . ., n to i, . . . , n − 1
– do not move records, but link all free records on a free list
– move the final record into the deleted record's place.
Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second deleted record, and so on
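The free-list scheme above can be sketched as follows; an in-memory list of slots stands in for the file, and −1 marks the end of the chain (both assumptions of this sketch):

```python
# Sketch: deleted slots are chained through the records themselves,
# starting from a head pointer kept in the "file header".

class FixedFile:
    def __init__(self, capacity: int):
        # initially every slot is on the free list
        self.slots = [("FREE", i + 1) for i in range(capacity)]
        self.slots[-1] = ("FREE", -1)
        self.free_head = 0            # file-header pointer to first free slot

    def delete(self, i: int) -> None:
        # the freed slot stores the address of the next free slot
        self.slots[i] = ("FREE", self.free_head)
        self.free_head = i

    def insert(self, record) -> int:
        if self.free_head == -1:
            raise MemoryError("no free slot")
        i = self.free_head
        self.free_head = self.slots[i][1]   # follow chain to next free slot
        self.slots[i] = record
        return i
```

Insertion reuses the most recently freed slot first, so no records ever need to be moved.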
Variable-Length Records
Byte string representation
Attach an end-of-record control character to the end of each record.
Difficulty with deletion
Example: record 0 stores branch "perryridge" with accounts A-102 (balance 400) and A-201 (balance 900).
Disadvantages
It is not easy to reuse the space formerly occupied by a deleted record.
In general, there is no space for a record to grow longer.
Slotted Page Structure
Pointer Method
A variable-length record is represented by a list of fixed-length records, chained together via pointers.
Can be used even if the maximum record length is not known.
Disadvantage of the pointer structure: space is wasted in all records except the first in a chain.
Solution is to allow two kinds of block in file:
Anchor block – contains the first records of chain
Overflow block – contains records other than those that are the first records of chains.
Sequential File Organization
• Suitable for applications that require sequential processing of the entire file
• The records in the file are ordered by a search-key
Indexing
An index entry consists of a search-key value and a pointer to the record.
Index files are typically much smaller than the original file.
Two basic kinds of indices:
– Ordered indices: search keys are stored in sorted order
– Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
In an ordered index, index entries are stored sorted on the search key value.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of
the file.
Secondary index: an index whose search key specifies an order different from the sequential order of
the file.
Dense index
Sparse index
Dense Index Files
Dense index — Index record appears for every search-key value in the file.
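The dense and sparse variants can be contrasted in a short sketch over a file sorted on the search key; the account records, and a block size of 2, are illustrative assumptions, with lists standing in for disk blocks:

```python
# Sketch: dense index (entry per key) vs sparse index (entry per block).
import bisect

records = [("A-101", 500), ("A-102", 400), ("A-110", 600),
           ("A-201", 900), ("A-215", 700), ("A-217", 750)]
BLOCK = 2  # records per block

# Dense index: one entry for every search-key value in the file.
dense = {key: pos for pos, (key, _) in enumerate(records)}

# Sparse index: one entry per block, holding the block's first key.
sparse = [(records[i][0], i) for i in range(0, len(records), BLOCK)]

def sparse_lookup(key):
    """Find the last sparse entry whose key <= target, scan that block."""
    keys = [k for k, _ in sparse]
    start = sparse[bisect.bisect_right(keys, key) - 1][1]
    for pos in range(start, min(start + BLOCK, len(records))):
        if records[pos][0] == key:
            return records[pos]
    return None
```

The sparse index holds a third as many entries here, at the price of a short scan inside the target block.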
Multilevel Index
If the primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file
and construct a sparse index on it.
– outer index – a sparse index of primary index
– inner index – the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.
Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks
get created. Periodic reorganization of the entire file is required.
Advantage of B+-tree index files: the tree automatically reorganizes itself with small, local changes in the face
of insertions and deletions, so reorganization of the entire file is not required to maintain performance.
Disadvantage of B+-trees: extra insertion and deletion overhead, and space overhead.
Non-leaf nodes other than the root must have between ⌈n/2⌉ and n children (i.e., between 3 and 5 for n = 5).
Root must have at least 2 children.
Observations about B+-trees
Since the inter-node connections are done by pointers, “logically” close blocks need not be
“physically” close.
The B+-tree contains a relatively small number of levels thus searches can be conducted efficiently.
Insertions and deletions to the main file can be handled efficiently.
Updates on B+-Trees: Insertion
Find the leaf node in which the search-key value would appear
If the search-key value is already there in the leaf node, record is added to file and if necessary a pointer
is inserted into the bucket.
If the search-key value is not there, then add the record to the main file and create a bucket
if necessary.Then:
– If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node otherwise, split the
node.
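The leaf-level rule above can be sketched as follows: insert into the sorted leaf if there is room, otherwise split it and push the smallest key of the new right node up to the parent. For the n = 5 tree in these notes a leaf holds at most 4 (key, pointer) pairs; the branch names and pointers below are illustrative:

```python
# Sketch: B+-tree leaf insertion with a split when the leaf overflows.
import bisect

N = 4  # maximum (key, pointer) pairs per leaf (n = 5 tree)

def leaf_insert(leaf: list, key, ptr):
    """Insert into a sorted leaf. Returns (leaf, None) if no split was
    needed, or (left, (separator_key, right_node)) after a split."""
    bisect.insort(leaf, (key, ptr))
    if len(leaf) <= N:
        return leaf, None
    mid = (len(leaf) + 1) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, (right[0][0], right)   # separator key is copied up

leaf = [("Brighton", 1), ("Downtown", 2), ("Mianus", 3), ("Redwood", 4)]
left, promoted = leaf_insert(leaf, "Clearview", 5)
# The full leaf splits; "Mianus" is pushed up to the parent node.
```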
Example: B+-tree before and after insertion of “Clearview”
• The removal of the leaf node containing “Downtown” did not result in its parent having too few pointers,
so the cascaded deletions stopped with the deleted leaf node’s parent.
Deletion of “Perryridge” from result of previous example
• The node containing “Perryridge” becomes empty and is merged with its sibling.
• The root node is then left with only one child; it is deleted and its child becomes the new root node.
B+-Tree File Organization
• The leaf nodes in a B+-tree file organization store records, instead of pointers.
• Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is
less than the number of pointers in a nonleaf node.
• Leaf nodes are still required to be half full.
• Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+-tree index.
HASHING
• Hashing is an effective technique for computing the direct location of a data record on disk without
using an index structure.
• Hashing uses hash functions with search keys as parameters to generate the address of a data record.
Hash Organization
Bucket
A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically
stores one complete disk block, which in turn can store one or more records.
Hash Function
A hash function, h, is a mapping function that maps all the set of search-keys K to the address
where actual records are placed. It is a function from search keys to bucket addresses.
The worst possible hash function maps all search-key values to the same bucket.
An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from
the set of all possible values.
An ideal hash function is random, so each bucket will have the same number of records.
Types
• Static Hashing
• Dynamic Hashing
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address.
For example, if a mod-4 hash function is used, it generates only 4 distinct values (0 to 3). The output address is
always the same for a given key.
The number of buckets provided remains unchanged at all times.
Operation
Insertion − When a record is required to be entered using static hash, the hash function h computes the
bucket address for search key K, where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to retrieve the address
of the bucket where the data is stored.
Delete − This is simply a search followed by a deletion operation.
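The three operations above can be sketched as follows; the choice h(K) = K mod 4 and the toy branch records are assumptions of this sketch, with each Python list standing in for one disk block:

```python
# Sketch: static hashing with a fixed number of buckets.

NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(key: int) -> int:
    return key % NUM_BUCKETS            # bucket address = h(K)

def insert(key: int, record) -> None:
    buckets[h(key)].append((key, record))

def search(key: int):
    # same hash function locates the bucket; then filter on the key,
    # since different keys can share a bucket
    return [rec for k, rec in buckets[h(key)] if k == key]

def delete(key: int) -> None:           # a search followed by removal
    buckets[h(key)] = [(k, r) for k, r in buckets[h(key)] if k != key]

insert(12, "Perryridge")
insert(16, "Redwood")    # 12 and 16 collide in bucket 0
insert(7, "Brighton")
```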
Hash Indices
• Hashing can be used not only for file organization, but also for index-structure creation.
• A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
• Hash indices are always secondary indices
– The number of buckets also changes dynamically due to coalescing and splitting of buckets.
General Extendable Hash
In this structure, i_2 = i_3 = i, whereas i_1 = i − 1.
If the bucket overflows on insertion and i = i_j (only one table entry points to the bucket):
– increment i and double the size of the bucket address table;
– replace each entry in the table by two entries that point to the same bucket;
– recompute the bucket address table entry for Kj.
Now i > i_j, so proceed as in the first case (splitting the bucket without doubling the table).
Deletion in Extendable Hash Structure
To delete a key value,
– locate it in its bucket and remove it.
– The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket
address table).
– Coalescing of buckets can be done (a bucket can coalesce only with a “buddy” bucket having the same value of i_j
and the same i_j − 1 prefix, if it is present)
– Decreasing bucket address table size is also possible
• Note: decreasing bucket address table size is an expensive operation and should be
done only if number of buckets becomes much smaller than the size of the table
Example
To locate the bucket containing search-key value Kj: compute h(Kj) = X, use the first i high-order bits of X as an
offset into the bucket address table, and follow the pointer to the appropriate bucket.
Updates in Extendable Hash Structure
To insert a record with search-key value Kj
– follow same procedure as look-up and locate the bucket, say j.
– If there is room in bucket j, insert the record in the bucket.
– Otherwise the bucket must be split; overflow buckets are used instead in some cases.
Benefits of extendable hashing:
– Hash performance does not degrade with growth of file
– Minimal space overhead
Disadvantages of extendable hashing
– Extra level of indirection to find the desired record
– Bucket address table may itself become very big.
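The splitting and table-doubling behaviour can be sketched as follows. A bucket capacity of 2 and the use of the low-order bits of Python's built-in hash are assumptions of this sketch (the figures in these notes use the high-order prefix instead):

```python
# Sketch: extendable hashing with a bucket address table indexed by the
# low-order global_depth bits of h(K).

BUCKET_SIZE = 2

class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth   # i_j: bits all keys here agree on
        self.items = []

class ExtendableHash:
    def __init__(self):
        self.global_depth = 1            # i: bits used to index the table
        self.table = [Bucket(1), Bucket(1)]

    def _index(self, key) -> int:
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, value) -> None:
        bucket = self.table[self._index(key)]
        if len(bucket.items) < BUCKET_SIZE:
            bucket.items.append((key, value))
            return
        if bucket.local_depth == self.global_depth:
            self.table = self.table + self.table   # double the table: i += 1
            self.global_depth += 1
        self._split(bucket)
        self.insert(key, value)                    # retry after the split

    def _split(self, bucket: "Bucket") -> None:
        bucket.local_depth += 1
        buddy = Bucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)
        old_items, bucket.items = bucket.items, []
        for k, v in old_items:                     # redistribute by the new bit
            (buddy if hash(k) & bit else bucket).items.append((k, v))
        for i, b in enumerate(self.table):         # repoint half the entries
            if b is bucket and i & bit:
                self.table[i] = buddy

    def lookup(self, key):
        return [v for k, v in self.table[self._index(key)].items if k == key]
```

Inserting a third key into a full bucket doubles the table (i goes from 1 to 2) and splits the bucket, so hash performance does not degrade as the file grows.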
Query Processing
An SQL query is first translated into an equivalent extended relational algebra expression.
SQL queries are decomposed into query blocks, which form the basic units that can be translated into
the algebraic operators and optimized.
A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and
HAVING clauses if present.
Nested queries within a query are identified as separate query blocks.
Example:
These algorithms depend on the file having specific access paths and may apply only to certain types
of selection conditions.
We will use the following examples of SELECT operations:
– (OP1):σSSN=‘123456789’ (EMPLOYEE)
– (OP2):σ DNUMBER > 5 (DEPARTMENT)
– (OP3):σDNO=5 (EMPLOYEE)
– (OP4):σ DNO=5 AND SALARY>30000 AND SEX = ‘F’ (EMPLOYEE)
– (OP5):σESSN=‘123456789’ AND PNO=10 (WORKS_ON)
Many search methods can be used for simple selection: S1 through S6
S1: Linear Search (brute force; “full scan” in Oracle’s terminology)
– Retrieve every record in the file and test whether its attribute values satisfy the selection condition: an
expensive approach.
– Cost: b/2 block accesses on average if the condition is on a key attribute, b if not (where b is the number of file blocks).
S2: Binary Search
– If the selection condition involves an equality comparison on a key attribute on which the file is ordered.
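S1 and S2 can be contrasted in a short sketch over a file ordered on a key attribute (SSN); an in-memory sorted list stands in for the file's blocks, and the SSN values are made up for illustration:

```python
# Sketch: linear search (S1) vs binary search (S2) on an ordered file.
import bisect

records = sorted(str(100000000 + 7 * i) for i in range(1000))

def linear_search(ssn):          # S1: may scan all b blocks
    for i, r in enumerate(records):
        if r == ssn:
            return i
    return -1

def binary_search(ssn):          # S2: about log2(b) block accesses
    i = bisect.bisect_left(records, ssn)
    return i if i < len(records) and records[i] == ssn else -1
```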
– Assumption: the smaller file fits entirely into in-memory buckets after the first (partitioning) phase.
– (If this assumption is not satisfied, the method becomes more complex, and a number of variations
have been proposed to improve efficiency: partition hash join and hybrid hash join.)
Probing Phase
– A single pass through the other file (S) then hashes each of its records to probe the appropriate bucket,
and each record is combined with all matching records from R in that bucket.
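The two phases can be sketched as follows; the toy DEPARTMENT-style (R) and EMPLOYEE-style (S) tuples and the join attribute DNO are illustrative assumptions:

```python
# Sketch: hash join -- build in-memory buckets on the smaller file R,
# then probe them in a single pass over S.
from collections import defaultdict

R = [(5, "Research"), (4, "Administration")]      # (DNUMBER, DNAME)
S = [("Smith", 5), ("Wong", 5), ("Zelaya", 4)]    # (LNAME, DNO)

# Partitioning (build) phase: hash R on the join attribute.
buckets = defaultdict(list)
for dnumber, dname in R:
    buckets[hash(dnumber)].append((dnumber, dname))

# Probing phase: one pass over S; each record probes its bucket and is
# combined with every matching R record found there.
result = []
for lname, dno in S:
    for dnumber, dname in buckets[hash(dno)]:
        if dnumber == dno:
            result.append((lname, dname))
```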
Heuristic-Based Query Optimization
1. Break up SELECT operations with conjunctive conditions into a cascade of SELECT operations
2. Using the commutativity of SELECT with other operations, move each SELECT operation as far down the
query tree as is permitted by the attributes involved in the select condition
3. Using commutativity and associativity of binary operations, rearrange the leaf nodes of the tree
4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a
JOIN operation, if the condition represents a join condition
5. Using the cascading of PROJECT and the commuting of PROJECT with other operations, break down and
move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as
needed
6. Identify sub-trees that represent groups of operations that can be executed by a single algorithm
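The equivalence behind rule 1 can be illustrated on plain Python lists (a list of dicts stands in for the EMPLOYEE relation; the tuples are made up):

```python
# Sketch: a conjunctive SELECT equals a cascade of single-condition
# SELECTs, which the optimizer can then reorder and push down the tree.

EMPLOYEE = [
    {"DNO": 5, "SALARY": 40000, "SEX": "F"},
    {"DNO": 5, "SALARY": 25000, "SEX": "M"},
    {"DNO": 4, "SALARY": 31000, "SEX": "F"},
]

def select(rel, pred):
    return [t for t in rel if pred(t)]

conjunctive = select(EMPLOYEE,
                     lambda t: t["DNO"] == 5 and t["SALARY"] > 30000)
cascade = select(select(EMPLOYEE, lambda t: t["SALARY"] > 30000),
                 lambda t: t["DNO"] == 5)
# Both forms produce the same tuples, so the cheaper order can be chosen.
```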
Query
"Find the last names of employees born after 1957 who work on a project named ‘Aquarius’."
SQL
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME=‘Aquarius’ AND PNUMBER=PNO AND ESSN=SSN AND BDATE > ‘1957-12-31’;
– [nLevelA(I) + 1] + [nBlocks(R)/2]
Cost functions for JOIN Operation
The join operation is the most time-consuming operation to process.
An estimate of the size (number of tuples) of the file that results after a JOIN operation is required
to develop reasonably accurate cost functions for JOIN operations.
A JOIN operation defines the relation containing the tuples from the Cartesian product of two relations R and S
that satisfy a specific predicate F.
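One standard estimate for an equijoin on attribute A (not stated explicitly in these notes, so treat it as an assumption) is |R ⋈ S| ≈ |R| * |S| / max(V(A,R), V(A,S)), where V(A,X) is the number of distinct A-values in X:

```python
# Sketch: the usual equijoin size estimate used in join cost functions.

def join_size_estimate(card_r: int, card_s: int,
                       distinct_r: int, distinct_s: int) -> float:
    """|R| * |S| / max(V(A, R), V(A, S))."""
    return card_r * card_s / max(distinct_r, distinct_s)

# Illustrative numbers: 10,000 EMPLOYEE tuples joined with 125 DEPARTMENT
# tuples on DNO, where DNO is a key of DEPARTMENT (125 distinct values):
estimate = join_size_estimate(10_000, 125, 125, 125)
# Every employee matches exactly one department, so about 10,000 tuples.
```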