Unit 6 notes DBMS final

The document provides an overview of storage systems and indexing in database management systems (DBMS), detailing the types of storage (primary, secondary, tertiary) and file organization methods (heap, sequential, hash, clustered). It also explains indexing techniques, including ordered indices, primary indexing, clustered indexing, secondary indexing, multilevel indexing, and B-trees, emphasizing their roles in efficient data retrieval. Additionally, it covers hashing methods, including static and dynamic hashing, and their operations for managing data records in databases.

Uploaded by

gopichandrecruiter66

UNIT -6

OVERVIEW OF STORAGES AND INDEXING

DBMS - Storage System
Databases are stored in files, which contain records. At the physical level, the actual data is
stored in electromagnetic format on some device. These storage devices can be broadly
categorized into three types −

• Primary Storage − The memory storage that is directly accessible to the CPU comes
under this category. The CPU's internal memory (registers), fast memory (cache), and main
memory (RAM) are directly accessible to the CPU, as they are all placed on the
motherboard or CPU chipset. This storage is typically very small, ultra-fast, and volatile.
Primary storage requires a continuous power supply in order to maintain its state. In case
of a power failure, all its data is lost.
• Secondary Storage − Secondary storage devices are used to store data for future use or
as backup. Secondary storage includes memory devices that are not a part of the CPU
chipset or motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.),
hard disks, flash drives, and magnetic tapes.
• Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such
storage devices are external to the computer system, they are the slowest in speed. These
storage devices are mostly used to take a backup of an entire system. Optical disks
and magnetic tapes are widely used as tertiary storage.
File Organization
File Organization defines how file records are mapped onto disk blocks. We have four types of
File Organization to organize file records −
Heap File Organization
When a file is created using Heap File Organization, the Operating System allocates memory
area to that file without any further accounting details. File records can be placed anywhere in
that memory area. It is the responsibility of the software to manage the records. Heap File does
not support any ordering, sequencing, or indexing on its own.

Sequential File Organization

Every file record contains a data field (attribute) to uniquely identify that record. In sequential
file organization, records are placed in the file in some sequential order based on the unique key
field or search key. Practically, it is not possible to store all the records sequentially in physical
form.

Hash File Organization

Hash File Organization uses Hash function computation on some fields of the records. The
output of the hash function determines the location of disk block where the records are to be
placed.

Clustered File Organization

Clustered file organization is not considered good for large databases. In this mechanism,
related records from one or more relations are kept in the same disk block, that is, the ordering
of records is not based on primary key or search key.
DBMS - Indexing
We know that data is stored in the form of records. Every record has a key field, which helps it
to be recognized uniquely.
Indexing is a data structure technique used to quickly locate and retrieve records from the
database files based on the attributes on which the index has been built. Indexing in database
systems is similar to what we see in books.
Indexes are created using a few database columns.
• The first column is the Search key, which contains a copy of the primary key or candidate
key of the table. These values are stored in sorted order so that the corresponding data can
be accessed quickly.
Note: The data itself may or may not be stored in sorted order.
• The second column is the Data Reference or Pointer, which contains a set of pointers
holding the address of the disk block where that particular key value can be found.
In general, there are two types of file organization mechanism which are followed by the
indexing methods to store the data:

Indexing Methods

Ordered indices
Indices are usually sorted to make searching faster. Indices that are sorted are known
as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is 10
bytes long. The IDs start with 1, 2, 3 and so on, and we have to find the employee with ID 543.
In the case of a database with no index, we have to scan the disk blocks from the start until we
reach 543. The DBMS will read the record after reading 543*10 = 5430 bytes.
In the case of an index, assuming each index entry is 2 bytes long, the DBMS searches the index
instead and will read the record after reading 542*2 = 1084 bytes, which is far less than in the
previous case.
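
The arithmetic in this example can be checked with a small sketch. The 10-byte record size comes from the example; the 2-byte index entry size is an assumption implied by the 542*2 figure, and the function names are made up for illustration.

```python
RECORD_SIZE = 10       # bytes per data record (from the example)
INDEX_ENTRY_SIZE = 2   # bytes per index entry (assumed, implied by 542*2)


def bytes_read_without_index(target_id, record_size=RECORD_SIZE):
    """Linear scan: read every record up to and including the target."""
    return target_id * record_size


def bytes_read_with_index(target_id, entry_size=INDEX_ENTRY_SIZE):
    """Scan the index entries that precede the target's entry."""
    return (target_id - 1) * entry_size


print(bytes_read_without_index(543))  # 5430
print(bytes_read_with_index(543))     # 1084
```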

Primary Index
If the index is created on the basis of the primary key of the table, then it is known as primary
indexing. Primary keys are unique to each record, so there is a 1:1 relation between index
entries and records.
As primary keys are stored in sorted order, the searching operation is quite
efficient.
The primary index can be classified into two types: Dense index and Sparse index.

o Dense Index:
• For every search key value in the data file, there is an index record.
• This record contains the search key and also a reference to the first data
record with that search key value.

o Sparse Index:
• An index record appears only for some of the items in the data file. Each index
record points to a block as shown.
• To locate a record, we find the index record with the largest search key
value less than or equal to the search key value we are looking for.
• We start at the record pointed to by that index record and follow the
pointers in the file (that is, sequentially) until we find the desired
record.
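
The two lookups can be sketched as follows. This is a minimal in-memory illustration, not how a DBMS lays out blocks on disk; the `blocks` data and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: three blocks of (key, record) pairs, sorted by key.
blocks = [
    [(10, "A"), (20, "B"), (30, "C")],
    [(40, "D"), (50, "E"), (60, "F")],
    [(70, "G"), (80, "H"), (90, "I")],
]

# Dense index: one entry per search key, pointing at (block number, slot).
dense_keys = [key for block in blocks for key, _ in block]
dense_ptrs = [(b, s) for b, block in enumerate(blocks) for s in range(len(block))]

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    """Binary-search the dense index, then follow the pointer directly."""
    i = bisect_left(dense_keys, key)
    if i == len(dense_keys) or dense_keys[i] != key:
        return None
    b, s = dense_ptrs[i]
    return blocks[b][s][1]


def sparse_lookup(key):
    """Find the largest index key <= key, then scan that block sequentially."""
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    for k, record in blocks[pos]:
        if k == key:
            return record
        if k > key:
            break  # passed the place where the key would be
    return None
```

The sparse index holds 3 entries instead of 9, at the cost of a short sequential scan inside the chosen block.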
Clustered Indexing

Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
In some cases, the index is created on non-primary key columns which may not be unique for
each record. In such cases, in order to identify the records faster, we will group two or more
columns together to get the unique values and create index out of them. This method is known
as the clustering index. Basically, records with similar characteristics are grouped together and
indexes are created for these groups.
For example, students studying in each semester are grouped together, i.e. 1st semester students,
2nd semester students, 3rd semester students, etc.

Clustered index sorted according to first name (Search key)

Secondary Indexing:
In secondary indexing, to reduce the size of mapping, another level of indexing is introduced. In
this method, the huge range for the columns is selected initially so that the mapping size of the
first level becomes small. Then each range is further divided into smaller ranges. The mapping of
the first level is stored in the primary memory, so that address fetch is faster. The mapping of the
second level and actual data are stored in the secondary memory (hard disk).

For example:

o If you want to find the record of roll 111 in the diagram, the search first looks for the
largest entry that is smaller than or equal to 111 in the first level index. It gets 100 at this
level.
o Then, in the second index level, it again looks for the largest entry smaller than or equal
to 111 and gets 110. Using the address stored with 110, it goes to the data block and scans
each record until it reaches 111.
o This is how a search is performed in this method. Inserting, updating or deleting is also
done in the same manner.
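
The search for roll 111 can be sketched as below. The index contents are hypothetical stand-ins for the missing diagram (ranges of 100 at the first level, ranges of 10 at the second level), and the function names are illustrative.

```python
from bisect import bisect_right


def floor_entry(sorted_keys, key):
    """Return the largest entry <= key, or None if there is none."""
    i = bisect_right(sorted_keys, key) - 1
    return sorted_keys[i] if i >= 0 else None


# First level (kept in primary memory): coarse ranges.
first_level = [100, 200, 300]
# Second level (on disk): each coarse range split into smaller ranges.
second_level = {
    100: [100, 110, 120],
    200: [200, 210, 220],
    300: [300, 310, 320],
}
# Data blocks: each second-level entry points to a block of 10 roll numbers.
data_blocks = {
    start: list(range(start, start + 10))
    for starts in second_level.values()
    for start in starts
}


def find_roll(roll):
    outer = floor_entry(first_level, roll)           # e.g. 100 for roll 111
    if outer is None:
        return None
    inner = floor_entry(second_level[outer], roll)   # e.g. 110 for roll 111
    for record in data_blocks[inner]:                # sequential scan in block
        if record == roll:
            return outer, inner, record
    return None


print(find_roll(111))  # (100, 110, 111)
```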

Multilevel Indexing
As the database grows, its indices grow too. A single-level index may become too large to
keep in main memory, so that searching it requires multiple disk accesses. Multilevel
indexing splits the index into several smaller blocks so that the outermost level fits in a
single block. The outer blocks point to inner blocks, which in turn point to the data blocks.
The outermost level can then be kept in main memory with little overhead.

Multi-level Index helps in breaking down the index into several smaller indices in order to make
the outermost level so small that it can be saved in a single disk block, which can easily be
accommodated anywhere in the main memory.

B Tree
B Tree is a specialized m-way tree that is widely used for disk access. Each node of a B-Tree of
order m can have at most m-1 keys and m children. One of the main reasons for using a B tree is
its capability to store a large number of keys, including large key values, in a single node,
which keeps the height of the tree relatively small.

A B tree of order m contains all the properties of an M way tree. In addition, it contains the
following properties.

1. Every node in a B-Tree contains at most m children.

2. Every node in a B-Tree except the root node and the leaf nodes contains at least ⌈m/2⌉
children.
3. The root node must have at least 2 children, unless it is a leaf.
4. All leaf nodes must be at the same level.

It is not necessary that all the nodes contain the same number of children, but each internal
node other than the root must have at least ⌈m/2⌉ children.
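
As a quick check of these limits, the sketch below computes the bounds for a given order m. The function name and the dictionary layout are illustrative only.

```python
import math


def btree_bounds(m):
    """Per-node limits for a B-tree of order m (m >= 3 assumed here)."""
    if m < 3:
        raise ValueError("this sketch assumes an order of at least 3")
    min_children = math.ceil(m / 2)
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children": min_children,   # internal nodes other than the root
        "min_keys": min_children - 1,   # one key fewer than children
        "root_min_children": 2,         # unless the root is a leaf
    }


print(btree_bounds(4))
```

For the order-4 tree mentioned next, this gives at most 4 children and 3 keys per node, and at least 2 children for every non-root internal node.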

A B tree of order 4 is shown in the following image.

While performing operations on a B Tree, a property of the tree may be violated, such as the
minimum number of children a node can have. To maintain the properties of the B Tree, nodes
may be split or joined.

B+ Tree
o The B+ tree is a balanced multi-way search tree (not a binary tree). It follows a multi-level
index format.
o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf
nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support
random access as well as sequential access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is of
order n, where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.
Internal node
o An internal node of the B+ tree, except the root node, contains at least n/2 pointers.
o At most, an internal node of the tree contains n pointers.

Leaf node
o A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.

Searching a record in B+ Tree

Suppose we have to search for 55 in the below B+ tree structure. First, we look in the
intermediary node, which directs us to the leaf node that can contain a record for 55.

In the intermediary node, 55 falls on the branch between the keys 50 and 75, so we are
redirected to the third leaf node. There the DBMS performs a sequential search to find
55.
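
The search can be sketched as below. The tree contents are assumptions standing in for the missing figure: a single intermediary node with keys 25, 50 and 75 and four leaves, chosen so that 55 lands in the third leaf as the text describes.

```python
from bisect import bisect_right

# Hypothetical one-level B+ tree (the original figure is not available).
internal_keys = [25, 50, 75]
leaves = [
    [10, 15, 20],
    [30, 35, 40],
    [50, 55, 65, 70],
    [80, 85, 90],
]


def bplus_search(key):
    """Return (leaf number counted from 1, found?) for the given key."""
    # Keys >= internal_keys[i] go to the right of separator i.
    leaf_no = bisect_right(internal_keys, key)
    # Sequential search inside the chosen leaf.
    for k in leaves[leaf_no]:
        if k == key:
            return leaf_no + 1, True
    return leaf_no + 1, False


print(bplus_search(55))  # (3, True)
```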

B+ Tree Insertion

Suppose we want to insert a record 60 in the below structure. It belongs in the 3rd leaf node,
after 55. The tree is balanced and that leaf node is already full, so 60 cannot simply be
inserted there.

In this case, we have to split the leaf node, so that the record can be inserted into the tree
without affecting the fill factor, balance and order.
With 60 added, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and the key in the
intermediate node that leads to it is 50. We split the leaf node in the middle so that the
balance of the tree is not altered. So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes.

For these two to be leaf nodes, the intermediate node cannot branch from 50 alone. It should
have 60 added to it, and then it can hold a pointer to the new leaf node.

This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy to
find the node where the entry fits and then place it in that leaf node.
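
The split described above can be sketched as follows. The leaf capacity of 4 and the intermediate node keys 25, 50, 75 are assumptions made for illustration; the helper names are hypothetical, and propagation of a split up to the parent's parent is left out.

```python
from bisect import insort


def insert_into_leaf(leaf, key, capacity):
    """Insert key into a sorted leaf; split in the middle on overflow.

    Returns (list of resulting leaves, separator key to add to the parent
    or None if no split was needed).
    """
    keys = sorted(leaf + [key])
    if len(keys) <= capacity:
        return [keys], None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # In a B+ tree the first key of the new right leaf is copied upward.
    return [left, right], right[0]


parent_keys = [25, 50, 75]                 # assumed intermediate node
new_leaves, separator = insert_into_leaf([50, 55, 65, 70], 60, capacity=4)
if separator is not None:
    insort(parent_keys, separator)         # parent now also branches at 60

print(new_leaves)    # [[50, 55], [60, 65, 70]]
print(parent_keys)   # [25, 50, 60, 75]
```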

B+ Tree Deletion

Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from
the intermediate node as well as from the 4th leaf node. If we remove it from the
intermediate node, the tree will no longer satisfy the rules of the B+ tree, so we need to
rearrange it to keep the tree balanced.

After deleting 60 from the above B+ tree and rearranging the nodes, the tree looks as follows:
DBMS - Hashing
For a huge database, it can be impractical to search the index values through all the levels of
an index and then reach the destination data block to retrieve the desired data.
Hashing is an effective technique to calculate the direct location of a data record on the disk
without using an index structure.
Hashing uses hash functions with search keys as parameters to generate the address of a data
record.

Hash Organization
• Bucket − A hash file stores data in bucket format. A bucket is considered a unit of storage.
A bucket typically stores one complete disk block, which in turn can store one or more
records.
• Hash Function − A hash function, h, is a mapping function that maps the set of all
search-keys K to the addresses where the actual records are placed. It is a function from
search keys to bucket addresses.

Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the
same address. For example, if a mod-4 hash function is used, then it generates only 4 values
(0, 1, 2 and 3). The output address is always the same for a given key. The number of buckets
provided remains unchanged at all times.
Operation
• Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K, where the record will be
stored.
Bucket address = h(K)
• Search − When a record needs to be retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is stored.
• Delete − This is simply a search followed by a deletion operation.
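
The three operations can be sketched with the mod-4 hash function from the example. This is an in-memory illustration with unbounded buckets, so overflow is not handled; the class and method names are hypothetical.

```python
NUM_BUCKETS = 4  # fixed for the lifetime of the file (static hashing)


def h(key):
    """mod-4 hash function: bucket address = h(K), always in 0..3."""
    return key % NUM_BUCKETS


class StaticHashFile:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, record):
        self.buckets[h(key)].append((key, record))

    def search(self, key):
        # The same hash function locates the bucket; then scan inside it.
        for k, record in self.buckets[h(key)]:
            if k == key:
                return record
        return None

    def delete(self, key):
        # A search followed by a deletion.
        bucket = self.buckets[h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


f = StaticHashFile()
f.insert(9, "record-9")     # 9 % 4 == 1, goes to bucket 1
f.insert(13, "record-13")   # 13 % 4 == 1, same bucket
print(f.search(13))         # record-13
```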

Bucket Overflow
The condition of bucket overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining or linear probing can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.
• Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
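
Both strategies can be sketched side by side. The bucket capacity of 2 for chaining and the one-record slots for probing are assumptions chosen to keep the example small; the function names are illustrative.

```python
NUM_BUCKETS = 4
BUCKET_CAPACITY = 2  # records per bucket in the chaining example (assumed)


def h(key):
    return key % NUM_BUCKETS


def insert_chained(chains, key):
    """Overflow chaining: chains[i] is a linked sequence of buckets."""
    chain = chains[h(key)]
    if len(chain[-1]) == BUCKET_CAPACITY:
        chain.append([])          # allocate an overflow bucket and link it
    chain[-1].append(key)


def insert_probed(slots, key):
    """Linear probing: take the next free slot after the home address."""
    start = h(key)
    for step in range(len(slots)):
        i = (start + step) % len(slots)
        if slots[i] is None:
            slots[i] = key
            return i
    raise OverflowError("hash file is full")


chains = [[[]] for _ in range(NUM_BUCKETS)]
slots = [None] * NUM_BUCKETS
for key in (4, 8, 12):            # all three keys hash to address 0
    insert_chained(chains, key)
    insert_probed(slots, key)

print(chains[0])  # [[4, 8], [12]]
print(slots)      # [4, 8, 12, None]
```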

Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets
are added and removed dynamically and on demand. Dynamic hashing is also known
as extended (extendible) hashing.
The hash function, in dynamic hashing, is made to produce a large number of values, of which
only a few are used initially.
Organization
The prefix of an entire hash value is taken as a hash index. Only a portion of the hash value is
used for computing bucket addresses. Every hash index has a depth value that signifies how many
bits are used for computing the bucket address. With a depth of n, these bits can address 2^n
buckets. When all these bits are consumed (that is, when all the buckets are full), the depth
value is increased by one and the number of buckets is doubled.
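
The addressing scheme can be sketched as below. It only shows how a prefix of `depth` bits selects a bucket and how increasing the depth doubles the address space; bucket splitting and directory maintenance are left out. The 8-bit hash width and the modulo stand-in hash function are assumptions.

```python
HASH_BITS = 8  # width of the full hash value (assumed for illustration)


def hash_value(key):
    """Stand-in hash function producing an 8-bit value."""
    return key % (1 << HASH_BITS)


def bucket_index(key, depth):
    """Use only the first `depth` bits (the prefix) of the hash value."""
    return hash_value(key) >> (HASH_BITS - depth)


def addressable_buckets(depth):
    return 1 << depth  # 2^n buckets for a depth of n


key = 180                       # hash value 0b10110100
print(bucket_index(key, 2))     # prefix 0b10  -> bucket 2 of 4
print(bucket_index(key, 3))     # prefix 0b101 -> bucket 5 of 8
```

When the depth grows from 2 to 3, each old bucket address `b` corresponds to the two new addresses `2*b` and `2*b + 1`, which is why the number of buckets doubles.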
