
UNIT-IV - File Organization


Uploaded by Mummana Mohan Shankar


UNIT-V

Overview of Storage and Indexing: Data on External Storage – File Organization and
Indexing – Cluster Indexes, Primary and Secondary Indexes – Index Data Structures –
Hash-Based Indexing – Tree-Based Indexing
Data on External Storage

• A DBMS stores vast quantities of data.
• Therefore, data is stored on external storage devices, such as disks and tapes, and fetched into main memory as needed for processing.
• The unit of information read from or written to disk is a page.
• The size of a page is typically 4 KB or 8 KB.
• Database systems are carefully optimized to minimize the cost of page I/O.
• If we read several pages in the order in which they are stored physically, the cost can be much less than the cost of reading the same pages in a random order.
• Each record in a file has a unique identifier called a record id (rid). Using the rid, we can identify the disk address of the page containing the record.
• Data is read into memory for processing, and written to disk for persistent storage, by a layer of software called the buffer manager.
• Space on disk is managed by the disk space manager.
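The rid-to-page mapping can be sketched as follows. This is a minimal illustration, assuming a rid is a (page id, slot number) pair and that pages are stored contiguously in one file of 8 KB pages; real systems use their own rid layouts, and the names `Rid` and `page_offset` are hypothetical.

```python
from typing import NamedTuple

PAGE_SIZE = 8 * 1024  # assumption: 8 KB pages


class Rid(NamedTuple):
    """Hypothetical rid layout: the page holding the record and the record's slot in that page."""
    page_id: int
    slot: int


def page_offset(rid: Rid) -> int:
    """Byte offset of the record's page, assuming pages are laid out contiguously in one file."""
    return rid.page_id * PAGE_SIZE


rid = Rid(page_id=3, slot=7)
print(page_offset(rid))  # 24576
```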

File organization: the physical arrangement of data in a file into records and pages on
secondary storage. The order in which records are stored and accessed in the file
depends on the file organization. The main types of file organization are:
• Heap (unordered) files: records are placed on disk in no particular order.
• Sequential (ordered) files: records are ordered by the value of a specified field.
• Hash files: records are placed on disk according to a hash function.
Unordered Files
The records are placed in the file in the same order as they are inserted. A new record is
inserted in the last page of the file; if there is insufficient space in the last page, a new
page is added to the file. This makes insertion very efficient. However, because a heap file
has no particular ordering with respect to field values, a linear search must be performed
to access a record. A linear search involves reading pages from the file until the required
record is found. This makes retrievals from heap files that have more than a few pages
relatively slow. To delete a record, the required page first has to be retrieved, the record
marked as deleted, and the page written back to disk. The space occupied by deleted records
is not reused. Heap files are one of the best organizations for bulk loading data into a
table, as records are inserted at the end of the sequence; there is no overhead of calculating
which page the record should go on.
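A minimal in-memory sketch of the heap-file behaviour described above: append-only insertion, linear search, and delete-by-marking. The page capacity of three records and the class name `HeapFile` are illustrative assumptions; real pages are fixed-size blocks on disk.

```python
from typing import List, Optional

RECORDS_PER_PAGE = 3  # assumption: tiny pages so the example stays small


class HeapFile:
    """Minimal in-memory model of an unordered (heap) file: a list of pages of records."""

    def __init__(self) -> None:
        self.pages: List[List[Optional[dict]]] = [[]]

    def insert(self, record: dict) -> None:
        last = self.pages[-1]
        if len(last) >= RECORDS_PER_PAGE:
            last = []
            self.pages.append(last)  # no room in the last page: add a new page
        last.append(record)

    def find(self, field: str, value) -> Optional[dict]:
        # Linear search: read pages one by one until the record is found.
        for page in self.pages:
            for rec in page:
                if rec is not None and rec[field] == value:
                    return rec
        return None

    def delete(self, field: str, value) -> bool:
        for page in self.pages:
            for i, rec in enumerate(page):
                if rec is not None and rec[field] == value:
                    page[i] = None  # mark as deleted; the slot is never reused
                    return True
        return False


f = HeapFile()
for eid in [7, 2, 9, 4]:
    f.insert({"eid": eid})
print(len(f.pages))        # 2
f.delete("eid", 9)
print(f.find("eid", 9))    # None
print(f.find("eid", 4))    # {'eid': 4}
```

Because deleted slots still count toward a page's capacity, the sketch also shows why space freed by deletions is wasted until the file is reorganized.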
Ordered Files
The records in a file can be sorted on the values of one or more of the fields. The
resulting file is called an ordered or sequential file. The field(s) that the file is sorted on is
called the ordering field. If the ordering field is also a key of the file, and therefore
guaranteed to have a unique value in each record, the field is also called the ordering key.
To search for a particular record, a binary search can be performed because the records
are already in sorted order. In general, a binary search is more efficient than a linear
search: for a file of b pages it reads about log2 b pages, whereas a linear search reads
b/2 pages on average.
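To make the comparison concrete, here is a hedged sketch of binary search over a sorted file modelled as a list of sorted pages, counting one page read per probe. The function name and the sample keys are hypothetical; the sketch assumes every page is non-empty and keys are unique.

```python
from typing import List, Optional, Tuple


def binary_search_pages(pages: List[List[int]], key: int) -> Tuple[Optional[Tuple[int, int]], int]:
    """Return ((page_index, slot), pages_read), or (None, pages_read) if key is absent."""
    lo, hi = 0, len(pages) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]
        reads += 1  # each probe reads one page from disk
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            # The key falls within this page's range: search the page in memory.
            for slot, k in enumerate(page):
                if k == key:
                    return (mid, slot), reads
            return None, reads
    return None, reads


sorted_pages = [[1, 3, 5], [7, 9, 11], [13, 15, 17], [19, 21, 23], [25, 27, 29]]
print(binary_search_pages(sorted_pages, 21))  # ((3, 1), 2)
```

For key 21 the search reads 2 of the 5 pages, whereas a linear scan from the start would read 4.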
Inserting and deleting records in a sorted file are problematic because the order of records
has to be maintained. To insert a new record, we must find the correct position in the
ordering for the record and then find space to insert it. If there is sufficient space in the
required page for the new record, then that single page can be reordered and written back
to disk. If this is not the case, then it is necessary to move one or more records on to the
next page. Again, the next page may have no free space, in which case records on that
page must be moved too, and so on.
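The cascading effect of an insertion can be sketched as follows, assuming a hypothetical capacity of three records per page and a file modelled as a non-empty list of sorted pages. Each overflow pushes the largest record onto the next page, which is why a single insertion may rewrite many pages.

```python
from bisect import insort
from typing import List

PAGE_CAPACITY = 3  # assumption: each page holds at most 3 records


def sorted_insert(pages: List[List[int]], key: int) -> int:
    """Insert key into a sorted file; return the number of pages that had to be rewritten."""
    # Find the first page whose largest key is >= key (a real system would binary search).
    target = len(pages) - 1
    for i, page in enumerate(pages):
        if page and key <= page[-1]:
            target = i
            break
    insort(pages[target], key)  # place the record in order within its page
    written = 1
    # While a page overflows, move its largest record to the front of the next page.
    i = target
    while len(pages[i]) > PAGE_CAPACITY:
        moved = pages[i].pop()
        if i + 1 == len(pages):
            pages.append([])  # no next page: add a new one at the end of the file
        pages[i + 1].insert(0, moved)
        i += 1
        written += 1
    return written


file_pages = [[10, 20, 30], [40, 50, 60], [70, 80]]
print(sorted_insert(file_pages, 25))  # 3
print(file_pages)                     # [[10, 20, 25], [30, 40, 50], [60, 70, 80]]
```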
To delete a record, we must reorganize the records to remove the resulting free slot.
Ordered files are rarely used for database storage unless a primary index is added to the
file.
Hash Files
In a hash file, records do not have to be written sequentially to the file. Instead, a hash
function calculates the address of the page in which the record is to be stored, based on
one or more fields in the record. The base field is called the hash field or, if the field is
also a key field of the file, the hash key. Records in a hash file will appear to be randomly
distributed across the available file space. The hash function is chosen so that records
are as evenly distributed as possible throughout the file. One popular technique is
division-remainder hashing. This technique uses the MOD function, which takes the field
value, divides it by some predetermined integer value, and uses the remainder of this
division as the disk address.
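A sketch of division-remainder hashing, assuming a hypothetical divisor of 7 and integer key values (for example, a staff number). Here the remainder is treated as a relative page (bucket) number, which the DBMS would then map to a physical disk address.

```python
from typing import Dict, List

NUM_PAGES = 7  # assumption: the predetermined divisor (a prime is a common choice)


def division_remainder_hash(key: int, num_pages: int = NUM_PAGES) -> int:
    """Page (bucket) number for key: the remainder of key divided by num_pages."""
    return key % num_pages  # the MOD function


pages: Dict[int, List[int]] = {i: [] for i in range(NUM_PAGES)}
for staff_no in [1001, 1002, 1014, 2045]:  # hypothetical key values
    pages[division_remainder_hash(staff_no)].append(staff_no)
print({p: keys for p, keys in pages.items() if keys})  # {0: [1001], 1: [1002, 2045], 6: [1014]}
```

Keys 1002 and 2045 produce the same remainder, so both records land on page 1; a real hash file must handle such collisions, for example with overflow pages.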

Index: a data structure that allows the DBMS to locate particular records in a file more
quickly and thereby speed up responses to user queries.
• An index structure is associated with a particular search key and contains records consisting of the key value and the address of the logical record in the file containing the key value.
• The file containing the logical records is called the data file.
• The file containing the index records is called the index file.
• The values in the index file are ordered according to the indexing field, which is usually based on a single attribute.
Types of Index
• Primary Index: The data file is sequentially ordered by an ordering key field, and the index is built on that ordering key field, which is guaranteed to have a unique value in each record.
• Clustering Index: The data file is sequentially ordered on a non-key field, and the index is built on this non-key field, so there can be more than one record corresponding to a value of the indexing field. The non-key field is called a clustering attribute.
• Secondary Index: An index that is defined on a non-ordering field of the data file is called a secondary index.
• Sparse Index: In a sparse index, index records are not created for every search key. An index record contains a search key and a pointer to the data on the disk. To search for a record, we first follow the index record with the largest search key not greater than the one we want, which takes us to a location in the data. If the data we are looking for is not where the index leads directly, the system performs a sequential search from that point until the desired data is found.

• Dense Index: In a dense index, there is an index record for every search-key value in the database. This makes searching faster but requires more space to store the index records themselves. Index records contain a search-key value and a pointer to the actual record on the disk.
• Multilevel Index: Index records comprise search-key values and data pointers. A multilevel index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. There is an immense need to keep the index records in main memory to speed up search operations. If a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses. A multilevel index solves this by breaking the index down into several smaller indices, so that the outermost level is small enough to fit in a single disk block, which can easily be kept in main memory.
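The dense, sparse, and multilevel lookups can be compared on a small, hypothetical sorted data file of four pages. This is a sketch under stated assumptions: the data file is sorted on the search key, the sparse index holds the first key of each page, and the outer level of the two-level (multilevel) index holds one entry per block of two inner index entries. All keys and record values are invented for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical sorted data file: 4 pages of (key, record) pairs in key order.
data_pages: List[List[Tuple[int, str]]] = [
    [(5, "a"), (8, "b"), (12, "c")],
    [(15, "d"), (19, "e"), (23, "f")],
    [(27, "g"), (31, "h"), (36, "i")],
    [(40, "j"), (44, "k"), (49, "l")],
]

# Dense index: one entry for every search-key value -> (page, slot).
dense: Dict[int, Tuple[int, int]] = {
    key: (p, s) for p, page in enumerate(data_pages) for s, (key, _) in enumerate(page)
}

# Sparse index: one entry per data page, holding the first key on that page.
sparse_keys: List[int] = [page[0][0] for page in data_pages]  # [5, 15, 27, 40]

# Outer level of a two-level index: one entry per block of INDEX_BLOCK inner entries.
INDEX_BLOCK = 2  # assumption: an index block holds 2 entries
outer_keys: List[int] = sparse_keys[::INDEX_BLOCK]  # [5, 27]


def scan_page(p: int, key: int) -> Optional[str]:
    # Sequential search inside one data page.
    for k, rec in data_pages[p]:
        if k == key:
            return rec
    return None


def dense_lookup(key: int) -> Optional[str]:
    loc = dense.get(key)
    return None if loc is None else data_pages[loc[0]][loc[1]][1]


def sparse_lookup(key: int) -> Optional[str]:
    # Follow the largest index entry <= key, then scan that page.
    p = bisect_right(sparse_keys, key) - 1
    return None if p < 0 else scan_page(p, key)


def multilevel_lookup(key: int) -> Optional[str]:
    # The outer level picks a block of the inner index; that block picks the data page.
    b = bisect_right(outer_keys, key) - 1
    if b < 0:
        return None
    block = sparse_keys[b * INDEX_BLOCK:(b + 1) * INDEX_BLOCK]
    p = b * INDEX_BLOCK + bisect_right(block, key) - 1
    return scan_page(p, key)


print(dense_lookup(31), sparse_lookup(31), multilevel_lookup(31))  # h h h
```

The dense index needs 12 entries for 12 records, the sparse index only 4, and the outer level just 2, which is the space-versus-scanning trade-off described above.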

NOTE: - A file can have at most one primary index or one clustering index, and in
addition can have several secondary indexes.
INDEX DATA STRUCTURES

• One way to organize data entries is to hash data entries on the search key (HASH-BASED INDEXING).
• Another way to organize data entries is to build a tree-like data structure that directs a search for data entries (TREE-BASED INDEXING).

HASH BASED INDEXING

We can organize records using a technique called hashing to quickly find records
that have a given search key value. For example, if the file of employee records is hashed
on the name field, we can quickly retrieve all records about Joe.
• The records in a file are grouped in buckets.
• A bucket consists of a primary page and, possibly, additional pages linked in a chain.
• The bucket to which a record belongs can be determined by applying a special function, called a hash function, to the search key.
• INSERT: the record is inserted into the appropriate bucket, with overflow pages allocated as necessary.
• SEARCH: to search for a record with a given search key value, we apply the hash function to identify the bucket to which such records belong and look at all pages in that bucket.
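A minimal sketch of a static hash index with buckets and overflow chains. Python's built-in `hash()` stands in for the DBMS hash function, and the page capacity, bucket count, class names, and record ids are assumptions made for illustration.

```python
from typing import List, Tuple

PAGE_CAPACITY = 2  # assumption: tiny pages so overflow pages appear quickly
NUM_BUCKETS = 4    # assumption: a fixed number of buckets


class Bucket:
    """A primary page plus a chain of overflow pages (pages[0] is the primary page)."""

    def __init__(self) -> None:
        self.pages: List[List[Tuple[str, int]]] = [[]]


class HashIndex:
    def __init__(self) -> None:
        self.buckets = [Bucket() for _ in range(NUM_BUCKETS)]

    def _h(self, key: str) -> int:
        # Python's built-in hash() stands in for the DBMS hash function (assumption).
        return hash(key) % NUM_BUCKETS

    def insert(self, key: str, rid: int) -> None:
        bucket = self.buckets[self._h(key)]
        last = bucket.pages[-1]
        if len(last) >= PAGE_CAPACITY:
            last = []
            bucket.pages.append(last)  # allocate an overflow page in the chain
        last.append((key, rid))

    def search(self, key: str) -> List[int]:
        # Hash to find the bucket, then look at every page in that bucket.
        bucket = self.buckets[self._h(key)]
        return [rid for page in bucket.pages for k, rid in page if k == key]


idx = HashIndex()
for name, rid in [("Joe", 101), ("Sue", 102), ("Joe", 103), ("Joe", 104)]:
    idx.insert(name, rid)
print(idx.search("Joe"))  # [101, 103, 104]
```

Three "Joe" entries do not fit on one two-record page, so that bucket gets at least one overflow page, and a search must read the whole chain.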

Hash indexing is illustrated in the figure below, where the data is stored in a file that is
hashed on age; the data entries in this first index file are the actual data records. Applying
the hash function to the age field identifies the page that the record belongs to. The hash
function h for this example is quite simple: it converts the search key value to its binary
representation and uses the two least significant bits as the bucket identifier.
Fig. Index-Organized File Hashed on age, with Auxiliary Index on sal.
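The hash function h from this example can be written directly with a bit mask. The ages below are hypothetical, since the figure's data is not reproduced here.

```python
def h(age: int) -> int:
    """Bucket id = the two least significant bits of age in binary."""
    return age & 0b11  # equivalent to age % 4 for non-negative ages


ages = [22, 25, 30, 33, 44, 35]  # hypothetical employee ages
buckets = {b: [] for b in range(4)}
for age in ages:
    buckets[h(age)].append(age)
    print(age, format(age, "b"), "-> bucket", format(h(age), "02b"))
print(buckets)  # {0: [44], 1: [25, 33], 2: [22, 30], 3: [35]}
```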

TREE BASED INDEXING

An alternative to hash-based indexing is to organize records using a tree-like data
structure. The data entries are arranged in sorted order by search key value, and a
hierarchical search data structure is maintained that directs searches to the correct page of
data entries.
The following fig. shows the employee records in a tree-structured index with search key
age. The lowest level of the tree, called the leaf level, contains the data entries.

Example: Find all data entries with 24 < age < 50
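A sketch of how the tree answers this range query once it reaches the leaf level: locate the first relevant leaf, then scan the sorted, linked leaf pages until a key reaches the upper bound. The leaf contents are hypothetical, and the list `first_keys` stands in for the non-leaf levels of the tree.

```python
from bisect import bisect_right
from typing import List

# Hypothetical leaf level of a tree index on age: sorted leaf pages, linked left to right.
leaf_pages: List[List[int]] = [[19, 22, 24], [25, 30, 33], [40, 44, 50], [52, 60]]
first_keys = [page[0] for page in leaf_pages]  # stands in for the upper (non-leaf) levels


def range_query(low: int, high: int) -> List[int]:
    """Return all data entries with low < age < high (strict bounds, as in the example)."""
    # Descend: pick the leftmost leaf that may hold a key greater than low.
    p = max(bisect_right(first_keys, low) - 1, 0)
    result = []
    # Scan leaf pages left to right until a key >= high is seen.
    for page in leaf_pages[p:]:
        for age in page:
            if age >= high:
                return result
            if age > low:
                result.append(age)
    return result


print(range_query(24, 50))  # [25, 30, 33, 40, 44]
```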

What is the difference between ISAM and B+ Trees?

SNo | ISAM (Indexed Sequential Access Method)              | B+ Tree
----+------------------------------------------------------+--------------------------------------------------------
1   | ISAM is a static index structure.                    | B+ tree is a dynamic index structure.
2   | Effective when the file is not frequently updated.   | Remains effective when the file is frequently updated.
3   | Unsuitable for files that grow and shrink a lot.     | Suitable for files that grow and shrink a lot.
4   | Does not adjust gracefully to changes in the file.   | Adjusts gracefully to changes in the file.
5   | Rarely used index structure.                         | Most widely used index structure.
6   | Supports equality and range queries, but long        | Supports both equality and range queries efficiently.
    | overflow chains can slow both down.                  |
7   | Suffers from long overflow chains.                   | Does not suffer from long overflow chains.
8   | The set of primary leaf pages is static.             | The set of primary leaf pages is not static.

DIFFERENCE BETWEEN B AND B+ TREE

B-tree indices are similar to B+ tree indices. The primary distinction between the two
approaches is that a B-tree eliminates the redundant storage of search key values: search
keys are not repeated in B-tree indices.
The major differences between the B-tree and B+ tree structures are given below.
1. In a B-tree, search keys and data are stored in both internal and leaf nodes. In a B+ tree, data is stored only in leaf nodes.
2. Searching for any data in a B+ tree is straightforward because all data is found in leaf nodes. In a B-tree, data is not necessarily found in a leaf node.
3. In a B-tree, data may be found in a leaf or a non-leaf node, and deleting an entry from a non-leaf node is very complicated. In a B+ tree, data is always found in a leaf node, so deletion is easier.
4. Insertion into a B-tree is more complicated than insertion into a B+ tree.
5. A B+ tree stores redundant search keys, but a B-tree has no redundant values.
6. In a B+ tree, the leaf nodes are linked together in sequential order (a linked list), but in a B-tree the leaf nodes are not linked.
7. In a B-tree, pointers to data records exist at all levels of the tree. In a B+ tree, all pointers to data records exist at the leaf-level nodes.
8. A B+ tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree, because its internal nodes hold no data pointers and can therefore fit more keys per node.
Many database system implementers prefer the structural simplicity of a B+ tree.

Searching a record in B+ Tree

Suppose we want to search for 65 in the B+ tree structure below. First we fetch the
intermediary node, which directs us to the leaf node that can contain the record for 65. In
the intermediary node, we find the branch between 50 and 75, which redirects us to the
third leaf node. There, the DBMS performs a sequential search to find 65. Suppose that,
instead of 65, we have to search for 60. What will happen in this case? The search is
directed to the same leaf node, but 60 will not be found there, so no such record exists.
No insertions, updates, or deletions are allowed during a search in a B+ tree.
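The search can be sketched on a two-level B+ tree (one intermediary node and four leaves). Only the separators 50 and 75 and the third leaf (50, 55, 65, 70, inferred from the insertion discussion) come from this example; the remaining keys, including the separator 25, are hypothetical because the figure is not reproduced here.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical two-level B+ tree: one intermediary node and four leaf nodes.
separators: List[int] = [25, 50, 75]
leaves: List[List[int]] = [
    [5, 10, 15, 20],
    [25, 30, 35, 45],
    [50, 55, 65, 70],  # third leaf, inferred from the text
    [75, 80, 85, 90],
]


def search(key: int) -> Tuple[int, bool]:
    """Return (leaf index, found?) for key."""
    # Intermediary node: child i covers keys in [separators[i-1], separators[i]).
    child = bisect_right(separators, key)
    # Leaf node: sequential search, as described in the text.
    return child, key in leaves[child]


print(search(65))  # (2, True): found in the third leaf
print(search(60))  # (2, False): directed to the third leaf, but absent
```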

Insertion in B+ tree

Suppose we have to insert a record with key 60 into the structure below. It belongs in the
3rd leaf node, after 55. That leaf node is already full, so the record cannot simply be added
there. Yet it must be inserted there without violating the fill factor, balance, and order of
the tree, so the only option is to split the leaf node. But how do we split the node?

After the insertion, the 3rd leaf node would hold (50, 55, 60, 65, 70), and the key pointing
to it in the intermediary node is 50. We split the leaf node in the middle so that the tree
stays balanced, grouping (50, 55) and (60, 65, 70) into two leaf nodes. For these to be two
separate leaf nodes, the intermediary node cannot branch only from 50: 60 must be added
to it, with a pointer to the new leaf node.

This is how we insert a new entry when there is an overflow. In the normal case, we simply
find the leaf node where the key fits and place it in that leaf node.
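The split can be sketched on the same hypothetical two-level tree, assuming a leaf holds at most four keys. The sketch only handles leaf splits; a full implementation would also split the intermediary node when it overflows.

```python
from bisect import bisect_right, insort
from typing import List

LEAF_CAPACITY = 4  # assumption: a leaf holds at most four keys

# Hypothetical tree; only the third leaf and the separators 50 and 75 come from the text.
separators: List[int] = [25, 50, 75]
leaves: List[List[int]] = [[5, 10, 15, 20], [25, 30, 35, 45], [50, 55, 65, 70], [75, 80, 85, 90]]


def insert(key: int) -> None:
    child = bisect_right(separators, key)
    leaf = leaves[child]
    insort(leaf, key)  # normal case: the key simply fits in its leaf
    if len(leaf) <= LEAF_CAPACITY:
        return
    # Overflow: split the leaf in the middle into two leaves.
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    leaves[child:child + 1] = [left, right]
    # Copy the first key of the new right leaf up into the intermediary node,
    # so the intermediary node gets a pointer to the new leaf.
    insort(separators, right[0])


insert(60)
print(leaves[2], leaves[3])  # [50, 55] [60, 65, 70]
print(separators)            # [25, 50, 60, 75]
```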

Delete in B+ tree

Suppose we have to delete 60 from the above example. What will happen in this case?
We have to remove 60 from the 4th leaf node as well as from the intermediary node. Once it
is removed from the intermediary node, the tree no longer satisfies the B+ tree rules, so we
need to modify it to keep the tree balanced. After deleting 60 from the above B+ tree and
rearranging the nodes, it will appear as below.

Suppose we have to delete 15 from the above tree. We traverse to the 1st leaf node and
simply delete 15 from that node. There is no need for any rearrangement, as the tree stays
balanced and 15 does not appear in the intermediary node.
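Both deletions can be sketched on a hypothetical tree in the state reached after inserting 60. The rearrangement rule used here is an assumption, one simple strategy among several: when the deleted key is also a separator, merge the leaf into its left sibling if the combined keys fit, otherwise replace the separator with the leaf's new smallest key. Minimum-occupancy checks for ordinary deletions are omitted.

```python
from bisect import bisect_right
from typing import List

LEAF_CAPACITY = 4  # assumption: a leaf holds at most four keys

# Hypothetical tree after 60 was inserted and the third leaf was split.
separators: List[int] = [25, 50, 60, 75]
leaves: List[List[int]] = [[5, 10, 15, 20], [25, 30, 35, 45], [50, 55], [60, 65, 70], [75, 80, 85, 90]]


def delete(key: int) -> bool:
    child = bisect_right(separators, key)
    leaf = leaves[child]
    if key not in leaf:
        return False
    leaf.remove(key)
    if child == 0 or separators[child - 1] != key:
        return True  # key is not in the intermediary node: nothing to rearrange
    # The key is also a separator, so it must leave the intermediary node.
    left = leaves[child - 1]
    if len(left) + len(leaf) <= LEAF_CAPACITY:
        left.extend(leaf)          # merge this leaf into its left sibling
        del leaves[child]
        del separators[child - 1]  # and drop the separator
    else:
        separators[child - 1] = leaf[0]  # keep both leaves; new smallest key is the separator
    return True


delete(60)
print(separators)  # [25, 50, 75]
print(leaves[2])   # [50, 55, 65, 70]
delete(15)
print(leaves[0])   # [5, 10, 20]
```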
