
Class 6

A DBMS stores database tables on disk by writing tuples into pages. This document discusses different methods of organizing data on disk, including unordered files, ordered files, and hash files. It also covers indexing, which allows the DBMS to locate records more quickly through the use of index files that map key values to data locations. Primary and secondary indexes are described as ways to improve query performance.

Uploaded by Debobrata Mondal
Copyright: © Attribution Non-Commercial (BY-NC)

File Organization

Introduction
• A DBMS has to store data somewhere
• Choices:
  – Main memory
    • Expensive compared to secondary and tertiary storage
    • Fast, since in-memory operations are fast
    • Used for storing current data
  – Secondary storage (hard disk)
    • Less expensive than main memory
    • Slower than main memory, but faster than tertiary storage such as tapes
    • Used for storing the database
DBMS Stores Data on Hard Disks
• This means that data needs to be
  – read from the hard disk into memory (RAM)
  – written from memory onto the hard disk
Basics of Data Storage on Hard Disk
• A disk is organized into a number of blocks or pages
• A page is the unit of exchange between the disk and main memory
• A collection of pages is known as a file
• A DBMS stores data in one or more files on the hard disk
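The idea that a file is a sequence of pages, moved between disk and memory one whole page at a time, can be sketched in Python as follows. The 4096-byte PAGE_SIZE and the helper names write_page and read_page are illustrative assumptions; a real DBMS manages pages through its own buffer manager.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size in bytes (real systems vary)


def write_page(f, page_no: int, data: bytes) -> None:
    """Write one whole page; data is zero-padded to exactly PAGE_SIZE bytes."""
    if len(data) > PAGE_SIZE:
        raise ValueError("data does not fit in one page")
    f.seek(page_no * PAGE_SIZE)  # page N starts at byte offset N * PAGE_SIZE
    f.write(data.ljust(PAGE_SIZE, b"\x00"))


def read_page(f, page_no: int) -> bytes:
    """Read one whole page from disk into memory."""
    f.seek(page_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)


# Usage: a file is simply a collection of fixed-size pages.
path = os.path.join(tempfile.mkdtemp(), "table.dat")
with open(path, "w+b") as f:
    write_page(f, 0, b"page zero")
    write_page(f, 1, b"page one")
    page = read_page(f, 1)

num_pages = os.path.getsize(path) // PAGE_SIZE
print(page.rstrip(b"\x00"), num_pages)  # b'page one' 2
```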
Database Tables on Hard Disk
• Database tables are made up of one or more tuples (rows)
• Each tuple has one or more attributes
• One or more tuples from a table are written into a page on the hard disk
  – Larger tuples may need more than one page!
  – Tuples on the disk are known as records
  – Records are separated by a record delimiter
  – Attributes on the hard disk are known as fields
  – Fields are separated by a field delimiter
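A minimal Python sketch of how tuples might become delimited records packed into pages. The "|" field delimiter, the newline record delimiter, the 80-character page size and the function names are assumptions chosen for illustration (ASCII data is assumed, so one character is one byte). For simplicity the sketch rejects a tuple larger than a page instead of spreading it over several pages.

```python
FIELD_DELIM = "|"    # assumed field delimiter
RECORD_DELIM = "\n"  # assumed record delimiter
PAGE_SIZE = 80       # deliberately tiny page size (characters) so pages fill quickly


def encode_record(row: tuple) -> str:
    """Turn a tuple into a record: each attribute becomes a delimited field."""
    return FIELD_DELIM.join(str(value) for value in row) + RECORD_DELIM


def pack_into_pages(rows: list) -> list:
    """Write records into pages in order, starting a new page when one is full."""
    pages, current = [], ""
    for row in rows:
        record = encode_record(row)
        if len(record) > PAGE_SIZE:
            raise ValueError("record larger than a page (spanning not modelled)")
        if len(current) + len(record) > PAGE_SIZE:
            pages.append(current)  # current page is full; move on to a new one
            current = ""
        current += record
    if current:
        pages.append(current)
    return pages


def decode_page(page: str) -> list:
    """Split a page back into records, and each record back into fields."""
    return [tuple(rec.split(FIELD_DELIM)) for rec in page.split(RECORD_DELIM) if rec]


rows = [
    ("B002", "56 Clover Dr", "London", "NW10 6EU"),
    ("B003", "163 Main St", "Glasgow", "G11 9QX"),
    ("B004", "32 Manse Rd", "Bristol", "BS99 1NZ"),
]
pages = pack_into_pages(rows)
print(len(pages))  # 2
```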
File Organization
• The physical arrangement of the data in a file into records and pages on the disk
• File organization determines the set of access methods for
  – storing and retrieving records from a file
• Therefore, ‘file organization’ is synonymous with ‘access method’
• We study three types of file organization
  – Unordered or heap files
  – Ordered or sequential files
  – Hash files
• We examine each of them in terms of the operations we perform on the database
  – Insert a new record
  – Search for a record (or update a record)
  – Delete a record
Unordered or Heap File
• Records are stored in the order in which they are inserted
• Insert operation
  – Fast – because the incoming record is written at the end of the last page of the file
• Search (or update) operation
  – Slow – because a linear search is performed on the pages
• Delete operation
  – Slow – because the record to be deleted must first be searched for
  – Deleting the record creates a hole in the page
  – Periodic file compaction is required to reclaim the wasted space
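The heap-file behaviour described above can be sketched in Python. Records are dicts, each page holds at most `capacity` records, and the class HeapFile with its methods is hypothetical, written only to show why insert is cheap while search and delete need a scan.

```python
class HeapFile:
    """Unordered (heap) file: a list of pages, each holding up to `capacity` records.

    A deleted record leaves a hole (None) in its page until compact() runs.
    """

    def __init__(self, capacity: int = 2):
        self.capacity = capacity  # assumed number of record slots per page
        self.pages = [[]]

    def insert(self, record: dict) -> None:
        # Fast: always append to the last page, starting a new page if it is full.
        last = self.pages[-1]
        if len(last) >= self.capacity:
            last = []
            self.pages.append(last)
        last.append(record)

    def search(self, field: str, value):
        # Slow: linear scan over every slot of every page.
        for page_no, page in enumerate(self.pages):
            for slot, record in enumerate(page):
                if record is not None and record[field] == value:
                    return page_no, slot
        return None

    def delete(self, field: str, value) -> bool:
        # Slow: the record must be found first; deleting it leaves a hole.
        location = self.search(field, value)
        if location is None:
            return False
        page_no, slot = location
        self.pages[page_no][slot] = None
        return True

    def compact(self) -> None:
        # Periodic compaction: drop the holes and repack the live records.
        live = [r for page in self.pages for r in page if r is not None]
        self.pages = [[]]
        for record in live:
            self.insert(record)


heap = HeapFile(capacity=2)
for branch_no in ["B005", "B002", "B007"]:
    heap.insert({"branchNo": branch_no})
found = heap.search("branchNo", "B007")  # (1, 0): page 1, slot 0
heap.delete("branchNo", "B002")          # leaves a hole in page 0
heap.compact()                           # the two live records now fit in one page
```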
Ordered or Sequential File
• Records are sorted on the values of one or more fields
  – Ordering field – the field on which the records are sorted
  – Ordering key – an ordering field that is also a key of the file
• Search (or update) operation
  – Fast – because binary search can be performed on the sorted records
  – Updating the ordering field is costly, because the record may have to move to keep the file sorted
• Delete operation
  – Fast – because searching for the record is fast
  – Periodic file compaction is, of course, required
• Insert operation
  – Poor – because inserting the new record in its correct position means shifting all the subsequent records in the file
  – Alternatively, an ‘overflow file’ is created that holds all the new records as a heap
  – The overflow file is periodically merged with the main file
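A minimal Python sketch of an ordered file, assuming an in-memory list stands in for the sorted pages and a second list for the overflow file. The class SequentialFile and its method names are hypothetical; the standard-library `bisect` module provides the binary search.

```python
import bisect


class SequentialFile:
    """Ordered (sequential) file kept sorted on one ordering field."""

    def __init__(self, ordering_field: str):
        self.field = ordering_field
        self.records = []   # main file, sorted on the ordering field
        self.keys = []      # ordering-field values, parallel to self.records
        self.overflow = []  # unsorted heap of recently inserted records

    def insert(self, record: dict) -> None:
        # Cheap insert: append to the overflow file instead of shifting records.
        self.overflow.append(record)

    def insert_in_place(self, record: dict) -> None:
        # Expensive alternative: list.insert shifts every later record by one slot.
        i = bisect.bisect_right(self.keys, record[self.field])
        self.keys.insert(i, record[self.field])
        self.records.insert(i, record)

    def search(self, value):
        # Binary search on the sorted main file...
        i = bisect.bisect_left(self.keys, value)
        if i < len(self.keys) and self.keys[i] == value:
            return self.records[i]
        # ...then a linear scan of the (small) overflow file.
        for record in self.overflow:
            if record[self.field] == value:
                return record
        return None

    def merge_overflow(self) -> None:
        # Periodic reorganization: merge the overflow records into the sorted file.
        self.records = sorted(self.records + self.overflow, key=lambda r: r[self.field])
        self.keys = [r[self.field] for r in self.records]
        self.overflow = []


seq = SequentialFile("branchNo")
for branch_no in ["B004", "B002", "B007"]:
    seq.insert({"branchNo": branch_no})    # each goes to the overflow file
seq.merge_overflow()                       # main file is now B002, B004, B007
seq.insert_in_place({"branchNo": "B003"})  # shifts B004 and B007 one slot along
```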
Hash File
• A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
• In a hash file organization, the bucket of a record is obtained directly from its search-key value using a hash function.
• The hash function is used to locate records for access, insertion and deletion.
• Hashing can be used not only for file organization but also for creating index structures.
• A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
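A static hash index can be sketched in Python as a fixed number of buckets holding (search key, record pointer) entries. Here a record pointer is assumed to be a (page, slot) pair, zlib.crc32 is used as the hash function simply because it is deterministic, and buckets are unbounded lists, so bucket overflow is not modelled. The class name HashIndex is hypothetical.

```python
import zlib


class HashIndex:
    """Static hash index: buckets of (search key, record pointer) entries."""

    def __init__(self, num_buckets: int = 4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_of(self, key: str) -> int:
        # Hash function: maps a search-key value directly to a bucket number.
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def add(self, key: str, pointer: tuple) -> None:
        self.buckets[self._bucket_of(key)].append((key, pointer))

    def lookup(self, key: str) -> list:
        # Only one bucket is examined, however large the data file is.
        return [ptr for k, ptr in self.buckets[self._bucket_of(key)] if k == key]

    def remove(self, key: str, pointer: tuple) -> None:
        self.buckets[self._bucket_of(key)].remove((key, pointer))


idx = HashIndex(num_buckets=4)
pointers = {"B002": (1, 0), "B003": (1, 1), "B004": (2, 0), "B005": (2, 1), "B007": (3, 0)}
for key, ptr in pointers.items():
    idx.add(key, ptr)
print(idx.lookup("B005"))  # [(2, 1)]
```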
Hash File (2)
• Insert operation
  – Fast – because the hash function computes the index of the bucket to which the record belongs
    • If that bucket is full, the record is placed in the next bucket with free space
• Search operation
  – Fast – because the hash function computes the index of the bucket
    • Performance may degrade if the record is not found in the bucket suggested by the hash function
• Delete operation
  – Fast – again because the hash function locates the record quickly
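The insert, search and delete behaviour above, including moving on to the next bucket when one is full, amounts to static hashing with linear probing over buckets. The Python sketch below assumes fixed-capacity buckets, zlib.crc32 as the hash function and a hypothetical HashFile class.

```python
import zlib


class HashFile:
    """Hash file with fixed-capacity buckets and probing to the next bucket."""

    def __init__(self, num_buckets: int = 5, capacity: int = 2):
        self.capacity = capacity  # assumed records per bucket (e.g. per disk block)
        self.buckets = [[] for _ in range(num_buckets)]

    def _probe(self, key: str):
        # Visit the home bucket first, then the following buckets, wrapping around.
        n = len(self.buckets)
        home = zlib.crc32(key.encode("utf-8")) % n
        for step in range(n):
            yield (home + step) % n

    def insert(self, key: str, record: dict) -> int:
        for b in self._probe(key):
            if len(self.buckets[b]) < self.capacity:
                self.buckets[b].append((key, record))
                return b  # bucket where the record was stored
        raise OverflowError("every bucket is full")

    def search(self, key: str):
        # Usually one bucket read; more if the record overflowed its home bucket.
        for b in self._probe(key):
            for k, record in self.buckets[b]:
                if k == key:
                    return record
        return None

    def delete(self, key: str) -> bool:
        for b in self._probe(key):
            for i, (k, _) in enumerate(self.buckets[b]):
                if k == key:
                    del self.buckets[b][i]
                    return True
        return False


hf = HashFile(num_buckets=5, capacity=2)
for branch_no in ["B002", "B003", "B004", "B005", "B007"]:
    hf.insert(branch_no, {"branchNo": branch_no})
```

This simplified search keeps probing until the record is found or every bucket has been checked, so a missing key, or a record stored far from its home bucket, costs more than one bucket read.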
Indexing
• Index – a data structure that allows the DBMS to locate particular records in a file more quickly
  – Very similar to the index at the end of a book, used to locate the topics covered in the book
• Types of index
  – Primary clustering index – only one clustering index per file; the data file is ordered on the key field and the index file is built on that key field
  – Secondary index – a file can have many secondary indexes
Primary Clustered Indexes
• The data file is sequentially ordered on the key field
• The index file stores every value of the key field together with the number of the data-file page in which the corresponding record is stored
Example: the Branch table, stored in a data file ordered on BranchNo

BranchNo  Street        City      Postcode
B002      56 Clover Dr  London    NW10 6EU
B003      163 Main St   Glasgow   G11 9QX
B004      32 Manse Rd   Bristol   BS99 1NZ
B005      22 Deer Rd    London    SW1 4EH
B007      16 Argyll St  Aberdeen  AB2 3SU

Index file (BranchNo → page)    Data file pages
B002 → 1                        Page 1: Branch B002 record, Branch B003 record
B003 → 1                        Page 2: Branch B004 record, Branch B005 record
B004 → 2                        Page 3: Branch B007 record
B005 → 2
B007 → 3
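Using the Branch example on this slide, a lookup through the primary clustered index can be sketched in Python: a binary search over the sorted index entries yields a page number, and only that one data page is scanned. Modelling the pages as an in-memory dict is an assumption for illustration.

```python
import bisect

# Data file ordered on BranchNo: page number -> records stored on that page.
data_pages = {
    1: [("B002", "56 Clover Dr", "London", "NW10 6EU"),
        ("B003", "163 Main St", "Glasgow", "G11 9QX")],
    2: [("B004", "32 Manse Rd", "Bristol", "BS99 1NZ"),
        ("B005", "22 Deer Rd", "London", "SW1 4EH")],
    3: [("B007", "16 Argyll St", "Aberdeen", "AB2 3SU")],
}

# Index file: every key value with the page holding its record, sorted on the key.
index = sorted((rec[0], page_no) for page_no, recs in data_pages.items() for rec in recs)
index_keys = [key for key, _ in index]


def lookup(branch_no: str):
    """Binary-search the index, then read only the data page it points to."""
    i = bisect.bisect_left(index_keys, branch_no)
    if i == len(index_keys) or index_keys[i] != branch_no:
        return None
    page_no = index[i][1]
    for record in data_pages[page_no]:  # scan a single page, not the whole file
        if record[0] == branch_no:
            return record
    return None


print(lookup("B005"))  # ('B005', '22 Deer Rd', 'London', 'SW1 4EH')
```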
Secondary Indexes
• An index file that uses a non-primary (non-ordering) field as its key, e.g. the City field in the Branch table
• They improve the performance of queries that use attributes other than the primary key
• But there is the overhead of maintaining a large number of these indexes, since every insert, update or delete must also update them
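A secondary index on City can be sketched in Python as a map from each City value to (page, slot) pointers into the data file, which stays ordered on BranchNo. For brevity the pages hold only (BranchNo, City) pairs, and build_secondary_index is a hypothetical helper.

```python
from collections import defaultdict

# Data file ordered on BranchNo (page number -> records), reduced to (BranchNo, City).
data_pages = {
    1: [("B002", "London"), ("B003", "Glasgow")],
    2: [("B004", "Bristol"), ("B005", "London")],
    3: [("B007", "Aberdeen")],
}


def build_secondary_index(pages: dict, field_pos: int) -> dict:
    """Map each value of a non-ordering field to the (page, slot) of every match.

    The data file is not ordered on this field, so one value can point to
    records scattered over several pages.
    """
    index = defaultdict(list)
    for page_no in sorted(pages):
        for slot, record in enumerate(pages[page_no]):
            index[record[field_pos]].append((page_no, slot))
    return dict(index)


city_index = build_secondary_index(data_pages, field_pos=1)  # City is field 1 here
print(city_index["London"])  # [(1, 0), (2, 1)]
```

Because the London branches sit on pages 1 and 2, the index entry for London points to two different pages. Every insert, update or delete on the Branch file would also have to update this index, which is the maintenance overhead mentioned above.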
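Creating indexes in SQL
• You can create one or more indexes on any table you create in SQL
• For example
  – Create an index:
    CREATE INDEX indexname ON tablename (attributename);
  – List the indexes on a table (SQL Server system procedure):
    EXEC sp_helpindex 'tablename';
  – Remove an index:
    DROP INDEX indexname;
    (SQL Server requires the table name: DROP INDEX indexname ON tablename;)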
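For a runnable sketch without a database server, Python's built-in sqlite3 module can execute the same CREATE INDEX and DROP INDEX statements. SQLite has no sp_helpindex, so its PRAGMA index_list is used instead, and EXPLAIN QUERY PLAN shows whether a query would use the index. The table and index names below are assumptions based on the Branch example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (branchNo TEXT PRIMARY KEY, street TEXT, city TEXT, postcode TEXT)"
)
conn.executemany(
    "INSERT INTO branch VALUES (?, ?, ?, ?)",
    [("B002", "56 Clover Dr", "London", "NW10 6EU"),
     ("B003", "163 Main St", "Glasgow", "G11 9QX"),
     ("B004", "32 Manse Rd", "Bristol", "BS99 1NZ"),
     ("B005", "22 Deer Rd", "London", "SW1 4EH"),
     ("B007", "16 Argyll St", "Aberdeen", "AB2 3SU")],
)

# Secondary index on a non-key attribute.
conn.execute("CREATE INDEX idx_branch_city ON branch (city)")

# SQLite's counterpart of sp_helpindex: list the indexes on the table (name is column 1).
index_names = [row[1] for row in conn.execute("PRAGMA index_list(branch)")]

# Ask SQLite how it would run a query on city; the detail text names the index used.
plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT branchNo FROM branch WHERE city = ?", ("London",))]

conn.execute("DROP INDEX idx_branch_city")
after_drop = [row[1] for row in conn.execute("PRAGMA index_list(branch)")]
```

The PRIMARY KEY on branchNo makes SQLite create its own automatic index as well, so index_list still reports one index after idx_branch_city is dropped.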


Summary
• File organization, or access method, determines the performance of search, insert and delete operations
  – Access methods are the primary means of achieving improved performance
• Index structures help to improve performance further
