Class 6
Introduction
• A DBMS has to store data somewhere
• Choices:
– Main memory
• Expensive – compared to secondary and tertiary storage
• Fast – in-memory operations are fast
• Used for storing the data currently in use
– Secondary storage (hard disk)
• Less expensive – compared to main memory
• Slower than main memory, but faster than tapes (tertiary storage)
• Used for storing the database
DBMS stores data on hard disks
• This means that data needs to be
– read from the hard disk into main memory (RAM)
– written from main memory back onto the hard disk
Basics of Data storage on hard disk
• A disk is organized into a number of
blocks or pages
• A page is the unit of exchange between
the disk and the main memory
• A collection of pages is known as a file
• DBMS stores data in one or more files on
the hard disk
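A minimal sketch of the page as the unit of transfer, assuming fixed-size pages addressed by page number and using an in-memory `io.BytesIO` object as a stand-in for the disk file. `PAGE_SIZE`, `write_page` and `read_page` are hypothetical names, and the 16-byte page size is chosen only to keep the example small.

```python
import io

# Assumption: a tiny fixed page size for readability; real DBMS pages are typically a few KB.
PAGE_SIZE = 16


def write_page(f: io.BytesIO, page_no: int, data: bytes) -> None:
    """Write a whole page at offset page_no * PAGE_SIZE, zero-padding short data."""
    if len(data) > PAGE_SIZE:
        raise ValueError("data does not fit in one page")
    f.seek(page_no * PAGE_SIZE)
    f.write(data.ljust(PAGE_SIZE, b"\x00"))


def read_page(f: io.BytesIO, page_no: int) -> bytes:
    """Read a whole page: the page, not a single byte or record, is what gets transferred."""
    f.seek(page_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)


# The "file" is simply a collection of pages.
disk_file = io.BytesIO()
write_page(disk_file, 0, b"page zero")
write_page(disk_file, 1, b"page one")
print(read_page(disk_file, 1).rstrip(b"\x00"))  # b'page one'
```

Because every read and write moves a full page, the cost of an operation is usually counted in pages transferred rather than in records touched.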
Database Tables on Hard Disk
• Database tables are made up of one or more
tuples (rows)
• Each tuple has one or more attributes
• One or more tuples from a table are written into
a page on the hard disk
– Larger tuples may need more than one page!
– Tuples on the disk are known as records
– Records are separated by a record delimiter
– Attributes on the hard disk are known as fields
– Fields are separated by a field delimiter
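A minimal sketch of how tuples might be laid out in a page with delimiters, assuming `|` as the field delimiter and a newline as the record delimiter (both hypothetical choices), and assuming neither character ever occurs inside a value.

```python
from typing import List, Tuple

# Assumption: illustrative delimiter characters that never occur inside a value.
FIELD_DELIM = "|"
RECORD_DELIM = "\n"


def encode_page(tuples: List[Tuple[str, ...]]) -> str:
    """Tuples become records (joined by RECORD_DELIM); attributes become fields (joined by FIELD_DELIM)."""
    return RECORD_DELIM.join(FIELD_DELIM.join(t) for t in tuples)


def decode_page(page_text: str) -> List[Tuple[str, ...]]:
    """Split the page back into records, then each record into fields."""
    return [tuple(rec.split(FIELD_DELIM)) for rec in page_text.split(RECORD_DELIM) if rec]


rows = [
    ("B002", "56 Clover Dr", "London", "NW10 6EU"),
    ("B003", "163 Main St", "Glasgow", "G11 9QX"),
]
page = encode_page(rows)
print(page)
# B002|56 Clover Dr|London|NW10 6EU
# B003|163 Main St|Glasgow|G11 9QX
print(decode_page(page) == rows)  # True
```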
File Organization
• The physical arrangement of data in a file into records and pages on
the disk
• File organization determines the set of access methods for
– storing and retrieving records from a file
• Therefore, ‘file organization’ is often treated as synonymous with ‘access method’
• We study three types of file organization
– Unordered or Heap files
– Ordered or sequential files
– Hash files
• We examine each of them in terms of the operations we perform on
the database
– Insert a new record
– Search for a record (or update a record)
– Delete a record
Unordered Or Heap File
• Records are stored in the same order in which they are
created
• Insert operation
– Fast – because the incoming record is written at the end of the
last page of the file
• Search (or update) operation
– Slow – because the pages are searched linearly, one after another
• Delete Operation
– Slow – because the record to be deleted must first be found by a linear search
– Deleting the record leaves a hole in the page
– Periodic file compaction is required to reclaim the wasted space
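A minimal Python sketch of a heap file, assuming each page holds a fixed number of record slots and a deleted record is marked by `None` (the hole). The class `HeapFile`, its methods and the `(key, payload)` record layout are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # hypothetical record layout: (key, payload)


class HeapFile:
    """Unordered (heap) file: a list of pages, each with a fixed number of record slots."""

    def __init__(self, page_capacity: int = 2):
        self.page_capacity = page_capacity
        self.pages: List[List[Optional[Record]]] = [[]]

    def insert(self, rec: Record) -> None:
        # Fast: write at the end of the last page, starting a new page if it is full.
        if len(self.pages[-1]) == self.page_capacity:
            self.pages.append([])
        self.pages[-1].append(rec)

    def search(self, key: str) -> Optional[Record]:
        # Slow: linear scan over every slot of every page.
        for page in self.pages:
            for rec in page:
                if rec is not None and rec[0] == key:
                    return rec
        return None

    def delete(self, key: str) -> bool:
        # Slow: the record must be found first; deleting it leaves a hole (None).
        for page in self.pages:
            for i, rec in enumerate(page):
                if rec is not None and rec[0] == key:
                    page[i] = None
                    return True
        return False

    def compact(self) -> None:
        # Periodic compaction: drop the holes and repack the live records.
        live = [rec for page in self.pages for rec in page if rec is not None]
        self.pages = [[]]
        for rec in live:
            self.insert(rec)


heap = HeapFile(page_capacity=2)
for key in ["B005", "B002", "B004"]:
    heap.insert((key, "branch data"))
heap.delete("B002")
print(heap.pages)  # [[('B005', 'branch data'), None], [('B004', 'branch data')]]
heap.compact()
print(heap.pages)  # [[('B005', 'branch data'), ('B004', 'branch data')]]
```

Note that `insert` never reuses holes, matching the slide's rule that new records always go at the end; only `compact` reclaims the space.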
Ordered or Sequential File
• Records are sorted on the values of one or more fields
– Ordering field – the field on which the records are sorted
– Ordering key – the ordering field when it is also a key of the file
• Search (or update) Operation
– Fast – because binary search can be used on the sorted records
– Updating the ordering field is costly: the record may have to move so that the file stays sorted
• Delete Operation
– Fast – because searching the record is fast
– Periodic file compaction is still required
• Insert Operation
– Poor – because inserting the new record in its correct position requires shifting all the subsequent records in the file
– Alternatively, new records are written to an ‘overflow file’, which is organized as a heap
– Periodically, the overflow file is merged with the main file
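A minimal sketch of an ordered file with an overflow file, assuming records are `(key, payload)` pairs held in one sorted Python list rather than on disk pages (a real system would binary-search over pages). `SequentialFile` and its methods are hypothetical; the binary search uses the standard `bisect` module, and deletion is left out for brevity.

```python
import bisect
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # hypothetical layout: (ordering key, payload)


class SequentialFile:
    """Main file kept sorted on the ordering key, plus an unordered overflow file."""

    def __init__(self, records: List[Record]):
        self.main: List[Record] = sorted(records)
        self.keys: List[str] = [r[0] for r in self.main]  # ordering-key values, sorted
        self.overflow: List[Record] = []  # new records, stored as a heap

    def insert(self, rec: Record) -> None:
        # Avoid shifting the main file: append the new record to the overflow heap.
        self.overflow.append(rec)

    def search(self, key: str) -> Optional[Record]:
        # Binary search on the sorted main file ...
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.main[i]
        # ... then a linear scan of the overflow file.
        for rec in self.overflow:
            if rec[0] == key:
                return rec
        return None

    def merge_overflow(self) -> None:
        # Periodic reorganization: merge the overflow records into the main file.
        self.main = sorted(self.main + self.overflow)
        self.keys = [r[0] for r in self.main]
        self.overflow = []


seq = SequentialFile([("B004", "Bristol"), ("B002", "London"), ("B003", "Glasgow")])
seq.insert(("B005", "London"))
print(seq.search("B003"))  # ('B003', 'Glasgow'), found by binary search
print(seq.search("B005"))  # ('B005', 'London'), found in the overflow file
seq.merge_overflow()
print(seq.keys)  # ['B002', 'B003', 'B004', 'B005']
```

The trade-off is visible in `search`: as the overflow file grows, the linear part of the search grows with it, which is why the merge has to be run periodically.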
Hash File
• A bucket is a unit of storage containing one or more
records (a bucket is typically a disk block).
• In a hash file organization we obtain the bucket of a
record directly from its search-key value using a hash
function.
• The hash function is used to locate records for access,
insertion, and deletion.
• Hashing can be used not only for file organization, but
also for index-structure creation.
• A hash index organizes the search keys, with their
associated record pointers, into a hash file structure.
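A minimal sketch of a hash index, assuming four buckets and a hypothetical hash function (the sum of the character codes modulo the bucket count). Record pointers are modeled as `(page number, slot number)` pairs; the page numbers follow the Branch example in this class, while the slot numbers are assumed.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

NUM_BUCKETS = 4  # assumption: a tiny, fixed number of buckets

Pointer = Tuple[int, int]  # record pointer modeled as (page number, slot number)


def bucket_of(search_key: str) -> int:
    # Hypothetical hash function: sum of character codes modulo the bucket count.
    return sum(ord(c) for c in search_key) % NUM_BUCKETS


# Each bucket holds (search key, record pointer) pairs.
hash_index: Dict[int, List[Tuple[str, Pointer]]] = defaultdict(list)
for key, pointer in [("B002", (1, 0)), ("B003", (1, 1)), ("B004", (2, 0)), ("B005", (2, 1))]:
    hash_index[bucket_of(key)].append((key, pointer))


def lookup(search_key: str) -> Optional[Pointer]:
    # Only the one bucket computed by the hash function is examined.
    for key, pointer in hash_index[bucket_of(search_key)]:
        if key == search_key:
            return pointer
    return None


print(lookup("B004"))  # (2, 0)
```

A deterministic hash function is used here instead of Python's built-in `hash()`, because string hashing in Python is randomized between runs and the bucket numbers would then not be reproducible.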
Hash File (2)
• Insert Operation
– Fast – because the hash function computes the index
of the bucket to which the record belongs
• If that bucket is full, the record is placed in the next bucket with free space
• Search Operation
– Fast – because the hash function computes the index
of the bucket
• Performance may degrade if the record is not found in the
bucket suggested by the hash function
• Delete Operation
– Fast – once again because the hash function locates
the record's bucket quickly
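A minimal sketch of insert, search and delete in a hash file, assuming a fixed number of buckets of fixed capacity and, when the home bucket is full, moving on to the next bucket with free space (linear probing, wrapping around at the end). `HashFile`, the bucket sizes and the hash function are hypothetical choices for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # hypothetical layout: (search key, payload)


class HashFile:
    """Static hash file: a fixed number of buckets, each holding a few records."""

    def __init__(self, num_buckets: int = 4, bucket_capacity: int = 2):
        self.capacity = bucket_capacity
        self.buckets: List[List[Record]] = [[] for _ in range(num_buckets)]

    def _home(self, key: str) -> int:
        # Hypothetical hash function: sum of character codes modulo the bucket count.
        return sum(ord(c) for c in key) % len(self.buckets)

    def _probe(self, key: str) -> Iterator[int]:
        # Bucket numbers to try: the home bucket first, then the following ones (wrapping).
        n = len(self.buckets)
        home = self._home(key)
        for step in range(n):
            yield (home + step) % n

    def insert(self, rec: Record) -> int:
        # Place the record in the first bucket with free space; return its number.
        for b in self._probe(rec[0]):
            if len(self.buckets[b]) < self.capacity:
                self.buckets[b].append(rec)
                return b
        raise RuntimeError("hash file is full")

    def search(self, key: str) -> Optional[Record]:
        # Usually the home bucket suffices; overflowed records need extra probes,
        # and a miss probes every bucket (deletes can free space mid-chain).
        for b in self._probe(key):
            for rec in self.buckets[b]:
                if rec[0] == key:
                    return rec
        return None

    def delete(self, key: str) -> bool:
        for b in self._probe(key):
            for i, rec in enumerate(self.buckets[b]):
                if rec[0] == key:
                    del self.buckets[b][i]
                    return True
        return False


hf = HashFile(num_buckets=4, bucket_capacity=1)
print(hf.insert(("B002", "Branch B002 record")))  # 0
print(hf.insert(("B003", "Branch B003 record")))  # 1
print(hf.insert(("B007", "Branch B007 record")))  # 2 (home bucket 1 is full)
print(hf.search("B007"))  # ('B007', 'Branch B007 record')
print(hf.delete("B003"))  # True
```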
Indexing
• Index – a data structure that allows the DBMS
to locate particular records in a file more quickly
– Very similar to the index at the end of a book,
used to locate the topics covered in the book
• Types of Index
– Primary clustering index – at most one per file;
the data file is ordered on the key field and the
index is built on that key field
– Secondary index – a file can have many
secondary indexes
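A minimal sketch of secondary indexes, assuming the Branch rows used in this class, with the data file ordered on BranchNo. For simplicity each index entry points to BranchNo values rather than to disk addresses; `build_secondary_index` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Branch rows keyed by BranchNo, the field the data file is ordered on.
branches: Dict[str, Tuple[str, str, str]] = {
    "B002": ("56 Clover Dr", "London", "NW10 6EU"),
    "B003": ("163 Main St", "Glasgow", "G11 9QX"),
    "B004": ("32 Manse Rd", "Bristol", "BS99 1NZ"),
    "B005": ("22 Deer Rd", "London", "SW1 4EH"),
}


def build_secondary_index(rows: Dict[str, Tuple[str, ...]], field_pos: int) -> Dict[str, List[str]]:
    """Map each value of a non-ordering field to the BranchNo of every row holding it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for branch_no, attrs in rows.items():
        index[attrs[field_pos]].append(branch_no)
    return dict(index)


# Several secondary indexes can coexist on the same file.
city_index = build_secondary_index(branches, 1)
postcode_index = build_secondary_index(branches, 2)
print(city_index["London"])  # ['B002', 'B005']
```

Unlike the clustering index, a secondary index value may map to several records scattered over different pages, since the data file is not ordered on that field.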
Primary Clustered Indexes
• The data file is sequentially ordered on the key field
• The index file stores every value of the key field together with
the number of the data-file page in which the corresponding
record is stored
Example: Branch table, its primary clustered index, and the data file

Branch table
BranchNo  Street        City     Postcode
B002      56 Clover Dr  London   NW10 6EU
B003      163 Main St   Glasgow  G11 9QX
B004      32 Manse Rd   Bristol  BS99 1NZ
B005      22 Deer Rd    London   SW1 4EH

Index file (key value, page number)
B002  1
B003  1
B004  2
B005  2

Data file
Page 1: Branch B002 record, Branch B003 record
Page 2: Branch B004 record, Branch B005 record
Page 3: Branch B007 record
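A minimal sketch of a lookup through the primary clustered index, using the index entries and data pages 1 and 2 of the Branch example. Record contents are reduced to `(BranchNo, City)` pairs, and `find` is a hypothetical helper; the binary search uses the standard `bisect` module.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Index file from the example: key values (sorted) and the data-file page of each.
index_keys: List[str] = ["B002", "B003", "B004", "B005"]
index_pages: List[int] = [1, 1, 2, 2]

# Data file pages 1 and 2 from the example, ordered on the same key.
# Record contents are simplified to (BranchNo, City) pairs.
data_pages: Dict[int, List[Tuple[str, str]]] = {
    1: [("B002", "London"), ("B003", "Glasgow")],
    2: [("B004", "Bristol"), ("B005", "London")],
}


def find(key: str) -> Optional[Tuple[str, str]]:
    # Step 1: binary search the (small) index file for the key value.
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None
    # Step 2: read just the one data page the index points to and scan it.
    for rec in data_pages[index_pages[i]]:
        if rec[0] == key:
            return rec
    return None


print(find("B004"))  # ('B004', 'Bristol')
```

Only one data page is read per successful lookup, which is the point of the index: the search work moves to the much smaller index file.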
– sp_helpindex tablename (SQL Server system procedure that lists the indexes defined on a table)