
Lecture Data Storage

The document discusses data storage in database management systems, focusing on how databases are represented in files on disk and how data is managed in memory. It covers storage hierarchy, record and page layouts, file organization methods, and the role of the data dictionary and buffer manager. Additionally, it addresses buffer replacement policies and locking mechanisms for concurrent operations.

Uploaded by

apoorvaneha20

Data Storage

Rashmi Dutta Baruah


Department of Computer Science & Engineering
Data Storage
• Key Questions

– How the DBMS represents the database in files on disk: Data Storage Structures
– How the DBMS manages its memory and moves data back and forth from disk: Data Storage Management

23/2/2021 CS 245 2
Overview of Physical Storage Media
• Can differentiate storage into:
– volatile storage: loses contents when power is switched
off
– non-volatile storage:
• Contents persist even when power is switched off.
• Includes secondary and tertiary storage

• Factors affecting choice of storage media include


– Speed with which data can be accessed
– Cost per unit of data
– Reliability

Storage Hierarchy
[Figure: storage hierarchy diagram. Toward the top: faster, smaller, more expensive, volatile, byte-addressable. Toward the bottom: slower, larger, cheaper, non-volatile, block-addressable (SSD, HDD).]

Storage Hierarchy (Cont.)
• primary storage: fastest media but volatile (cache, main memory).
• secondary storage: next level in hierarchy, non-volatile, moderately fast access time
– also called on-line storage
– e.g., flash memory, magnetic disks
• tertiary storage: lowest level in hierarchy, non-volatile, slow access time
– also called off-line storage and used for archival storage
– e.g., magnetic tape, optical storage
– Magnetic tape
• Sequential access, 1 to 12 TB capacity
• A few drives with many tapes
• Juke boxes with petabytes (1000’s of TB) of storage

Typical Access Times
[Figure: typical access times, fastest to slowest: ~0.000007 milliseconds; RAM ~0.0001 milliseconds; ~0.02 to 0.15 milliseconds; DISK ~5 to 20 milliseconds]
Data Storage Structures
• Databases – records
• Disks- blocks
– disk block (contiguous sequence of bytes) is unit of storage
– data are read or written in units of blocks

• Storage manager – pages


– Size of page is chosen to be the size of a disk block
• Most databases use OS files as an intermediate layer for storing records.

Data Storage Structures
• The database is stored as a collection of files. Each
file is a sequence of records. A record is a sequence
of fields.

• Key questions
– How do we organize fields in a record? – Record Layout
– How do we organize tuples within a page? – Page Layout
– How do we organize pages into a file? – File Organization

Record Layout
• Record is identified using record id (rid)
• Records
– Fixed length
– Variable length
• Fixed-Length Records
– each field has a fixed length
– number of fields is fixed

type instructor = record
    ID varchar (5);
    name varchar(20);
    dept_name varchar(20);
    salary numeric (8,2);
end

[Figure: fixed-length record (10101, Srinivasan, Computer Science, 65000) starting at base address B, with field lengths L1, L2, ...; the third field starts at address B + L1 + L2]

Information about field lengths and types is in the system catalog (data dictionary).
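
The address computation above (base address plus the lengths of the preceding fields) can be sketched in Python. This is only an illustration: the byte widths (5, 20 and 20 bytes for the strings, an 8-byte float for salary) and the names `FIELDS`, `field_offset`, `pack_record` and `read_field` are assumptions, not something the slides specify.

```python
import struct

# Assumed fixed-width layout for the instructor record (not from the slides):
# ID 5 bytes, name 20 bytes, dept_name 20 bytes, salary as an 8-byte float.
FIELDS = [("ID", "5s"), ("name", "20s"), ("dept_name", "20s"), ("salary", "d")]
RECORD_FORMAT = "<" + "".join(fmt for _, fmt in FIELDS)
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 5 + 20 + 20 + 8 bytes


def field_offset(field_name):
    """Offset of a field from the record's base address: sum of earlier field lengths."""
    offset = 0
    for name, fmt in FIELDS:
        if name == field_name:
            return offset
        offset += struct.calcsize("<" + fmt)
    raise KeyError(field_name)


def pack_record(rec_id, name, dept_name, salary):
    """Serialize one record; short strings are padded with zero bytes."""
    return struct.pack(RECORD_FORMAT, rec_id.encode(), name.encode(),
                       dept_name.encode(), salary)


def read_field(page, base, field_name):
    """Read one field of the record that starts at byte `base` of `page`."""
    fmt = dict(FIELDS)[field_name]
    (value,) = struct.unpack_from("<" + fmt, page, base + field_offset(field_name))
    if isinstance(value, bytes):
        return value.rstrip(b"\x00").decode()
    return value
```

Because every field has a known length, `read_field` can jump straight to `base + field_offset(...)` without scanning the record, which is the main appeal of fixed-length records.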
Record Layout
• Variable-Length Records
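– Variable length fields: strings (ID, name, dept)

[Figure: two layouts of a variable-length record. In both, the fixed-length field (65000) is stored first, followed by the variable-length fields (ID, Srinivasan, Computer Science).
Layout 1: the record header stores an offset for each variable-length field; NULLs require no space.
Layout 2: the record header stores an (offset, length) pair for each variable-length field, e.g. (36,10) for the 10-byte field occupying bytes 36 to 45, plus a bitmap for null values.]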
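
A minimal sketch of the (offset, length) header with a null bitmap, written in Python. The exact header layout here (2-byte offsets and lengths, an 8-byte float for salary, a 1-byte bitmap) and the names `encode` and `decode` are assumptions chosen for illustration.

```python
import struct

VAR_FIELDS = ["ID", "name", "dept_name"]  # variable-length string fields
# Assumed header: one (offset, length) pair per variable-length field,
# then the fixed-length salary, then a 1-byte null bitmap.
HEADER_FORMAT = "<" + "HH" * len(VAR_FIELDS) + "dB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def encode(record):
    """record: dict with ID, name, dept_name (str or None) and salary (float)."""
    pairs, body, null_bitmap = [], b"", 0
    for i, field in enumerate(VAR_FIELDS):
        value = record[field]
        if value is None:
            null_bitmap |= 1 << i  # NULL: set the bit, store no bytes
            pairs += [0, 0]
        else:
            data = value.encode()
            pairs += [HEADER_SIZE + len(body), len(data)]
            body += data
    return struct.pack(HEADER_FORMAT, *pairs, record["salary"], null_bitmap) + body


def decode(buf):
    """Rebuild the record dict from the header and the variable-length area."""
    *pairs, salary, null_bitmap = struct.unpack_from(HEADER_FORMAT, buf)
    record = {"salary": salary}
    for i, field in enumerate(VAR_FIELDS):
        offset, length = pairs[2 * i], pairs[2 * i + 1]
        if null_bitmap & (1 << i):
            record[field] = None
        else:
            record[field] = buf[offset:offset + length].decode()
    return record
```

A NULL field costs only one bit in the bitmap, and any field can be located directly from its header entry without parsing the fields before it.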

Record Layout
• Variable-Length Records Issues
– Field may grow while modifying the record

– Some DBMSs (IBM DB2 and Microsoft SQL Server) do not allow records to span pages, whereas others (Oracle 8) allow large records to span pages; such records are organized as a singly directed list.

Page Layout
• Issues to consider
– 1 page = 1 disk block = fixed size (e.g. 8KB)
– Records: Fixed length , Variable length

• One simple approach


– Assume record size is fixed (later we will see for variable
size record)
– Each file has records of one particular type only
– Different files are used for different relations
(we assume that records are smaller than a disk block)

Page Layout
• Page - collection of slots
– each slot contains a record

• Record is identified using record id (rid)


• Record id = RID
– like a pointer to a tuple
– Typically rid = <pageID, slotNumber>

• Fixed Length Records


– uniform record slots arranged consecutively within a page

Page Layout
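[Figure: packed layout. Slot 1, Slot 2, ..., Slot N are stored consecutively, followed by free space; the page header stores N.]

• How do we insert records?
• How do we delete records?

[Figure: layout with bitmap. Slot 1, Slot 2, ..., Slot M, followed by free space; the page header stores M and a bitmap (e.g. 1 1 0 ...) with one bit per slot.]

Figure: Slotted page organization for Fixed Length Records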
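
One possible answer to the insert and delete questions for the bitmap layout, sketched in Python. The class name `FixedLengthPage`, the default sizes, and the choice to ignore the space taken by the page header are assumptions made to keep the example short.

```python
class FixedLengthPage:
    """Page of fixed-size slots with an occupancy bitmap (illustrative sketch)."""

    def __init__(self, page_id, page_size=8192, record_size=53):
        self.page_id = page_id
        self.record_size = record_size
        self.num_slots = page_size // record_size  # header space ignored here
        self.bitmap = [0] * self.num_slots          # 1 = slot occupied, 0 = free
        self.data = bytearray(self.num_slots * record_size)

    def insert(self, record):
        """Store the record in the first free slot; return its rid, or None if full."""
        assert len(record) == self.record_size
        for slot, used in enumerate(self.bitmap):
            if not used:
                start = slot * self.record_size
                self.data[start:start + self.record_size] = record
                self.bitmap[slot] = 1
                return (self.page_id, slot)  # rid = <pageID, slotNumber>
        return None

    def delete(self, rid):
        """Free the slot by clearing its bit; other records keep their rids."""
        _, slot = rid
        self.bitmap[slot] = 0

    def read(self, rid):
        """Return the record bytes, or None if the slot is empty."""
        _, slot = rid
        if not self.bitmap[slot]:
            return None
        start = slot * self.record_size
        return bytes(self.data[start:start + self.record_size])
```

With the bitmap, deletion never moves records, so existing rids stay valid. In the packed layout, by contrast, a delete would move another record into the hole, which changes that record's slot number.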

Page Layout

[Figure: Slot 1, Slot 2, ..., Slot M, free space, and the page header; the header holds M (number of entries) and F (free space pointer).]

Page header contains: slot directory + # entries + free space pointer
Slot directory: <record offset, record length> for each slot

Figure: Slotted page organization for Variable Length Records

• How do we insert records?


• How do we delete records?
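
A sketch of insert and delete for the variable-length case, in Python. It assumes records are compacted on every delete and that the slot directory lives in a Python list rather than inside the page bytes; `SlottedPage` and its method names are hypothetical.

```python
class SlottedPage:
    """Variable-length records with a slot directory (illustrative sketch)."""

    def __init__(self, page_size=8192):
        self.data = bytearray(page_size)
        self.slots = []     # slot directory: (offset, length), or None if deleted
        self.free_ptr = 0   # free space pointer: first unused byte

    def insert(self, record):
        """Append the record at the free space pointer; return its slot number."""
        if self.free_ptr + len(record) > len(self.data):
            return None  # page full
        offset = self.free_ptr
        self.data[offset:offset + len(record)] = record
        self.free_ptr += len(record)
        for slot, entry in enumerate(self.slots):
            if entry is None:  # reuse the directory entry of a deleted record
                self.slots[slot] = (offset, len(record))
                return slot
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1

    def delete(self, slot):
        """Remove the record, close the gap, and fix the offsets of moved records."""
        offset, length = self.slots[slot]
        self.slots[slot] = None
        tail = self.data[offset + length:self.free_ptr]
        self.data[offset:offset + len(tail)] = tail
        self.free_ptr -= length
        for i, entry in enumerate(self.slots):
            if entry is not None and entry[0] > offset:
                self.slots[i] = (entry[0] - length, entry[1])

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])
```

Records move inside the page when the gap is closed, but their slot numbers do not change, so a rid of the form <pageID, slotNumber> remains valid after compaction.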

File organization
• How to organize collection of pages as file?
(pages are collection of records, file can span several pages)
– Heap file organization (unordered)
– Sequential file organization (ordered)

• Heap file organization


– records not ordered
– Insertion can be done at the end of the file or utilize the
free space created after record deletion.
– need to keep track of free space
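
A minimal Python sketch of a heap file that tracks free space per page and reuses space released by deletions. The `HeapFile` class, the per-page free-space list, and the first-fit search are assumptions for illustration; records are assumed to be smaller than a page.

```python
class HeapFile:
    """Unordered file of pages with a per-page free-space map (illustrative sketch)."""

    def __init__(self, page_size=8192):
        self.page_size = page_size
        self.pages = []        # each page is a list of records, in no particular order
        self.free_space = []   # free bytes remaining in each page

    def insert(self, record):
        """Place the record in the first page with enough room, else in a new page."""
        for page_id, free in enumerate(self.free_space):
            if free >= len(record):
                break
        else:
            self.pages.append([])
            self.free_space.append(self.page_size)
            page_id = len(self.pages) - 1
        self.pages[page_id].append(record)
        self.free_space[page_id] -= len(record)
        return (page_id, len(self.pages[page_id]) - 1)  # rid

    def delete(self, rid):
        """Remove the record and give its bytes back to the free-space map."""
        page_id, slot = rid
        record = self.pages[page_id][slot]
        self.pages[page_id][slot] = None  # keep the slot so other rids stay valid
        self.free_space[page_id] += len(record)
```

Without the free-space map, finding room for an insert would mean reading pages one by one, which is exactly the disk traffic the bookkeeping is meant to avoid.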

File organization
• Sequential file organization
– records are sorted based on some search key
– Search key – any attribute or set of attributes
– each record points to the next in search key order

File organization
• Deletion – use pointer chains
• Insertion – locate the position where the record is to be inserted
– if there is free space, insert there
– if no free space, insert the record in an overflow block
– in either case, pointer chain must be updated
• Need to reorganize the file from time to time to restore sequential order
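
The insertion and deletion rules above can be sketched in Python as a chain of records linked in search-key order. This is a simplification under stated assumptions: records carry only a key, any free slot in the main blocks counts as "free space", and addresses are hypothetical ("main", i) or ("overflow", i) tuples.

```python
class SequentialFile:
    """Records chained in search-key order, with an overflow area (illustrative sketch)."""

    def __init__(self, main_slots):
        self.free_main = list(range(main_slots))  # free slots in the main blocks
        self.records = {}        # address -> [key, address of next record]
        self.overflow_count = 0
        self.head = None         # address of the first record in key order

    def _allocate(self):
        if self.free_main:
            return ("main", self.free_main.pop(0))
        self.overflow_count += 1
        return ("overflow", self.overflow_count - 1)

    def insert(self, key):
        """Find the predecessor by following the chain, then splice the record in."""
        prev, cur = None, self.head
        while cur is not None and self.records[cur][0] <= key:
            prev, cur = cur, self.records[cur][1]
        addr = self._allocate()
        self.records[addr] = [key, cur]
        if prev is None:
            self.head = addr
        else:
            self.records[prev][1] = addr
        return addr

    def delete(self, key):
        """Unlink the record from the pointer chain and free its slot."""
        prev, cur = None, self.head
        while cur is not None and self.records[cur][0] != key:
            prev, cur = cur, self.records[cur][1]
        if cur is None:
            return False
        nxt = self.records.pop(cur)[1]
        if prev is None:
            self.head = nxt
        else:
            self.records[prev][1] = nxt
        if cur[0] == "main":
            self.free_main.append(cur[1])
        return True

    def scan(self):
        """Yield keys in search-key order by following the pointer chain."""
        cur = self.head
        while cur is not None:
            key, cur = self.records[cur]
            yield key
```

The scan still returns keys in order after records land in the overflow area, but the physical order no longer matches the logical order, which is why periodic reorganization is needed.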

Data Dictionary Storage
• The Data dictionary (also called system catalog) stores
metadata; that is, data about data, such as
• Information about relations
– names of relations
– names, types and lengths of attributes of each relation
– names and definitions of views
– integrity constraints
• User and accounting information, including passwords
• Statistical and descriptive data
– number of tuples in each relation
• Physical file organization information
– How relation is stored (sequential/heap/…)
– Physical location of relation
• Information about indices (will be discussed in next
lecture)
24/2/2021 CS 245 20
Relational Representation of System Metadata

• Relational
representation
on disk

• Specialized data
structures
designed for
efficient access,
in memory

Storage Access
• Blocks are units of both storage allocation and data transfer.
• Database system seeks to minimize the number of block
transfers between the disk and memory. We can reduce the
number of disk accesses by keeping as many blocks as possible
in main memory.
• Buffer – portion of main memory available to store copies of
disk blocks.
• Buffer manager – subsystem responsible for allocating buffer
space in main memory.
[Figure: page request from higher-level code]
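
A rough back-of-the-envelope model of why keeping blocks in memory matters, in Python. It reuses the approximate access times quoted earlier in the lecture (RAM ~0.0001 ms, disk ~5 to 20 ms); the function name `access_time_ms` and the simple hit-ratio model are assumptions, not a measured cost model.

```python
RAM_ACCESS_MS = 0.0001  # approximate RAM access time quoted in the lecture


def access_time_ms(num_blocks, hit_ratio, disk_ms):
    """Estimated time to access num_blocks when a fraction hit_ratio is already buffered."""
    hits = num_blocks * hit_ratio
    misses = num_blocks - hits
    return hits * RAM_ACCESS_MS + misses * disk_ms


# 1000 block requests with a 5 ms disk:
no_buffering = access_time_ms(1000, 0.0, 5.0)    # every request goes to disk
mostly_cached = access_time_ms(1000, 0.9, 5.0)   # 90% of requests found in the buffer
```

Under these assumed numbers the disk misses dominate the total almost entirely, so reducing the number of block transfers is what reduces the running time.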

Buffer Manager
• Programs call on the buffer manager when they need
a block from disk.
– If the block is already in the buffer, buffer manager returns
the address of the block in main memory
– If the block is not in the buffer, the buffer manager
• Allocates space in the buffer for the block
– Replacing (throwing out) some other block, if required, to make space
for the new block.
– Replaced block written back to disk only if it was modified since the
most recent time that it was written to/fetched from the disk.
• Reads the block from the disk to the buffer, and returns the
address of the block in main memory to requester.
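
The request-handling steps above can be sketched in Python. The `disk` dictionary stands in for real disk I/O, and the eviction choice (oldest-loaded block) is only a placeholder policy; both are assumptions, as are the names `BufferManager`, `get_block` and `mark_dirty`.

```python
class BufferManager:
    """Buffer pool that writes back only modified blocks (illustrative sketch)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk     # hypothetical stand-in for the disk: block_id -> bytes
        self.frames = {}     # block_id -> in-memory copy of the block
        self.dirty = set()   # blocks modified since they were last read or written

    def get_block(self, block_id):
        """Return the in-memory copy of the block, fetching it from disk if needed."""
        if block_id in self.frames:
            return self.frames[block_id]  # already buffered
        if len(self.frames) >= self.capacity:
            self._evict()                 # make space for the new block
        self.frames[block_id] = bytearray(self.disk[block_id])
        return self.frames[block_id]

    def mark_dirty(self, block_id):
        self.dirty.add(block_id)

    def _evict(self):
        victim = next(iter(self.frames))  # oldest-loaded block (placeholder policy)
        if victim in self.dirty:          # write back only if modified
            self.disk[victim] = bytes(self.frames[victim])
            self.dirty.discard(victim)
        del self.frames[victim]
```

A caller that changes a block calls `mark_dirty`, so a clean block can be dropped without any disk write when its frame is needed.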

Buffer Manager
• Buffer replacement strategy
• Pinned block: memory block that is not allowed to be
written back to disk
– Pin done before reading/writing data from a block
– Unpin done when read /write is complete
– Multiple concurrent pin/unpin operations possible
• Keep a pin count, buffer block can be evicted only if pin count = 0
• Shared and exclusive locks on buffer
– Needed to prevent concurrent operations from reading page
contents as they are moved/reorganized, and to ensure only one
move/reorganize at a time
– Readers get shared lock, updates to a block require exclusive lock
– Locking rules:
• Only one process can get exclusive lock at a time
• Shared lock cannot be concurrently held with exclusive lock
• Multiple processes may be given shared lock concurrently
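
The pin count and the three locking rules can be captured as simple bookkeeping on a buffer frame. The Python sketch below is single-threaded and uses non-blocking "try" methods, which is an assumption for illustration; a real buffer manager would block or queue the requester instead. `BufferFrame` and its method names are hypothetical.

```python
class BufferFrame:
    """Pin count and shared/exclusive lock state for one buffer block (sketch)."""

    def __init__(self):
        self.pin_count = 0
        self.shared_holders = 0
        self.exclusive_held = False

    def pin(self):
        self.pin_count += 1

    def unpin(self):
        assert self.pin_count > 0
        self.pin_count -= 1

    def can_evict(self):
        return self.pin_count == 0  # evict only when nobody has the block pinned

    def try_lock_shared(self):
        if self.exclusive_held:     # shared cannot coexist with exclusive
            return False
        self.shared_holders += 1    # many readers may hold the lock together
        return True

    def try_lock_exclusive(self):
        if self.exclusive_held or self.shared_holders > 0:
            return False            # only one exclusive holder, and no readers
        self.exclusive_held = True
        return True

    def unlock_shared(self):
        self.shared_holders -= 1

    def unlock_exclusive(self):
        self.exclusive_held = False
```

Pins and locks answer different questions: the pin count decides whether the frame may be evicted, while the lock decides who may read or modify its contents right now.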

Buffer-Replacement Policies
• Most operating systems replace the block least recently used
(LRU strategy)
– Idea behind LRU – use past pattern of block references as a predictor
of future references
– LRU can be bad for some queries
• Queries have well-defined access patterns (such as sequential
scans), and a database system can use the information in a
user’s query to predict future references
• Mixed strategy with hints on replacement strategy provided
by the query optimizer is preferable
▪ Example of bad access pattern for LRU: when computing the join of 2 relations r and s by nested loops

    select * from instructor natural join department

    for each tuple tr of r do
        for each tuple ts of s do
            if the tuples tr and ts match …

Buffer-Replacement Policies (Cont.)
• Toss-immediate strategy – frees the space occupied
by a block as soon as the final tuple of that block has
been processed
• Most recently used (MRU) strategy – system must
pin the block currently being processed. After the
final tuple of that block has been processed, the
block is unpinned, and it becomes the most recently
used block.
• Buffer manager can use statistical information
regarding the probability that a request will
reference a particular relation
– E.g., the data dictionary is frequently accessed. Heuristic:
keep data-dictionary blocks in main memory buffer
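
A small Python simulation of why LRU does poorly on the repeated scan of the inner relation in a nested-loop join, and how MRU compares. The reference string (an inner relation of 4 blocks scanned 3 times) and the buffer size of 3 blocks are made-up values for illustration; `count_misses` is a hypothetical helper.

```python
def count_misses(references, capacity, policy):
    """Count block fetches for a reference string under 'LRU' or 'MRU' eviction."""
    buffer = []  # ordered from least recently used to most recently used
    misses = 0
    for block in references:
        if block in buffer:
            buffer.remove(block)  # hit: will be re-added as most recently used
        else:
            misses += 1
            if len(buffer) >= capacity:
                # LRU evicts the front of the list, MRU evicts the back.
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)
    return misses


# Inner relation s occupies blocks 0..3 and is scanned once per outer tuple.
references = [0, 1, 2, 3] * 3
lru_misses = count_misses(references, capacity=3, policy="LRU")  # 12: every access misses
mru_misses = count_misses(references, capacity=3, policy="MRU")  # 6
```

With LRU, each block is evicted just before the scan comes back to it, so the buffer never helps. MRU keeps most of the inner relation resident across scans in this example.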

Summary
• Key question that we focused on today
– How the DBMS represents the database in files on disk: Data Storage Structures
• Overview of storage hierarchy
• Record layout
– Fixed size
– Variable size
• Page layout
– Slotted pages
• File organization
– Heap file organization
– Sequential file organization

Summary
• Key question that we focused on today
– How the DBMS manages its memory and moves data back and forth from disk: Data Storage Management
• Data Dictionary storage
• Buffer manager
• Buffer replacement policies
• Pinning and unpinning blocks
• Locks: shared and exclusive locks

Courtesy: some of the slides are taken from lecture slides available at:
https://www.db-book.com/ and http://pages.cs.wisc.edu/~dbbook/index.html