0% found this document useful (0 votes)
18 views

Disk Organization

The document discusses disk space management in a database management system (DBMS). It explains that the lowest layer of the DBMS manages space on disk and higher levels call on this layer to allocate, deallocate, read and write pages. It describes different methods for keeping track of free blocks on disk, including linked lists and bitmaps. It also covers buffer management in the DBMS, including maintaining a table to map frames to pages, page replacement policies, and the challenges of requesting pages.

Uploaded by

efi
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views

Disk Organization

The document discusses disk space management in a database management system (DBMS). It explains that the lowest layer of the DBMS manages space on disk and higher levels call on this layer to allocate, deallocate, read and write pages. It describes different methods for keeping track of free blocks on disk, including linked lists and bitmaps. It also covers buffer management in the DBMS, including maintaining a table to map frames to pages, page replacement policies, and the challenges of requesting pages.

Uploaded by

efi
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Disk Organization

1
2 Disk Space Management

 Lowest layer of DBMS software manages space on


disk.

 Higher levels call upon this layer to:


 allocate/de-allocate a page
 read/write a page

 Higher levels don’t need to know how this is done, or


how free space is managed.
3 Keeping Track of Free Blocks

 Linked List
 Bitmap
 A bitmap also allows very fast identification and
allocation of contiguous areas on disk.
4 Buffer Management in a
DBMS
 Data must be in RAM for DBMS to operate on it!
 Table of <frame#, pageid> pairs is maintained.
Page Requests from Higher Levels

BUFFER POOL

disk page

free frame

MAIN MEMORY

DISK choice of frame dictated


DB by replacement policy
5 More on Buffer Management

 Page in pool may be requested many times,


 a pin count is used (the number of current users of the
page)
 A page is a candidate for replacement if pin count = 0.

 A page is dirty, if its contents have been changed


after writing
 Buffer Manager keeps a dirty bit
 To remove P :
 If P is dirty, we write it to disk
 If P is not dirty, then what?
6 When a page is requested…

 A page is the unit of memory we request


 If Page in the pool
 Great no need to go to disk!
 increments the pin_count of that frame
 If not?
 If there is a free frame, use it!
 If not? We need to choose a page to remove!
 If frame is dirty, write it to disk
 Read requested page into chosen frame
 Pin the page and return its address.
 Frame is chosen for replacement by a replacement policy
 When the code releases the page, the pin_count is
decremented
7 Buffer Replacement Policy

 Frame is chosen for replacement by a replacement


policies:
 Least-recently-used (LRU),
 Clock, A variant of LRU,
 MRU
8 Least Recently Used (LRU)

 This can be implemented in the buffer manager using


a queue of pointers to frames with pin_count 0.
 A frame is added to the end of the queue when the
pin_count goes to 0.
 The page chosen for replacement is the one in the
frame at the head of the queue
 LRU is expensive (why ?)
9 The Clock Approximation

 Instead we maintain a “last used clock”


 Think of pages ordered 1…N around a clock
 uses current variable that takes on values 1 through N in
circular order.
 Pages keep a “ref bit”
 Whenever the page pin_count goes to 0 set the bit
 The current frame is considered for replacement
 If current page has ref bit == false choose it
 else, unset ref bit and move on
 “Approximates LRU” since a recently referenced
pages is less likely to be replaced.
10 MRU (Most Recently Used)

 The LRU and clock policies are not always the best
replacement strategies
 # buffer frames < # pages in file
 #buffer frames=10, # pages in file =11,
 means each page request causes an I/O.

 Why would you ever want to use MRU?


11 Files of Records

 FILE: A collection of pages, each containing a


collection of records.
 File must support:
 insert/delete/modify record
 read a particular record (specified using record id)
 scan all records (possibly with some conditions on the
records to be retrieved)
12 Unordered Files (Heap Files)

 Simplest file structure contains records in no


particular order.

 As file grows and shrinks, disk pages are allocated


and de-allocated.

 To support record level operations, we must:


 keep track of the pages in a file
 keep track of free space on pages
 keep track of the records on a page

 There are many alternatives for keeping track of


this.
13 Alternative 1:
Heap File Implemented as List

 Maintain a table containing pairs of:


<heap_file_name, head_page_address>
 Each page contains 2 `pointers’ (rid)
plus data.

Data Data Data Full Pages


Page Page Page
Header
Page
Data Data Data
Pages with
Page Page Page
Free Space
14 Heap File Implemented as a
List
 Insert a new page into heap file
 Disk manager adds a new free space page into link

 Delete a page from heap file


 Removed from the list
 Disk manager deallocates it

 Disadvantages:
 If records are of variable length, all pages will be in free list.
because it is likely that every page has at least a few free bytes
 Retrieve and examine several pages for enough
space.
15 Alternative 2: Heap File Using
Page Directory
 In directory, each entry
Data for a page includes
Header Page 1 number of free bytes
Page on page.
Data  The directory is a
Page 2 collection of pages
(linked list
implementation is just
one alternative).
Data  Much smaller than
DIRECTORY Page N linked list of all HF
pages!
16 Alternative 2: Heap File Using
a Page Directory
 Advantage of Page Directory :
 The size of directory is very small (much smaller than heap
file.)
 Searching space is very efficient, because find free
space without looking at actual heap data pages.
17 Page Formats

 Identify a record:
 Most cases, use <page_id, slot_number> as rid.

 How to arrange records in pages?


 Alternative approaches to manage slots on a
page
 How to support insert/deleting/searching?
18 Page Formats: Fixed Length
Records
 Alternative 1:
 Store records in the first N slots
 Whenever a record is deleted move the last record on
the page into the vacated slot.
 Advantage:
 Can locate the ith record on the page with simple offset
calculation.

 All empty slots appear at the end of the page.

 Disadvantage:
 Does not work if there are external references to the record that
moved
19 Page Formats: Fixed Length
Records
 Alternative 2:
 Handle deletion using array of bit
 One bit per slot to keep track of free slots
 When a page is deleted, its bit is turned off (i.e. 0).
 Locating records on the page requires scanning the bit
array to locate slots whose bit is on.
20 Page Formats: Fixed Length
Records
Record id = <page id, slot #>.

Note: In first alternative, moving records for free space


management changes rid; may not be acceptable if
existing external references to the record that is moved.

Slot 1 Slot 1
Slot 2 Slot 2
Free
... Space
...
Slot N Slot N

Slot M
N 1 . . . 0 1 1M
number
number M ... 3 2 1
of slots
PACKED of records UNPACKED, BITMAP
21 Page Formats: Variable
Length Records
 Cannot divide the page into fixed-length slots
 Challenges:
 When a new record is to be inserted, we have to find an
empty slot of just the right length.
 We must ensure that the free space on the page is
contiguous.
 So
 The ability to move records on a page becomes very
important
22 Page Formats: Variable
Length Records
 Directory of slots
 <record offset, record length> per slot
 record offset: offset in bytes from the start of the data
area
 Deletion: setting record offset to -1
 rid <page id, slot id> does not change when a record
moves.
23 Page Formats: Variable
Length Records
Rid = (i,N) Length = 20 Offset of
Page i record
from start
Rid = (i,2) Length = 16 of data
area
Rid = (i,1) Length = 24

20 16 24 N Pointer
N ... 2 1 # slots to start
of free
space
SLOT DIRECTORY

Slot directory = {<record_offset, record_length>}


24 Page Formats: Variable
Length Records
 Maintain a pointer to the start of the free space area
 When a new record does not fit into the remaining
free space
 move records in the page to reclaim space deleted
earlier.
 Cannot always remove a slot of a deleted record (or
the rid of the other slots will change).
 When a new record is inserted, the directory is
scanned for an element not pointing to a record.
25 Page Formats: Variable
Length Records
 Slot directory = {<record_offset, record_length>}
 Advantages:
+ Moving: rid is not changed
+ Deletion: offset = -1 (rid changed?
Can we delete slot? Why?)
+ Insertion: Reuse deleted slot.
Only insert if none available.
26 Records Formats: Fixed
Length Record
 Information about field types same for all
records in a file
 Stored record format in system catalogs.
 Finding i’th field does not require scan of
record, just offset calculation.

F1 F2 F3 F4

L1 L2 L3 L4

Base address (B) Address = B+L1+L2


27 Record Formats: Variable
Length
 Alternative 1:
 Store fields consecutively
 Separate fields by delimiters
 Disadvantage: requires the scan of the whole record to
reach a field
 Alternative 2:
 Reserve space at the beginning of the record for use as
an array of integer offsets
 The ith entry is the starting address of the ith field relative
to the start of the record.
28 Record Formats: Variable
Length
 Two alternative formats (# fields is fixed):
F1 F2 F3 F4

4 $ $ $ $

Field
Fields Delimited by Special Symbols
Count
F1 F2 F3 F4

Array of Field Offsets


+ Second offers direct access to i’th field
+ efficient storage of nulls ;
- small directory overhead.
29 Issues in Variable Length
Records
 Modifying a field may cause it to grow
 Need to shift subsequent fields to make room.
 A modified record may no longer fit in the available
space on its page, May need to move to another
page
 rid will change → causing problems
 We must leave a forwarding address in the old place
 A record may grow so large that it cannot fit on any
page
 Break a record into smaller records
 Chain them together

You might also like