Lecture16 Fall
Can we leverage OS for DB storage management?
• OS virtual memory
• OS file system
• Unfortunately, OS often gets in the way of DBMS
• DBMS needs to do things “its own way”
• Control over buffer replacement policy
• LRU not always best (sometimes worst!)
• Control over flushing data to disk
• Write-ahead logging (WAL) protocol requires flushing log entries to disk
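The remark that LRU is sometimes the worst policy can be illustrated with a repeated sequential scan over a file that is one page larger than the buffer pool. The sketch below is a toy simulation, not how any particular DBMS implements replacement; `count_misses` and its parameters are names invented for this example.

```python
from collections import OrderedDict

def count_misses(num_frames, accesses, evict_most_recent=False):
    """Count page misses for a pool of num_frames frames.

    The pool is kept ordered from least to most recently used.
    evict_most_recent=False gives LRU; True gives MRU.
    """
    pool = OrderedDict()
    misses = 0
    for page in accesses:
        if page in pool:
            pool.move_to_end(page)        # hit: now the most recently used
            continue
        misses += 1
        if len(pool) >= num_frames:
            # last=False pops the LRU end, last=True pops the MRU end
            pool.popitem(last=evict_most_recent)
        pool[page] = None
    return misses

# Hypothetical workload: a 4-page file scanned 5 times with only 3 frames.
scan = [0, 1, 2, 3] * 5
lru_misses = count_misses(3, scan)
mru_misses = count_misses(3, scan, evict_most_recent=True)
```

Under LRU every one of the 20 accesses misses, because each page is evicted just before it is needed again. Evicting the most recently used page instead keeps part of the file resident and misses far less often on this workload.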
Today’s Agenda
• Buffer management
• Relational operators
Organize Disk Space into Pages
• A table is stored as one or more files; a file contains one or more pages
• Higher levels call upon this layer to:
• allocate/de-allocate a page
• read/write a page
• Best if requested pages are stored sequentially on disk! Higher levels don’t need to know if/how this is done, nor how free space is managed.
Buffer Management
[Figure: buffer pool frames, each pinned or unpinned]
• Data must be in RAM for DBMS to operate on it!
• Buffer Mgr hides the fact that not all data is in RAM
When a Page is Requested ...
• Buffer pool information table contains: NOT FOUND <?,?,?>
• If requested page is not in pool and the pool is not full:
• Read requested page into chosen frame
• Pin the page and return its address
• If requested page is not in pool and the pool is full:
• Choose an (un-pinned) frame for replacement
• If frame is “dirty”, write it to disk
• Read requested page into chosen frame
• Pin the page and return its address
• Buffer pool information table now contains:
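The request steps above can be sketched as a small in-memory buffer pool. This is a minimal illustration under stated assumptions: `BufferPool`, `request_page`, and `release_page` are hypothetical names, a Python dict stands in for the disk, and the victim is simply the first unpinned frame found (a placeholder for a real replacement policy).

```python
class BufferPool:
    """Toy buffer manager; names and layout are illustrative only."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk        # dict: page id -> contents (stand-in for disk)
        self.frames = {}        # page id -> {"data", "pin_count", "dirty"}

    def request_page(self, page_id):
        frame = self.frames.get(page_id)
        if frame is None:
            if len(self.frames) >= self.num_frames:
                self._evict()   # pool is full: free a frame first
            # Read the requested page into the chosen frame.
            frame = {"data": self.disk[page_id], "pin_count": 0, "dirty": False}
            self.frames[page_id] = frame
        frame["pin_count"] += 1  # pin the page and return its frame
        return frame

    def release_page(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        # Choose an unpinned frame (placeholder policy: first one found).
        victim = next((pid for pid, f in self.frames.items()
                       if f["pin_count"] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame["dirty"]:
            self.disk[victim] = frame["data"]   # write dirty page back


# Usage: two frames, three pages.
disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(2, disk)
page = pool.request_page(1)
page["data"] = "A"                 # modify the page while it is pinned
pool.release_page(1, dirty=True)
pool.request_page(2)
pool.release_page(2)
pool.request_page(3)               # pool is full: page 1 is written back and replaced
```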
Operator Algorithms
• Selection
• Join
Selection Options
• No Index, Unsorted Data => File Scan (Linear Search)
• No Index, Sorted Data => File Scan (Binary Search)
• B+ Tree Index/Hashing Index => Use index to find qualifying data entries, then retrieve corresponding data records
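The three selection options can be compared with a small in-memory sketch. It assumes records are `(key, payload)` tuples and that a Python dict stands in for a hash index mapping keys to record positions; all function names here are made up for illustration.

```python
import bisect

def select_linear(records, value):
    """No index, unsorted data: scan every record."""
    return [rec for rec in records if rec[0] == value]

def select_sorted(sorted_records, value):
    """No index, data sorted on the key: binary search, then read the run of matches."""
    # (value,) sorts before any (value, payload), so this finds the first match.
    i = bisect.bisect_left(sorted_records, (value,))
    matches = []
    while i < len(sorted_records) and sorted_records[i][0] == value:
        matches.append(sorted_records[i])
        i += 1
    return matches

def build_hash_index(records):
    """Index of data entries: key -> list of record positions."""
    index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[0], []).append(rid)
    return index

def select_indexed(records, index, value):
    """Use the index to find qualifying entries, then fetch the records."""
    return [records[rid] for rid in index.get(value, [])]


records = [(3, "c"), (1, "a"), (3, "d"), (2, "b")]
by_scan = select_linear(records, 3)
by_search = select_sorted(sorted(records), 3)
by_index = select_indexed(records, build_hash_index(records), 3)
```

All three return the same qualifying records; they differ in how many records (and, on disk, how many pages) must be touched to find them.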
Join Algorithms
• Nested Loop Join
• Grace Hash Join
• Sort Merge Join
Nested Loop Join
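Since the slide figures did not survive extraction, here is a minimal sketch of a tuple-at-a-time nested loop equi-join. It assumes each relation is a list of tuples whose first field is the join attribute; `nested_loop_join` is an illustrative name, not taken from the lecture.

```python
def nested_loop_join(outer, inner):
    """For each outer tuple, scan the whole inner relation for matches."""
    result = []
    for r in outer:
        for s in inner:          # the inner relation is rescanned per outer tuple
            if r[0] == s[0]:
                result.append((r, s))
    return result


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z")]
joined = nested_loop_join(R, S)
```

The inner relation is read once per outer tuple, which is what makes this the most expensive of the join algorithms listed.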
Block Nested Loop Join
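A hedged sketch of the block variant: the outer relation is read a block of pages at a time, and the inner relation is scanned once per block rather than once per tuple. The sketch assumes relations are lists of pages, each page a list of tuples keyed on the first field; `block_size` (pages of the outer held in memory at once) and the in-memory lookup table per block are choices made for this example.

```python
def block_nested_loop_join(outer_pages, inner_pages, block_size):
    """Join two paged relations, scanning the inner once per outer block."""
    result = []
    inner_scans = 0
    for start in range(0, len(outer_pages), block_size):
        block = outer_pages[start:start + block_size]
        # Build a lookup table over the tuples of the current outer block.
        lookup = {}
        for page in block:
            for r in page:
                lookup.setdefault(r[0], []).append(r)
        inner_scans += 1
        for page in inner_pages:          # one pass over the inner per block
            for s in page:
                for r in lookup.get(s[0], []):
                    result.append((r, s))
    return result, inner_scans


outer_pages = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e")]]
inner_pages = [[(2, "x"), (3, "y")], [(3, "z"), (6, "w")]]
joined, scans = block_nested_loop_join(outer_pages, inner_pages, block_size=2)
```

With three outer pages and a block size of two, the inner relation is scanned twice instead of once per outer tuple.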
Grace Hash Join
• Two-Phase Hash join:
• Partition Phase: Hash both tables on the join attribute into partitions.
• Probing Phase: Compares tuples in corresponding partitions for each table.
• Named after the GRACE database machine.
• Hash R into buckets (0, 1, ..., ‘max’)
• Hash S into buckets (same hash function)
• Join each pair of matching buckets
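The two phases can be sketched in a few lines. This is an in-memory illustration only: a real Grace hash join writes partitions to disk, whereas here Python lists stand in for the partition files. Tuples are assumed to carry the join attribute in their first field, and `grace_hash_join` and `num_partitions` are names chosen for this example.

```python
def grace_hash_join(r, s, num_partitions):
    """Partition both inputs with the same hash function, then join bucket pairs."""

    def partition(table):
        parts = [[] for _ in range(num_partitions)]
        for row in table:
            parts[hash(row[0]) % num_partitions].append(row)
        return parts

    # Partition phase: matching keys always land in the same bucket number.
    r_parts, s_parts = partition(r), partition(s)

    # Probe phase: build a hash table on each R bucket, probe with the S bucket.
    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}
        for row in r_part:
            table.setdefault(row[0], []).append(row)
        for row in s_part:
            for match in table.get(row[0], []):
                result.append((match, row))
    return result


R = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S = [(2, "x"), (3, "y"), (3, "z"), (7, "w")]
joined = sorted(grace_hash_join(R, S, num_partitions=3))
```

Because both tables use the same hash function, a tuple in bucket i of R can only match tuples in bucket i of S, so each pair of buckets is joined independently.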
Grace Hash Join: Partition
[Figure: partitioning animation; labels: R, S, Buffers, 1 Buffer, Partition 1, Partition 2]
Grace Hash Join: Build & Probe
[Figure: build and probe on Partition 1]
Sorting Algorithm
• Used not only for Sort-Merge Join, but also for
• ORDER BY
• DISTINCT
• Sort-based Aggregate
Overview of the Sorting Problem
• Files are broken up into N pages.
• The DBMS has a finite number, B, of fixed-size buffers.
• Let’s start with a simple example...
Two-way External Merge Sort
• # of passes: about log_2 N (exactly 1 + ⌈log_2 N⌉)
General External Merge Sort
• # of passes: about log_B N (exactly 1 + ⌈log_(B-1) ⌈N/B⌉⌉)
• Total I/O cost: about 2N log_B N = 2N log N / log B
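A compact sketch of general external merge sort, assuming the file is a list of pages (each a list of records) and that `num_buffers` plays the role of B. Pass 0 sorts B pages at a time into runs; each later pass merges up to B - 1 runs, with one buffer reserved for output. Everything lives in memory here, so the code only mirrors the pass structure, not the actual disk I/O; the function name is illustrative.

```python
import heapq

def external_merge_sort(pages, num_buffers):
    """Return (sorted records, number of passes) for a paged input."""
    if num_buffers < 3:
        raise ValueError("need at least 3 buffers (2 inputs + 1 output)")

    # Pass 0: read num_buffers pages at a time and sort each chunk into a run.
    runs = []
    for i in range(0, len(pages), num_buffers):
        chunk = [rec for page in pages[i:i + num_buffers] for rec in page]
        runs.append(sorted(chunk))
    passes = 1

    # Later passes: merge up to (num_buffers - 1) runs at a time.
    fan_in = num_buffers - 1
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1

    return (runs[0] if runs else []), passes


pages = [[9, 4], [7, 1], [3, 8], [2, 6], [5, 0], [11, 10], [13, 12], [15, 14]]
records, passes = external_merge_sort(pages, num_buffers=3)
```

With N = 8 pages and B = 3, pass 0 produces 3 runs and two merge passes reduce them to one, matching 1 + ⌈log_2 3⌉ = 3 passes. Each pass reads and writes every page once, which is where the factor of 2N per pass comes from.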
Using B+ Tree for Sorting
Clustered B+ Tree for Sorting