Lecture16 Fall

The document discusses database storage and relational operators. It covers buffer management, selection operations using indexes and scans, and different join algorithms including nested loop, hash, and sort merge joins.

Uploaded by

Faruk Karagoz

CSE 412 Database Management

Lecture 16: Database Storage (Cont.) and Relational Operators
Jia Zou
Arizona State University

Can we leverage OS for DB storage management?
• OS virtual memory
• OS file system
• Unfortunately, the OS often gets in the way of the DBMS
• The DBMS needs to do things “its own way”
  • Control over buffer replacement policy
    • LRU is not always best (sometimes worst!)
  • Control over flushing data to disk
    • Write-ahead logging (WAL) protocol requires flushing log entries to disk
Today’s Agenda
• Buffer management
• Relational operators
Organize Disk Space into Pages
• A table is stored as one or more files; each file contains one or more pages
• Higher levels call upon this layer to:
  • allocate/de-allocate a page
  • read/write a page
• Best if requested pages are stored sequentially on disk! Higher levels don’t need to know if or how this is done, nor how free space is managed.
Buffer Management
[Figure: buffer pool frames, each pinned or unpinned]
• Data must be in RAM for the DBMS to operate on it!
• The buffer manager hides the fact that not all data is in RAM
When a Page is Requested ...
• Buffer pool information table contains: NOT FOUND <?,?,?>
• If the requested page is not in the pool and the pool is not full:
  • Read the requested page into a free frame
  • Pin the page and return its address
• If the requested page is not in the pool and the pool is full:
  • Choose an (un-pinned) frame for replacement
  • If the frame is “dirty”, write it to disk
  • Read the requested page into the chosen frame
  • Pin the page and return its address
• Buffer pool information table now contains:

• Unpin it when you finish using the page
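The request/unpin steps above can be sketched as a toy buffer pool. This is a minimal illustration, not the design of any particular DBMS: the class name `BufferPool`, the dict standing in for the disk, the frame layout, and the choice of LRU for eviction are all assumptions made for the example.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool with pin counts, dirty bits, and LRU replacement."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk              # hypothetical "disk": dict page_id -> data
        self.frames = OrderedDict()   # page_id -> {"data", "pin_count", "dirty"}
        self.reads = 0                # pages read from disk
        self.writes = 0               # pages written back to disk

    def request_page(self, page_id):
        if page_id in self.frames:
            frame = self.frames[page_id]          # already buffered: no I/O
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()                     # pool is full: free a frame
            self.reads += 1
            frame = {"data": self.disk[page_id], "pin_count": 0, "dirty": False}
            self.frames[page_id] = frame
        frame["pin_count"] += 1                   # pin before handing it out
        return frame

    def unpin_page(self, page_id, dirty=False):
        frame = self.frames[page_id]
        assert frame["pin_count"] > 0, "page is not pinned"
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty
        if frame["pin_count"] == 0:
            self.frames.move_to_end(page_id)      # remember "time last unpinned"

    def _evict(self):
        # Unpinned frames appear in order of their last unpin, oldest first.
        for victim_id, frame in self.frames.items():
            if frame["pin_count"] == 0:
                if frame["dirty"]:
                    self.disk[victim_id] = frame["data"]   # write back first
                    self.writes += 1
                del self.frames[victim_id]
                return
        raise RuntimeError("all frames are pinned")
```

A caller modifies `frame["data"]` while the page is pinned and passes `dirty=True` when unpinning, so the change is written back only when the frame is later chosen for replacement.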

Buffer Replacement Policy
• Frame is chosen for replacement by a replacement policy:
• Least-recently-used (LRU), MRU, Clock, etc.
• The policy can have a big impact on the number of I/Os; the best choice depends on the access pattern
LRU Replacement Policy
• Least Recently Used (LRU)
• for each page in buffer pool, keep track of time last unpinned
• replace the frame which has the oldest (earliest) time
• very common policy: intuitive and simple
• Problems?
LRU Replacement Policy
• Problem: Sequential Flooding
• Caused by LRU combined with repeated sequential scans.
• If # buffer frames < # pages in file, each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
Sequential Flooding – Illustration
How does LRU work?
How will MRU work?
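Sequential flooding can be reproduced with a small simulation. The sketch below assumes every page is unpinned right after it is used, and the function name `count_ios` and its parameters are made up for the example.

```python
def count_ios(policy, num_frames, num_pages, num_scans):
    """Count disk reads for repeated sequential scans of pages 0..num_pages-1."""
    pool = []   # buffered page ids, ordered from least to most recently used
    ios = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)          # hit: recency is refreshed below
            else:
                ios += 1                   # miss: page must be read from disk
                if len(pool) == num_frames:
                    # LRU evicts the oldest entry, MRU the newest one.
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios

print(count_ios("LRU", num_frames=3, num_pages=4, num_scans=3))   # 12
print(count_ios("MRU", num_frames=3, num_pages=4, num_scans=3))   # 6
```

With 3 frames and a 4-page file scanned 3 times, LRU misses on every one of the 12 requests, while MRU keeps part of the file resident and needs only 6 reads.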
Advanced Paging Algorithms
• Greedy-dual
• Locality Set
• Clock
Summary
• The buffer manager brings pages into RAM.
  • Very important for performance
• A page stays in RAM until released by the requestor.
  • A dirty page is written to disk when its frame is chosen for replacement (which is some time after the requestor releases the page).
• The choice of frame to replace is based on the replacement policy.
Conclusions
• Memory hierarchy
• Disks (>1000x slower), thus:
  • pack info in blocks
  • try to fetch nearby blocks (sequentially)
• Buffer management: very important
  • LRU, MRU, etc.
• Record organization: slotted page
Operator Algorithms
• Selection
• Join

What is the best way to implement the sele

A: It depends on
• availability of the indexes
Selection Options
• No Index, Unsorted Data => File Scan (Linear Search)
• No Index, Sorted Data => File Scan (Binary Search)
• B+ Tree Index/Hash Index => Use the index to find qualifying data entries, then retrieve the corresponding data records
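The three selection options can be compared on an equality predicate. This is a rough in-memory sketch: records are assumed to be tuples whose first field is the search key, a Python dict stands in for a hash index, and all function names are hypothetical.

```python
from bisect import bisect_left

def select_scan(records, value):
    """No index, unsorted data: check every record (linear search)."""
    return [r for r in records if r[0] == value]

def select_binary(sorted_records, value):
    """No index, data sorted on the key: binary search, then read forward."""
    i = bisect_left(sorted_records, (value,))   # first record with key >= value
    matches = []
    while i < len(sorted_records) and sorted_records[i][0] == value:
        matches.append(sorted_records[i])
        i += 1
    return matches

def build_hash_index(records):
    """Data entries: key -> list of record ids (positions in the file)."""
    index = {}
    for rid, record in enumerate(records):
        index.setdefault(record[0], []).append(rid)
    return index

def select_index(records, index, value):
    """Use the index to find record ids, then fetch the records themselves."""
    return [records[rid] for rid in index.get(value, [])]
```

All three return the same records for the same predicate; what differs is how much of the file has to be touched to find them.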

Join Algorithms
• Nested Loop Join
• Grace Hash Join
• Sort Merge Join

Nested Loop Join

pR is the number of records in each page.
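A tuple-at-a-time nested loop join can be sketched as follows, with a counter for page reads. The page layout (a table is a list of pages, a page is a list of tuples) and the function name are assumptions for illustration, and buffering is ignored so that every scan of the inner table counts as fresh reads.

```python
def nested_loop_join(r_pages, s_pages, r_key, s_key):
    """Join R and S on r[r_key] == s[s_key], rescanning S for every R tuple."""
    result = []
    page_reads = 0
    for r_page in r_pages:
        page_reads += 1                  # read one page of the outer table R
        for r in r_page:
            for s_page in s_pages:       # full scan of S per R tuple
                page_reads += 1
                for s in s_page:
                    if r[r_key] == s[s_key]:
                        result.append(r + s)
    return result, page_reads
```

If R has M pages with pR records each and S has N pages, the counter comes out to M + pR * M * N, the usual textbook cost for this variant.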

Block Nested Loop Join
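One common way to describe the block variant is: load a block of several R pages, then scan S once per block instead of once per tuple. The sketch below follows that description under the same assumed page layout (lists of pages of tuples). The `block_size` parameter and the in-memory lookup table on the block are illustrative choices, not taken from the slides.

```python
def block_nested_loop_join(r_pages, s_pages, r_key, s_key, block_size):
    """Join R and S, scanning S once per block of block_size R pages."""
    result = []
    page_reads = 0
    for start in range(0, len(r_pages), block_size):
        block = r_pages[start:start + block_size]
        page_reads += len(block)         # read the next block of R pages
        lookup = {}                      # join key -> R tuples in this block
        for page in block:
            for r in page:
                lookup.setdefault(r[r_key], []).append(r)
        for s_page in s_pages:           # one scan of S per block
            page_reads += 1
            for s in s_page:
                for r in lookup.get(s[s_key], []):
                    result.append(r + s)
    return result, page_reads
```

Here the counter is M + ceil(M / block_size) * N for M pages of R and N pages of S, so a larger block means fewer scans of S.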

Grace Hash Join
• Two-phase hash join:
  • Partition phase: hash both tables on the join attribute into partitions.
  • Probing phase: compare tuples in corresponding partitions of each table.
• Named after the GRACE database machine.
Grace Hash Join
• Hash R into buckets (0, 1, ..., ‘max’)
• Hash S into buckets (same hash function)
• Join each pair of matching buckets
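The two phases can be sketched in memory as follows. A real Grace hash join writes the partitions to disk and reads them back one pair at a time; here plain Python lists stand in for the partitions, and the function name, the `num_partitions` default, and the use of Python's built-in `hash` are assumptions made for the example.

```python
def grace_hash_join(r, s, r_key, s_key, num_partitions=4):
    """Join tuples of R and S on r[r_key] == s[s_key] in two phases."""
    # Partition phase: the same hash function sends matching keys from
    # R and S to partitions with the same number.
    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for t in r:
        r_parts[hash(t[r_key]) % num_partitions].append(t)
    for t in s:
        s_parts[hash(t[s_key]) % num_partitions].append(t)

    # Probing phase: only corresponding partitions are compared.
    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}                           # build a hash table on the R partition
        for t in r_part:
            table.setdefault(t[r_key], []).append(t)
        for t in s_part:                     # probe it with each S tuple
            for match in table.get(t[s_key], []):
                result.append(match + t)
    return result
```

Because tuples with equal join keys always land in the same partition number, no matches are lost by skipping non-corresponding partitions.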
Grace Hash Join: Partition
[Figure: partition phase. Labels: R, S, Buffers, 1 Buffer, Partition 1]