Tree Indexes

Alvin Cheung
Fall 2022
Reading: R & G Chapter 10
Simple Idea?
Input heap file: [3, 4, 5] [1, 2, 7] [8, 6, 9] [10, _, _]
• Step 1: Sort heap file & leave some space
• Pages physically stored in logical order (sequential access)
• Maintenance as records are added or deleted is a pain: in the worst case it takes B page updates (shifting everything down or up)

Sorted file with free slots: [1, 2, _] [3, 4, _] [5, 6, _] [7, 8, _] [9, 10, _]

• Step 2: Use binary search on this sorted heap file: log2(B) pages read
• Fan-out of 2 → deep tree → lots of I/Os
• Examines entire records just to read the key during search: we would prefer log2(K), where K << B is the number of pages needed to store only the keys
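To make the log2(B) cost concrete, here is a minimal sketch of binary search over a sorted file that counts page reads. The names `find_page` and `pages` are illustrative only, and the pages are modeled as non-empty sorted Python lists without the free slots shown on the slide.

```python
def find_page(pages, key):
    """Binary-search sorted, non-empty pages.

    Returns (index of the page holding key, or None if absent, pages read).
    """
    lo, hi, reads = 0, len(pages) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]          # models one page I/O
        reads += 1
        if key < page[0]:
            hi = mid - 1           # key can only be on an earlier page
        elif key > page[-1]:
            lo = mid + 1           # key can only be on a later page
        else:
            return (mid if key in page else None), reads
    return None, reads


pages = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(find_page(pages, 9))   # page index and number of pages read
```

Each loop iteration halves the remaining range, which is where the roughly log2(B) page reads come from.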
Let’s fix these assumptions
• Idea: Keep separate (compact) key lookup pages, laid out sequentially
• Maintain key → recordID mapping [We’ll revisit this later]
• No need to sort heap file anymore! Just sort key lookup pages
• Use binary search on lookup pages as opposed to on all of the data pages
• Still have a deep tree due to fan-out of 2 → lots of I/Os
• Also, maintenance of the key lookup pages is a pain! Worst case K updates

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Pointers: (Page Id, Slot Id)
Page 1 Page 2 Page 3 Page 4

(20, Tim) (7, Dan) (5, Kay) (3, Jim) (27, Joe) (34, Kit) (1, Kim) (42, Hal)
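A minimal sketch of the key → recordID idea, using the eight records from the figure. The assignment of two records per page, the 0-based slot numbers, and the names `heap_file`, `lookup`, and `fetch` are assumptions made for illustration; the slide does not specify them.

```python
from bisect import bisect_left

# Unsorted heap file: page id -> list of (key, name) records (layout assumed).
heap_file = {
    1: [(20, "Tim"), (7, "Dan")],
    2: [(5, "Kay"), (3, "Jim")],
    3: [(27, "Joe"), (34, "Kit")],
    4: [(1, "Kim"), (42, "Hal")],
}

# Compact, sorted lookup entries: (key, (page id, slot id)).
lookup = sorted(
    (key, (page_id, slot))
    for page_id, page in heap_file.items()
    for slot, (key, _name) in enumerate(page)
)
lookup_keys = [key for key, _rid in lookup]


def fetch(key):
    """Binary-search the lookup entries, then follow the record id."""
    i = bisect_left(lookup_keys, key)
    if i == len(lookup_keys) or lookup_keys[i] != key:
        return None
    page_id, slot = lookup[i][1]
    return heap_file[page_id][slot]


print(fetch(27))
```

Only the small lookup entries are kept sorted; the heap file itself stays in arrival order.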
Let’s fix these assumptions, take 2
• Idea: repeat the process!
• Lookup pages for the lookup pages
• And then lookup pages for the lookup pages for the lookup pages, ….
• Let’s set fanout to be >> 2
• That is essentially the idea behind B+ Trees …
• We’ll find out why the pointers are helpful later

Page 1: 17
Page 4: 5 13    Page 6: 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Pointers: (Page Id, Slot Id)
Page 1 Page 2 Page 3 Page 4

(20, Tim) (7, Dan) (5, Kay) (3, Jim) (27, Joe) (34, Kit) (1, Kim) (42, Hal)
Enter the B+ Tree, More Formally

• Dynamic Tree Index


• Always Balanced
• High fanout
• Support efficient insertion & deletion
• Grows at root not leaves!
• “+”? B-tree that stores data entries in leaves only
• Helps with range search
B+ Trees: How to Read an Interior Node

• Node[…, (KL, PL), (KR, PR), …] means that all tuples in range KL <= K < KR are in tree PL

Values & pointers: 0 5 13
Callout: values v with 5 <= v < 13
2* 3* 5* 7* 8* 14* 16*
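The rule KL <= K < KR can be sketched with Python's `bisect_right`. This assumes the common layout in which a node with n keys has n + 1 child pointers and pointer 0 covers everything below the first key; `child_index` is a hypothetical helper name.

```python
from bisect import bisect_right


def child_index(keys, k):
    """Index of the child pointer to follow for search key k.

    With keys [5, 13]: pointer 0 covers k < 5, pointer 1 covers
    5 <= k < 13, and pointer 2 covers k >= 13.
    """
    return bisect_right(keys, k)


for k in (4, 5, 12, 13):
    print(k, "->", child_index([5, 13], k))
```

`bisect_right` is used rather than `bisect_left` so that a search key equal to a separator goes to the right of it, matching the closed lower bound in KL <= K.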
Example of a B+ Tree
Root Node (Page 1): 17
Page 4: 5 13    Page 6: 24 30
Callout: values v with 24 <= v < 30

Leaf Node: Key → Pointer to record

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Page 2 Page 3 Page 5 Page 7 Page 8 Page 9

• Property 1: Nodes in a B+ tree must obey an occupancy invariant


• Guarantees that lookup costs are bounded
• Invariant: each interior node is full beyond a certain minimum: in this case [and typically], at least half full
• This minimum, d, is called the order of the tree
• Here, max # of entries = 4. Thus d = 2.
• Guarantee: d <= # entries <= 2d. In this tree, 2 <= # entries <= 4
• Root doesn’t need to obey this invariant
• Same invariant holds for leaf nodes: at least half full (d may differ, here it is the same)
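The occupancy invariant can be written as a one-line check. This is a sketch only: `obeys_occupancy` is a hypothetical name, and treating the root as exempt from the lower bound but still capped at 2d entries is an assumption drawn from the bullets above.

```python
def obeys_occupancy(num_entries, d, is_root=False):
    """True if a node with num_entries entries satisfies d <= entries <= 2d.

    The root is exempt from the lower bound but still cannot exceed 2d.
    """
    if num_entries > 2 * d:
        return False
    return is_root or num_entries >= d


# Order d = 2, as in the example tree: between 2 and 4 entries per node.
print(obeys_occupancy(2, d=2), obeys_occupancy(1, d=2), obeys_occupancy(1, d=2, is_root=True))
```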
Example of a B+ Tree
Root Node (Page 1): 17
Page 4: 5 13    Page 6: 24 30
Callout: values v with 24 <= v < 30

Leaf Node: Key → Pointer to record

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Page 2 Page 3 Page 5 Page 7 Page 8 Page 9

• Property 1: Nodes in a B+ tree must obey an occupancy invariant


• Each interior/leaf node is full beyond a certain minimum d
• Property 2: Leaves need not be stored in sorted order (but often are)
• Next and prev. pointers help examining them in sequence [useful as we will see soon]
B+ Trees and Scale

Root Node
Key → Pointer to record

• How many records can this height 1 B+ tree index?


• Max entries = 4; Fan-out (# of pointers) = 5
• Height 1: 5 (pointers from root) x 4 (slots in leaves) = 20 Records
B+ Trees and Scale Part 2
Root Node

Level 2

Level 3

• How many records can this height 3 B+ tree index?


• Fan-out = 5; Max entries = 4
• Height 3: 5 (root) x 5 (level 2) x 5 (level 3) x 4 (leaves) = 5^3 x 4 = 500 Records
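The two counts above follow one formula: fan-out raised to the height, times the slots per leaf. A small sketch, where `max_records` is an illustrative name and height counts the levels of interior nodes as on the slides:

```python
def max_records(fan_out, height, leaf_slots):
    """Maximum records indexed: fan_out**height leaves, leaf_slots entries each."""
    return fan_out ** height * leaf_slots


print(max_records(5, 1, 4))   # height 1
print(max_records(5, 3, 4))   # height 3
```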
Extending this: B+ Trees in Practice
• (Warning: Sloppy back-of-the-envelope calculation!)
• Say 128KB pages, with around 40B per (val, ptr) pair
• Max entries = roughly 128KB/40B = approx. 3000
• Max fanout = 3000+1 = approx. 3000
• Say 2/3 are filled on average
• Average fan-out/entries = approx. 2000

• At these capacities
• Height 1: 2000 (pointers from root) x 2000 (entries per leaf) = 2000^2 = 4,000,000
• Height 2: 2000 (pointers from root) x 2000 (pointers from level 2) x 2000 (entries per leaf) = 2000^3 = 8,000,000,000 records!!
• Core takeaway: Even depths of 3 allow us to index a massive # of records!
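The same back-of-the-envelope arithmetic, spelled out so the rounding is visible. The constants come from the slide; the variable and function names are illustrative, and 128KB is taken to mean 128 x 1024 bytes.

```python
PAGE_BYTES = 128 * 1024      # 128KB page
ENTRY_BYTES = 40             # one (val, ptr) pair

max_entries = PAGE_BYTES // ENTRY_BYTES     # exact value before rounding to ~3000
avg_entries = max_entries * 2 // 3          # 2/3 full, before rounding to ~2000
print(max_entries, avg_entries)

ROUNDED = 2000               # the rounded figure used on the slide


def records_indexed(height):
    """Records reachable with `height` interior levels above the leaves."""
    return ROUNDED ** height * ROUNDED


print(f"{records_indexed(1):,}")
print(f"{records_indexed(2):,}")
```

The exact quotients land a little above the rounded figures, so the slide's estimates are slightly conservative.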
Searching the B+ Tree
Root Node 17
Page 1

Page 4 5 13 24 30 Page 6

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Page 2 Page 3 Page 5 Page 7 Page 8 Page 9

• Procedure:
• Find split on each node (Binary Search)
• Follow pointer to next node
Searching the B+ Tree: Find 27
Root Node 17
Page 1

Page 4 5 13 24 30 Page 6

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Page 2 Page 3 Page 5 Page 7 Page 8 Page 9

• Find key = 27
• Find split on each node (Binary Search)
• Follow pointer to next node
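The search procedure can be sketched as a loop that picks a child with binary search until it reaches a leaf. The `Leaf` and `Interior` classes are hypothetical, and the grouping of leaf entries into pages is inferred from the separator keys in the figure rather than stated on the slide.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None


@dataclass(eq=False)
class Interior:
    keys: List[int]
    children: list


def search(node, key):
    """Descend from node to the leaf that could hold key; return (leaf, found)."""
    while isinstance(node, Interior):
        # Find the split point in this node, then follow that pointer.
        node = node.children[bisect_right(node.keys, key)]
    return node, key in node.keys


leaves = [Leaf([2, 3]), Leaf([5, 7, 8]), Leaf([14, 16]),
          Leaf([19, 20, 22]), Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])]
root = Interior([17], [Interior([5, 13], leaves[:3]),
                       Interior([24, 30], leaves[3:])])

leaf, found = search(root, 27)
print(leaf.keys, found)
```

One node is visited per level, so the number of page reads equals the height of the tree plus one for the leaf.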
Searching the B+ Tree: Fetch Data
Root Node 17
Page 1

Page 4 5 13 24 30 Page 6

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Page 2 Page 3 Page 5 Page 7 Page 8 Page 9

Pointers: (Page Id, Slot Id)

Page 1 Page 2 Page 3 Page 4

(20, Tim) (7, Dan) (5, Kay) (3, Jim) (27, Joe) (34, Kit) (1, Kim) (42, Hal)
Searching the B+ Tree: Find 27 and up
Root Node 17
Page 1

Page 4 5 13 24 30 Page 6

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Page 2 Page 3 Page 5 Page 7 Page 8 Page 9

• Find keys >=27


• Find 27 first, then traverse leaves following “next” pointers in leaf
• This is an example of a range scan: find all values in [a, b]
• Benefit: no need to go back up the tree! Saves I/Os
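A range scan can be sketched as a walk along the leaf chain, starting at the leaf where the lower bound was found. The `Leaf` class, `link`, and `range_scan` are illustrative names, and the descent from the root is omitted to keep the sketch short.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None     # right sibling in key order


def link(leaves):
    """Chain the leaves left to right via their next pointers."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(start_leaf, low, high=None):
    """Collect keys k with low <= k (and k <= high if given), following next pointers."""
    out = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if high is not None and k > high:
                return out           # past the upper bound: stop early
            if k >= low:
                out.append(k)
        leaf = leaf.next             # sibling hop, no trip back up the tree
    return out


leaves = link([Leaf([19, 20, 22]), Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])])
print(range_scan(leaves[1], 27))
```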
Inserting 26* into a B+ Tree Part 1
26*

Root Node 13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 24* 25* 29* 33* 34* 38* 39*

• Find the correct leaf


Inserting 26* into a B+ Tree Part 2

Root Node 13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 24* 25* 29* 26* 33* 34* 38* 39*

• Find the correct leaf


• If there is room in the leaf just add the entry
Inserting 26* into a B+ Tree Part 3

Root Node 13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 24* 25* 26* 29* 33* 34* 38* 39*

• Find the correct leaf


• If there is room in the leaf just add the entry
• Sort the leaf page by key
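For the easy case where the leaf has room, a sketch: the slides append the entry and then sort the page, whereas this version uses `insort` to place the key in sorted position in one step, which gives the same final page. `insert_into_leaf` is a hypothetical helper.

```python
from bisect import insort


def insert_into_leaf(leaf_keys, key, max_entries):
    """Insert key into a sorted leaf if there is room.

    Returns True on success, False if the leaf is full and must be split.
    """
    if len(leaf_keys) >= max_entries:
        return False
    insort(leaf_keys, key)
    return True


leaf = [24, 25, 29]
insert_into_leaf(leaf, 26, max_entries=4)
print(leaf)
```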
Inserting 8* into a B+ Tree: Find Leaf
8*

Root Node 13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Find the correct leaf


Inserting 8* into a B+ Tree: Insert

Root Node 13 17 24 30

8*

2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Find the correct leaf


• Split leaf if there is not enough room
Inserting 8* into a B+ Tree: Split Leaf

Root Node 13 17 24 30

8*

2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Find the correct leaf


• Split leaf if there is not enough room
• Redistribute entries evenly
Inserting 8* into a B+ Tree: Split Leaf, cont

Root Node 13 17 24 30

2* 3* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

5* 7* 8*

• Find the correct leaf


• Split leaf if there is not enough room
• Redistribute entries evenly
• Fix next/prev pointers
Inserting 8* into a B+ Tree: Fix Pointers

Root Node 13 17 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Find the correct leaf


• Split leaf if there is not enough room
• Redistribute entries evenly
• Fix next/prev pointers
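The leaf split steps can be sketched as follows. The `Leaf` class and `split_leaf` are illustrative; keeping d entries on the left and d + 1 on the right matches the 2* 3* | 5* 7* 8* outcome in the figure, but other even splits are possible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None
    prev: Optional["Leaf"] = None


def split_leaf(leaf, new_key, d):
    """Split a full leaf (2d entries) while inserting new_key.

    Returns (new right leaf, key to copy up into the parent).
    """
    keys = sorted(leaf.keys + [new_key])
    if len(keys) != 2 * d + 1:
        raise ValueError("split_leaf expects a full leaf with 2d entries")
    # Redistribute: d entries stay, d + 1 move to the new right sibling.
    right = Leaf(keys[d:], next=leaf.next, prev=leaf)
    leaf.keys = keys[:d]
    # Fix next/prev pointers so the leaf chain includes the new leaf.
    if leaf.next is not None:
        leaf.next.prev = right
    leaf.next = right
    return right, right.keys[0]


a, b = Leaf([2, 3, 5, 7]), Leaf([14, 16])
a.next, b.prev = b, a
right, copy_up = split_leaf(a, 8, d=2)
print(a.keys, right.keys, copy_up)
```

The returned key is a copy: it stays in the right leaf as well as going to the parent.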
Inserting 8* into a B+ Tree: Mid-Flight

Root Node 13 17 24 30
[Callout on the new leaf 5* 7* 8*: “I am an orphan!”]

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Something is still wrong!


Inserting 8* into a B+ Tree: Copy Middle Key

5 Root Node 13 17 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Copy up from leaf the middle key and pointer to the orphan leaf
• This is what we need to access it
Inserting 8* into a B+ Tree: Split Parent, Part 1

5 13 17 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Copy up from leaf the middle key and pointer to the orphan leaf
• No room in parent? (Parent now has 2d+1 entries instead of at most 2d)
• Recursively split index nodes
• Redistribute the rightmost d+1 keys
Inserting 8* into a B+ Tree: Split Parent, Part 2

5 13 17 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Copy up from leaf the middle key and pointer to the orphan leaf
• No room in parent? Recursively split index nodes
• Redistribute the rightmost d+1 keys
• Not enough: we now have two roots!
Inserting 8* into a B+ Tree: Root Grows Up
Root Node

5 13 17 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• No room in parent? Recursively split index nodes


• Redistribute the rightmost d+1 keys
• To fix, create a new root:
• Push up from interior node the middle key (and assoc. pointer)
Inserting 8* into a B+ Tree: Root Grows Up, Pt 2
17 Root Node

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Net effect
• d keys on the left and right => invariant satisfied!
• middle key pushed up
• Consolidate 5* into left node
Inserting 8* into a B+ Tree: Root Grows Up, Pt 3
17 Root Node

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

• Net effect
• d keys on the left and right
• middle key pushed up
• Here, we ended up creating a new root and increasing the depth, which is rare
Copy up vs Push up!
17 Root Node

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*

The leaf entry (5) was copied up


We can’t lose the original key: all keys must be in leaves
The index entry (17) was pushed up
We don’t need it any more for routing => convince yourself!
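To contrast with the leaf case, here is a sketch of an index node split in which the middle key is pushed up, so it leaves the node instead of being copied. `split_interior` is a hypothetical helper, and the child pointers are represented by placeholder strings.

```python
def split_interior(keys, children, d):
    """Split an overfull index node (2d + 1 keys, 2d + 2 children).

    Returns ((left keys, left children), pushed-up key, (right keys, right children)).
    The pushed-up key appears in neither half.
    """
    if len(keys) != 2 * d + 1 or len(children) != 2 * d + 2:
        raise ValueError("expected 2d + 1 keys and 2d + 2 children")
    pushed = keys[d]
    left = (keys[:d], children[:d + 1])
    right = (keys[d + 1:], children[d + 1:])
    return left, pushed, right


left, pushed, right = split_interior(
    [5, 13, 17, 24, 30], ["A", "B", "C", "D", "E", "F"], d=2)
print(left, pushed, right)
```

Key 17 is not needed in either half for routing because every key under the left half is below 17 and every key under the right half is at least 17; the new root records that boundary.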
B+ Tree Insert: Algorithm Sketch
1. Find the correct leaf L.

2. Put data entry onto L.


• If L has enough space, done!
• Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key (and ptr to L2)
• Insert index entry pointing to L2 into parent of L.
B+ Tree Insert: Algorithm Sketch Part 2
• Step 2 can happen recursively
• To split index node, redistribute entries evenly, but push up middle
key (and ptr to new index node). (Contrast with leaf splits)

• Splits “grow” tree


• Tree growth: gets wider if possible from bottom up
• Worst case, adds another level with a new root
• Ensures balance & therefore the logarithmic guarantee
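Putting the two sketch slides together, a compact recursive version of the insert might look like the following. It is a sketch under simplifying assumptions: a single hypothetical `Node` class serves for both leaves and index nodes, leaves hold bare keys without record pointers, and next/prev leaf pointers are omitted.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # None marks a leaf


def insert(root, key, d):
    """Insert key into the tree of order d; return the (possibly new) root."""
    split = _insert(root, key, d)
    if split is None:
        return root
    up_key, right = split
    return Node([up_key], [root, right])        # tree grows at the root


def _insert(node, key, d):
    """Insert below node; return (key for parent, new right sibling) on a split."""
    if node.children is None:                    # step 1 done: this is leaf L
        insort(node.keys, key)                   # step 2: put entry onto L
        if len(node.keys) <= 2 * d:
            return None
        right = Node(node.keys[d:])              # split L into L and L2
        node.keys = node.keys[:d]
        return right.keys[0], right              # copy up the middle key
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, d)
    if split is None:
        return None
    up_key, right = split
    node.keys.insert(i, up_key)                  # index entry pointing to L2
    node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * d:
        return None
    mid = node.keys[d]                           # push up the middle key
    new_right = Node(node.keys[d + 1:], node.children[d + 1:])
    node.keys = node.keys[:d]
    node.children = node.children[:d + 1]
    return mid, new_right


root = Node([13, 17, 24, 30],
            [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20]),
             Node([24, 27, 29]), Node([33, 34, 38, 39])])
root = insert(root, 8, d=2)
print(root.keys, [c.keys for c in root.children])
```

Run on the tree from the 8* walkthrough, this reproduces the final picture: a new root holding 17 above index nodes 5 13 and 24 30.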
[Figure labels: Before, After]
17 Root Node

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
We will skip deletion
• In practice, occupancy invariant often not enforced during deletion
• Just delete leaf entries and leave space
• If new inserts come, great
• This is common

• If page becomes completely empty, can delete


• Parent may become underfull
• That’s OK too

• No need to delete inner pages even if empty


• Only used for lookups

• Guarantees still attractive: log_F(total number of inserts)

• Textbook describes algorithm for rebalancing and merging on deletes


BULK LOADING B+-TREES
Bulk Loading of B+ Tree
• Suppose we want to build an index on a large table from scratch
• Would it be efficient to just call insert repeatedly?
• No … Q: Why not?
• Constantly need to search from root
• Modifying random pages: poor cache efficiency
• Leaves poorly utilized (typically half-empty)
Smarter Bulk Loading a B+ Tree

4 7 10 13    (next key for the parent: 16)

1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15*

New leaf: 16* 17* 18*

• Sort the input records by key:


• 1*, 2*, 3*, 4*, …
• We’ll learn a good disk-based sort algorithm soon!
• Fill leaf pages to some fill factor > d (e.g. ¾)
• Update parent until full
Smarter Bulk Loading a B+ Tree Part 2

10

4 7 13 16

1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15* 16* 17* 18*

• Sort the input records by key:


• 1*, 2*, 3*, 4*, …
• Fill leaf pages to some fill factor > d (e.g. ¾)
• Update parent until full
• Then create new sibling and copy over half: same as in index node splits for insertion
Smarter Bulk Loading a B+ Tree Part 3
Never Touched Again
10

4 7 13 16

1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15* 16* 17* 18*

• Lower left part of the tree is never touched again


• Occupancy invariant maintained:
• leaves filled beyond d, rest of the nodes via insertion split procedure
Smarter Bulk Loading a B+ Tree Part 4

10

4 7 13 16 19 22

1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15* 16* 17* 18* 19* 20* 21* 22* 23* 24*

• Benefits: Better
• Cache utilization than insertion into random locations
• Utilization of leaf nodes (and therefore shallower tree)
• Layout of leaf pages (more sequential)
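The bulk loading procedure can be sketched by tracking only the rightmost path of the tree, since nodes to its left are never touched again. The `Node` class, `bulk_load`, and the `leaf_fill` parameter (entries per leaf) are hypothetical names, and leaf next pointers are omitted.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children    # None marks a leaf


def bulk_load(keys, d, leaf_fill):
    """Build a tree of order d from keys, packing leaf_fill entries per leaf."""
    keys = sorted(keys)
    leaves = [Node(keys[i:i + leaf_fill]) for i in range(0, len(keys), leaf_fill)]
    if not leaves:
        return Node([])             # empty tree: a single empty leaf
    if len(leaves) == 1:
        return leaves[0]
    path = [Node([], [leaves[0]])]  # rightmost path, root first
    for leaf in leaves[1:]:
        _append(path, len(path) - 1, leaf.keys[0], leaf, d)
    return path[0]


def _append(path, level, key, child, d):
    """Add (key, child) at the right end of path[level], splitting if overfull."""
    node = path[level]
    node.keys.append(key)
    node.children.append(child)
    if len(node.keys) <= 2 * d:
        return
    # Same as an index node split during insertion: push up the middle key.
    mid = node.keys[d]
    right = Node(node.keys[d + 1:], node.children[d + 1:])
    node.keys = node.keys[:d]
    node.children = node.children[:d + 1]
    path[level] = right             # the left half is never touched again
    if level == 0:
        path.insert(0, Node([mid], [node, right]))
    else:
        _append(path, level - 1, mid, right, d)


root = bulk_load(range(1, 19), d=2, leaf_fill=3)
print(root.keys, [c.keys for c in root.children])
```

With d = 2 and three entries per leaf, loading keys 1 to 18 gives root 10 over 4 7 and 13 16, and loading 1 to 24 extends the right index node to 13 16 19 22, matching Parts 2 and 4 of the walkthrough.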
Summary
• B+ Tree is a powerful dynamic indexing structure
• Inserts/deletes leave tree height-balanced; log_F(N) cost
• High fanout (F) means height rarely more than 3 or 4.
• Higher levels stay in cache, avoiding expensive disk I/O
• Almost always better than maintaining a sorted file.
• Widely used in DBMSs!

• Bulk loading can be much faster than repeated inserts for creating a
B+ tree on a large data set.
