DSA2 Chapter 4 Trees

Chapter 4 of the document discusses trees as a data structure, particularly focusing on binary search trees (BSTs) and their properties, including average and worst-case time complexities for various operations. It covers the formal definition of trees, their terminology, implementation strategies, and applications in file systems and expression evaluation. The chapter also explores different types of binary trees, traversal strategies, and operations such as searching, inserting, and deleting nodes in a BST.

Data Structures and Algorithms 2

Prof. Ahmed Guessoum


The National Higher School of AI

Chapter 4
Trees
From Linear ADTs to …
• For input of large size, the linear access time of linked lists is prohibitive.
• In this chapter, we look at a simple data structure for which the average running time of most operations is O(log n), and a simple modification that achieves O(log n) in the worst case:
⇒ Binary Search Trees
• Trees are very useful abstractions in computer science.
• We will discuss their use in other, more general applications.
Aims of this chapter
• We will also see how trees are used to:
– implement the file system of several popular OSs.
– evaluate arithmetic expressions.
– support search operations in O(log n) average time.
– refine these ideas to obtain O(log n) worst-case bounds.
– implement these operations when the data are stored on a disk.
Formal Definition
A tree is a collection of nodes.
There is a starting node known as the root node.
Every node other than the root has a parent node.
The nodes may have any number of children, themselves being roots of trees.
A node that has no children is a leaf.

Illustration of a tree + terminology

A: root; B, H, Q: leaves; I, J: children of E; F: parent of K, L, M; K, L, M: siblings; E: grandparent of P; Q: grandchild of E
More terminology
• A path from node n1 to nk is defined as a sequence of
nodes n1, n2, . . . , nk such that ni is the parent of ni+1 for
1 ≤ i < k.
• The length of this path is the number of edges on the
path, namely, k − 1.
• There is a path of length zero from every node to itself.
• In a tree there is exactly one path from the root to any
node.
• For any node ni, the depth of ni is the length of the
unique path from the root to ni.
⇒ the root is at depth 0.
More terminology
• The height of ni is the length of the longest path
from ni to a leaf.
⇒ all leaves are at height 0.
⇒ height(tree) = height(root)
• If there is a path from n1 to n2, then n1 is an
ancestor of n2 and n2 is a descendant of n1.
• If n1 ≠ n2, then n1 is a proper ancestor of n2 and
n2 is a proper descendant of n1.
Implementation of Trees
• One intuitive approach: have in each node, besides its
data, a link to each child of the node
• Can be very costly: number of children per node can
vary greatly and is not known in advance
• Simple solution: keep the children of each node in a linked list of tree nodes.

struct TreeNode
{
    Object element;
    TreeNode *firstChild;
    TreeNode *nextSibling;
};
First child/next sibling representation of a tree

• Arrows that point downward are firstChild links.


• Arrows that go left to right are nextSibling links.
• Null links are not drawn (too many).
• Ex.: node E has both a link to a sibling (F) and a link to a child (I); some nodes have neither.
Tree Traversal with Applications
Application: File system on Linux (or DOS)

/usr is the root directory


Filename “/usr/mark/book/ch1.r” is obtained by following the leftmost child three times.
Each “/” after the first indicates an edge; the result is the full pathname.
Pseudocode to list a directory in a
hierarchical file system
void FileSystem::listAll( int depth = 0 ) const
{ // Preorder traversal to print directory filenames
1 printName( depth ); // Print the name of the object
2 if( isDirectory( ) )
3 for each file c in this directory
4 c.listAll( depth + 1 );
}
• Prints all the names of files in the directory.
• Files at depth di will have their names indented by di tabs (function starts with depth = 0 ⇒ no indent for root).
• Any children are one level deeper, and thus need to be indented an extra tab with respect to their parent.
Traversal strategies
• Preorder traversal: process the root BEFORE processing its children from left to right (recursively).

void FileSystem::listAll( int depth = 0 ) const
{ // Preorder traversal to print directory filenames
1     printName( depth ); // Print object name
2     if( isDirectory( ) )
3         for each file c in this directory
4             c.listAll( depth + 1 );
}

• Analysis:
– Line 1 is executed exactly once per node; same for Line 2.
– Line 4 is executed at most once for each child of each node.
– The loop (Line 3) is iterated at most N times.
⇒ O(N), where N is the number of file names in the directory.
Traversal strategies
• Postorder traversal: process Root AFTER
processing children from left to right (recursively)
• Example:
– Having a file directory with information about the
number of disk blocks used in memory by each file
– we would like to calculate the total number of blocks
used by all the files in the tree
– Compute the total number of blocks by adding up the
numbers for each file/directory

int FileSystem::size( ) const
{ // Postorder traversal: total size of directory files
    int totalSize = sizeOfThisFile( );
    if( isDirectory( ) )
        for each file c in this directory
            totalSize += c.size( );
    return totalSize;
}
Binary Trees
• A Binary Tree is a tree in which each node can have at
most two children.
– The depth of an average binary tree is generally
considerably smaller than N.
– An analysis shows that
• the average depth of a binary tree is O(√N)
(Exercise)
• for a special type of binary tree, namely the binary
search tree, average value of the depth is O(log N).
– Unfortunately, the depth can be as large as N − 1 (worst case: each node has exactly one child, except the leaf).
Different Types of BTs
• A full binary tree (sometimes
proper binary tree or 2-tree) is
a tree in which every node
other than the leaves has two
children
• A complete binary tree is a
binary tree in which every
level, except possibly the
last, is completely filled,
and all nodes are as far left
as possible.

• Binary Search Trees will be presented later in this chapter


Complexity Analysis of CBT
A complete binary tree of N nodes has depth O(log N).
Prove by induction that the number of nodes at depth d is 2^d.
The total number of nodes of a complete binary tree of depth d is 1 + 2 + 4 + … + 2^d = 2^(d+1) − 1.
Thus 2^(d+1) − 1 = N
⇒ d = log(N + 1) − 1
Implementation of a BT

struct BinaryTreeNode
{
    Object element;           // data in the node
    BinaryTreeNode *left;     // Left child
    BinaryTreeNode *right;    // Right child
};

• Various interesting uses of BTs
• Next example in the area of Compiler Design
Expression Trees
• The leaves of an expression tree are operands,
such as constants or variable names;
• The other nodes contain operators
• An ET is not necessarily binary:
– e.g. case of unary operators (- and +)
– Nodes may have more than 2 children: e.g. ternary
operators.

Inorder/Postorder/Preorder Traversal
• Inorder traversal strategy: (left, node, right)
– Produce an overly parenthesised expression by
o recursively processing a parenthesized left expression
o then printing out the operator at the root, and finally
o recursively processing a parenthesized right expression.
(a + (b * c)) + (((d * e) + f) * g)
• Postorder traversal strategy: (left subtree,
right subtree, operator)
abc*+de*f+g*+ (postfix notation of Chapter 3)
• Preorder traversal strategy: (operator, left subtree,
right subtree)
++a*bc*+*defg (prefix notation)
Constructing an ET
Algorithm to convert a postfix expression into an
expression tree:
• Read the expression one symbol at a time.
• If the symbol is an operand, create a one-node tree and
push a pointer to it onto a stack.
• If the symbol is an operator, pop (pointers) to two
trees T1 and T2 from the stack (T1 is popped first) and
form a new tree whose root is the operator and whose
left and right children point to T2 and T1, respectively.
• A pointer to this new tree is then pushed onto the
stack.
Example: input a b + c d e + * *
First two symbols are operands, so we create a one-node tree for
each of them and push pointers to them onto a stack

Next, + is read, so two pointers to trees are popped, a new tree is


formed, and a pointer to it is pushed onto the stack.

Next, c, d, and e are read, and for each, a one-node tree is
created and a pointer to the corresponding tree is pushed onto
the stack.

Now + is read, so two trees are merged.

* is read, so we pop two tree pointers and form a new
tree with a * as root.

Finally, the last symbol * is read, the two trees are
merged, and a pointer to the final tree is left on the
stack.

Binary Trees
• Important application of binary trees is their use
in searching
• We will assume a tree of integers, though
arbitrarily complex (nodes) elements are
possible
• We will also assume that all the items are
distinct (duplicates dealt with later)

Section 1

Binary Search Tree
• Binary search tree (BST): a BT where every
node in the left subtree is less than the root,
and every node in the right subtree is larger
than the root.
• Properties of a BST are recursive
• Examples: Are the following BSTs?

(Figures: a BST, and a tree that is not a BST.)
Operations on BSTs
• Implemented recursively
• Average depth of a binary search tree is O(log N)
 no need to worry in general about running out of
stack space
• The data member is a pointer to the root node; this
pointer is nullptr for empty trees.

Code for Binary Search Tree Interface

Refresher: lvalue vs rvalue
std::vector<int> createVector() {
    std::vector<int> v{ 1, 2, 3, 4, 5 };
    return v;
}

int main() {
    std::vector<int> v1 = createVector();    // copy constructor
    std::vector<int>&& v2 = createVector();  // rvalue reference binds the temporary
    return 0;
}
Refresher: lvalue vs rvalue
• Function createVector() returns a vector of integers.
• In main(), we call createVector() twice: once to
initialise v1 and once to initialise v2.
• When v1 is initialised, the copy constructor is called (absent copy elision):
– it creates a new vector, and
– copies the contents of the vector returned by createVector() into it.
• When v2 is initialised, no copy is made:
– the rvalue reference binds directly to the temporary vector returned by createVector(), extending its lifetime.
– Since the original vector is not needed anymore, this is more efficient than copying it.
Searching an element in a BST
Start from the root.
Each time we encounter a node, see if the key
in the node equals the element. If yes stop.
If the element is less, go to the left subtree.
If it is more, go to the right subtree.
Conclude that the element is not in the list if
we reach a leaf node and the key in the node
does not equal the element.
Search(node, elt)
{
    If (node = NULL) conclude NOT FOUND;
    Else If (node.key = elt) conclude FOUND;
    Else If (elt < node.key) Search(node.leftchild, elt);
    Else If (elt > node.key) Search(node.rightchild, elt);
}

Complexity: O(d), where d is the depth of the element being searched for.
For complete binary search trees: O(log N), where N is the number of nodes.
        5
      /   \
     3     8
    / \      \
   1   4     10

Search for 10: sequence traveled 5, 8, 10. Found!
Search for 3.5: sequence traveled 5, 3, 4. Not found!
Find Min
• Returns a pointer to the node containing the
smallest element in the tree
• Start at the root and
– go left as long as there is a left child.
– The stopping point is the smallest element
// recursive version
BinaryNode * findMin( BinaryNode *t ) const
{
    if( t == nullptr )
        return nullptr;
    if( t->left == nullptr )
        return t;
    return findMin( t->left );
}

Complexity: O(d)
Find Min
// non-recursive version
BinaryNode * findMin( BinaryNode *t ) const
{
    if( t != nullptr )
        while( t->left != nullptr )
            t = t->left;
    return t;
}
        5
      /   \
     3     8
    / \      \
   1   4     10

Travel 5, 3, 1; return 1.
Insert an element

Try to find the element;


If the element exists, do nothing.
If it does not, insert it at the position of the
returned null pointer;

47
Insertion function
void insert( const Comparable & x, BinaryNode * & t )
{
    if( t == nullptr )
        t = new BinaryNode{ x, nullptr, nullptr };
    else if( x < t->element )
        insert( x, t->left );
    else if( t->element < x )
        insert( x, t->right );
    else
        ;  // Duplicate; do nothing
}

Complexity: O(d)
Insert 3.5: sequence traveled 5, 3, 4.
Insert 3.5 as left child of 4:

        5
      /   \
     3     8
    / \      \
   1   4     10
      /
    3.5
Insert an element by moving
void insert( Comparable && x, BinaryNode * & t )
{
    if( t == nullptr )
        t = new BinaryNode{ std::move( x ), nullptr, nullptr };
        // std::move is exactly equivalent to a static_cast to
        // an rvalue reference type
    else if( x < t->element )
        insert( std::move( x ), t->left );
    else if( t->element < x )
        insert( std::move( x ), t->right );
    else
        ;  // Duplicate; do nothing
}

Complexity: O(d)
DELETION
Deleting a node has to be done so that the property of the binary search tree is maintained.

If the node has no child, simply delete it.

If the node has only one child, simply replace it with its child.
void remove( const Comparable & x, BinaryNode * & t )
{
    if( t == nullptr )
        return;   // Item not found; do nothing
    if( x < t->element )
        remove( x, t->left );
    else if( t->element < x )
        remove( x, t->right );
    else if( t->left != nullptr && t->right != nullptr )  // Two children
    {
        t->element = findMin( t->right )->element;
        remove( t->element, t->right );
    }
    else
    {
        BinaryNode *oldNode = t;
        t = ( t->left != nullptr ) ? t->left : t->right;
        delete oldNode;
    }
}
If the node has two children:
Look at the right subtree of the node (subtree rooted at the
right child of the node).
Find the Minimum there.
Replace the key of the node to be deleted by the minimum
element.
Delete the minimum element.
Any problem deleting it?
Need to take care of the children of this min element (the min element can have at most one child).
For deletion convenience, always have a pointer from a node to its parent.
Delete 3: 3 has 2 children.
findMin in the right subtree of 3 returns 3.5, so 3 is replaced by 3.5 and 3.5 is deleted.

        5                        5
      /   \                    /   \
     3     8        ⇒       3.5     8
    / \      \              / \       \
   1   4     10            1   4      10
      /
    3.5
Before Delete 4 (with 1 child) vs. After Delete 4 (with 1 child)
Before Delete 2 (with 2 children) vs. After Delete 2 (with 2 children)

Convince yourselves with more sophisticated cases!
Pseudo Code
Delete(node) {
    If node is childless, then
    {
        node->parent->ptr_to_node = NULL;
        free node;
    }
    If node has one child
    {
        node->parent->child = node->child;
        free node;
    }
    If node has 2 children
    {
        minnode = findmin(rightsubtree);
        node->key = minnode->key;
        delete(minnode);
    }
}

Complexity? O(d)
Operations on BSTs: Code

Binary Search Tree Code

AVL Trees
• We have seen that all operations depend on the
depth of the tree.
• We don’t want trees with large-height nodes
• This can be attained if both subtrees of each node
have roughly the same height.
• An AVL (Adelson-Velskii and Landis) tree is a BST
with a balance condition.
• The balance condition must be easy to maintain,
and it ensures that the depth of the tree is O(logN).
• Simplest idea: require that the left and right subtrees have the same height.
AVL Trees
The idea that left and right subtrees have roughly the same height does not force the tree to be shallow.

An AVL (Adelson-Velskii and Landis) tree is a BST where the heights of the two subtrees of any node differ by at most one.
The height of a null tree is −1.
Not AVL Tree

AVL Tree

Some AVL Tree Properties
• Height information is kept for each node (in the
node structure).
• It can be shown that the height of an AVL tree is at most roughly 1.44 log(N + 2) − 1.328, but, in practice, it is only slightly more than log N.
• The minimum number of nodes, S(h), in an AVL tree of height h satisfies S(h) = S(h−1) + S(h−2) + 1, with S(0) = 1 and S(1) = 2.
⇒ all the tree operations can be performed in O(log N) time, except that insertion and deletion must also update the balancing information.
Operations in AVL Tree

Searching: complexity? O(log N)
FindMin: complexity? O(log N)
Deletion? Insertion?
Insertion into an AVL Tree

Insert 6
⇒ Tree not AVL anymore

• The AVL property has to be restored before the insertion step is considered over.
• This can be done with a simple modification to the tree, known as a rotation.
Insertion into AVL Tree
• After an insertion, only nodes that are on the path
from the insertion point to the root might have their
balance altered because only those nodes have their
subtrees altered.
• As we follow the path up to the root and update the
balancing information, we may find a node whose
new balance violates the AVL condition.
• So we will rebalance the tree at the first (i.e.,
deepest) such node
• This rebalancing guarantees that the entire tree satisfies the AVL property.
Node rebalancing

• Let α be the node that must be rebalanced. Since any node has at most two children, it is easy to see that a violation might occur in four cases:
1. Insertion into the left subtree of the left child of α
2. Insertion into the right subtree of the left child of α
3. Insertion into the left subtree of the right child of α
4. Insertion into the right subtree of the right child of α
Note: 1 and 4 are mirror cases; same for 2 and 3.
Rotations
• The first case, in which the insertion occurs on
the “outside” (i.e., left–left or right–right), is
fixed by a single rotation of the tree.
• The second case, in which the insertion occurs
on the “inside” (i.e., left–right or right–left) is
handled by the slightly more complex double
rotation.
• These are fundamental operations on the tree that we’ll see used several times in balanced-tree algorithms.
Single Rotation
(Figures: after insertion, after rebalancing.)

• k2 violates the AVL balance property because its left subtree is two levels deeper than its right subtree. (Generically, this is the only possible case: subtree X has grown an extra level, causing it to be exactly two levels deeper than Z.)
Result of Single Rotation
• Note that a single rotation requires only a few pointer changes.
• We get another BST that is an AVL tree: X moves up one level, Y stays at the same level, and Z moves down one level.
• k2 and k1 not only satisfy the AVL requirements, but they also have subtrees that are exactly the same height.
• The new height of the entire subtree is exactly the same as the height of the original subtree.
⇒ no further updating of heights on the path to the root is needed, and consequently no further rotations are needed.
Example of Single Rotation

After insertion After rebalancing

Double Rotation
• Single rotation does not work for cases 2 and 3, in which the insertion has occurred on the “inside” (i.e., left–right or right–left) of a node. (Figures: after insertion, after single rotation.)
• Subtree Y has had an item inserted into it, which guarantees that it is nonempty.
• We may assume that it has a root and two subtrees.
⇒ the tree may be viewed as four subtrees connected by three nodes.
Case 2

After insertion After double rotation


Case 3

Result of Double Rotation

It is easy to see that the resulting tree
• satisfies the AVL tree property, and
• restores the height to what it was before the insertion,
⇒ which guarantees that all rebalancing and height updating is complete.
Pseudocode
Insert(X, T)
{
    If (T = NULL)
        insert X at T; T->height = 0;
    If (X < T.element)
    {
        Insert(X, T->left)
        If Height(T->left) - Height(T->right) = 2
        {
            If (X < T.leftchild.element) T = singleRotateWithLeft(T);
            else T = doubleRotateWithLeft(T);
        }
    }
    Else If (X > T.element)
    {
        Insert(X, T->right)
        If Height(T->right) - Height(T->left) = 2
        {
            If (X > T.rightchild.element) T = singleRotateWithRight(T);
            else T = doubleRotateWithRight(T);
        }
    }
    T->height = max(height(T->left), height(T->right)) + 1;
    Return(T);
}
// SingleRotate routine in Fig 4.41 (Weiss); separate versions for left and right.
// DoubleRotate routine in Fig 4.43 (Weiss); separate versions for left and right.
Extended Example
Insert 3, 2, 1, 4, 5, 6, 7, 16, 15, 14.
(Figs 1–16 show the tree after each insertion and the rebalancing rotations; continued in the book.)

Deletions can be done with similar rotations.
Tree Traversal Revisited
• Inorder traversal: process the left subtree, process the current node, process the right subtree. E.g. to list the elements of a BST (inorderTraversalBST).
– Total running time is O(N): constant work is performed at every node in the tree (testing against nullptr, setting up two function calls, and doing an output statement), and each node is visited once.
• Postorder traversal: when we need to process both subtrees before we can process a node. E.g. to compute the height of a node (left subtree, right subtree, node).
– Total running time is O(N): constant work is performed at each node (postOrderTraversalBST).
Tree Traversal Revisited
• PreOrder traversal: Node is processed before the
children. E.g. to label each node with its depth.
(See file system example in this chapter)

• Level-order traversal. All nodes at depth d are


processed before any node at depth d + 1.
– Level-order traversal differs from the other traversals in
that it is not done recursively; a queue is used, instead
of the implied stack of recursion.
– To be seen later in the course (Breadth-First Search)
B-Trees
• So far, we have assumed that we can store an entire
data structure in the main memory of a computer.
• Often, the volume of data is too large
We must have the data structure reside on disk.
The rules of the game change, because the Big-Oh
model is no longer meaningful.
• Problem: a Big-Oh analysis assumes that all operations
are equal, which is not true, especially when disk I/O is
involved.
• Billions of instructions are executed per second on modern computers, but only ~120 disk I/Os per second.
• Processor speeds are increasing much faster than disk I/O rates.
B-trees Motivation

• Need to reduce disk accesses as much as


possible to reduce running time.
• We are willing to write complicated code to
do this, because machine instructions are
essentially free.
• A BST will not work: even the typical AVL tree is close to optimal height, and log N is the best we can reach with a BST.
• What is the solution?
Solution: M-ary search trees
• If we have more branching, we have less height.
• While a perfect binary tree of 31 nodes has five levels, a 5-ary tree of 31 nodes has only three levels.
• An M-ary search tree allows M-way branching  As
branching increases, the depth decreases.
• Whereas a complete binary tree has height roughly log2 N, a
complete M-ary tree has height roughly logM N.
• M-ary search tree can be created in the same way as a BST.
• In a BST, we need one key to decide which of two branches
to take. In an M-ary search tree, we need M − 1 keys to
decide.
• To make this scheme efficient in the worst case, we need to
ensure that the M-ary search tree is balanced in some way.
Otherwise, like a BST, it could degenerate into a linked list.
• In fact, we want an even more restrictive balancing condition, so that an M-ary search tree does not degenerate even to a BST.
B-Trees (B+ Trees)
A B-tree of order M is an M-ary tree such that:
1. The data items are stored at leaves.
2. The nonleaf nodes store up to M − 1 keys to guide the searching; key i represents the smallest key in subtree i + 1.
3. The root is either a leaf or has between two and M children.
4. All nonleaf nodes (except the root) have between ⌈M/2⌉ and M children (avoids degeneration into a binary tree).
5. All leaves are at the same depth and have between ⌈L/2⌉ and L data items, for some L (the determination of L is described shortly).
N.B.: Rules 3 and 5 must be relaxed for the first L insertions.
Example

• All nonleaf nodes have between three and five children (and thus between two and four keys).
• The root could possibly have only two children.
• Here, L = 5. It happens that L = M.
• L = 5 ⇒ each leaf has between three and five data items.
Choice of M and L: Example
• Each node represents a disk block, so we choose M and L on the
basis of the size of the items that are being stored.
• Suppose that we have 10,000,000 data items, each key is 32 bytes
(e.g. a name), a record is 256 bytes, and 1 block holds 8,192 bytes.
• In a B-tree of order M, we would have M−1 keys, for a total of
32M − 32 bytes, plus M branches.
• Since each branch is essentially a ptr to another disk block, we can
assume that a branch is 4 bytes  the branches use 4M bytes.
• The total memory requirement for a nonleaf node is thus 36M − 32.
• The largest value of M for which this is no more than 8,192 is 228 ⇒ we choose M = 228.
• Since each data record is 256 bytes, we would be able to fit 32 records in a block. Thus we would choose L = 32.
• M = 228, L = 32.
⇒ each leaf has between 16 and 32 data records.
⇒ each internal node (except the root) branches in at least 114 ways.
• Since there are 10,000,000 records, there are at most 625,000 leaves.
⇒ In the worst case, leaves would be on level 4.
⇒ In more concrete terms, the worst-case number of disk accesses is approximately log⌈M/2⌉ N, give or take 1. (For example, the root and the next level could be cached in main memory, so that over the long run, disk accesses would be needed only for level 3 and deeper.)
Insertion into a B-Tree
Insert 57 into previous B-Tree

A search down the tree reveals that 57 is not in the tree.
The leaf node is not full ⇒ no problem: 57 can be added as a fifth item, and the data is then reorganised (negligible cost; just the extra cost of writing to the disk).
• Insert 55 into previous (resulting) B-Tree
• Upon searching down the tree, leaf node found full
• Since we now have L+1 items, we split them into
two leaves each containing the minimum required
• We form two leaves with three items each.
• Two disk accesses are required to write these
leaves, and a third disk access is required to update
the parent.
• Note that in the parent, both keys and branches
change, but they do so in a controlled way that is
easily calculated.
• Although splitting nodes is time-consuming (at least two
additional disk writes), it is a relatively rare occurrence.
• E.g. if L = 32, then when a node is split, two leaves with 16 and 17 items, respectively, are created.
⇒ For the leaf with 17 items, we can perform 15 more insertions without another split.
⇒ More generally, for every split, there are roughly L/2 nonsplits.
• Previous case: the node splitting worked because the parent did
not have its full complement of children. What would happen if
it did?
• Suppose, that we insert 40 into the previous (resulting) B-tree
• We must split the leaf containing the keys 35 through 39, and
now 40, into two leaves.
⇒ The parent would then have six children, but only 5 are allowed.
⇒ Split the parent.
⇒ When the parent is split, we must update the values of the keys and also the parent’s parent, thus incurring an additional two disk writes (so this insertion costs five disk writes).
• However, once again, the keys change in a very controlled
manner, although the code is certainly not simple because of a
lot of cases
• When a nonleaf node is split, its parent gains a child.
• If the parent already has reached its limit of children, then we
continue splitting nodes up the tree until :
– either we find a parent that does not need to be split or
– we reach the root.
• If we split the root, then we have two roots (unacceptable!)
• Create a new root that has the split roots as its two children.
(This is why the root is granted the special two-child minimum exemption. It is also the only way that a B-tree gains height.)
• Notes: splitting all the way up to the root is an exceptionally
rare event.
• This is because a tree with four levels indicates that the root has
been split three times throughout the entire sequence of
insertions (assuming no deletions have occurred).
• The splitting of any nonleaf node is also quite rare.
Deletion from a B-Tree
• Deletion is performed by finding the item that needs to be removed
and then removing it.
• Problem if the leaf it was in had the minimum number of data items
 it is now below the minimum.
• We can rectify this situation by adopting a neighboring item, if the
neighbor is not itself at its minimum.
• If the neighbor is already at its minimum, then we can combine with
the neighbor to form a full leaf.
This means that the parent has lost a child. If this causes the parent
to fall below its minimum, then it follows the same strategy.
• This process could percolate all the way up to the root.
• If a root is left with one child as a result of the adoption process,
then we remove the root and make its child the new root of the tree.
• This is the only way for a B-tree to lose height.
Suppose we want to delete 99 from the previous (resulting) B-tree.
Slides based on the textbook: Mark Allen Weiss (2014), Data Structures and Algorithm Analysis in C++, 4th edition, Pearson.

Acknowledgement: These course PowerPoints make substantial (non-exclusive) use of the PPT chapters prepared by Prof. Saswati Sarkar from the University of Pennsylvania, USA, themselves developed on the basis of the course textbook. Other references, if any, will be mentioned wherever applicable.
