Multiway Search Tree
A multiway tree is a tree whose nodes can have more than two children. A multiway tree
of order m (or an m-way tree) is one in which each node can have up to m children.
As with the other trees that have been studied, the nodes in an m-way tree will be
made up of key fields, in this case m-1 key fields, and pointers to children.
To make the processing of m-way trees easier, some type of order will be imposed on
the keys within each node, resulting in a multiway search tree of order m (or an
m-way search tree). By definition, an m-way search tree is an m-way tree in which:
The keys in the first i children are smaller than the ith key
The keys in the last m-i children are larger than the ith key
B-Trees
A B-tree of order m is a multiway search tree with the following properties:
1. The root has at least two subtrees unless it is the only node in the tree.
2. Each node that is neither the root nor a leaf has at most m nonempty children and
at least ⌈m/2⌉ nonempty children.
3. The number of keys in each such node is one less than the number of its nonempty
children.
4. All leaves are on the same level.
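The size limits implied by these properties can be written down as a small sketch. The function names below are made up for this illustration, and the code assumes that m/2 is rounded up, as is usual for B-trees; the limits apply to nodes other than the root.

```cpp
// Limits implied by the B-tree properties for a tree of order m.
// All names here are hypothetical helpers, not part of the BTreeNode class.
constexpr int maxChildren(int m) { return m; }
constexpr int minChildren(int m) { return (m + 1) / 2; }  // ceil(m / 2)
constexpr int maxKeys(int m) { return maxChildren(m) - 1; }
constexpr int minKeys(int m) { return minChildren(m) - 1; }

// Checked at compile time for a tree of order 5.
static_assert(minChildren(5) == 3 && maxKeys(5) == 4 && minKeys(5) == 2,
              "limits for a B-tree of order 5");
```

For a tree of order 5, for example, every such node holds between 2 and 4 keys.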
The nodes in a B-tree are usually implemented as a class that contains an array of m-1
cells for keys, an array of m pointers to other nodes, and whatever other information is
required in order to facilitate tree maintenance.
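template <class T, int M>
class BTreeNode {
private:
    T keys[M-1];              // up to M-1 keys, kept in sorted order
    BTreeNode *pointers[M];   // up to M pointers to child nodes
    ...
};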
Searching a B-tree
An algorithm for finding a key in a B-tree is simple. Start at the root and determine
which pointer to follow based on a comparison between the search value and key
fields in the root node. Follow the appropriate pointer to a child node. Examine the
key fields in the child node and continue to follow the appropriate pointers until the
search value is found or a leaf node is reached that doesn't contain the desired search
value.
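The search just described can be sketched as follows. This is only an illustration under stated assumptions: the `Node` struct, its `std::vector` members, and the function `btreeSearch` are hypothetical and are not the `BTreeNode` class from these notes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout for illustration: keys are sorted, and a
// non-leaf node has exactly keys.size() + 1 children. A leaf has none.
struct Node {
    std::vector<int> keys;
    std::vector<const Node*> children;
    bool isLeaf() const { return children.empty(); }
};

// Returns the node that contains `value`, or nullptr if it is not in the tree.
const Node* btreeSearch(const Node* node, int value) {
    while (node != nullptr) {
        // Find the first key that is not smaller than the search value.
        std::size_t i = 0;
        while (i < node->keys.size() && node->keys[i] < value) ++i;
        if (i < node->keys.size() && node->keys[i] == value) return node;
        if (node->isLeaf()) return nullptr;  // reached a leaf without a match
        node = node->children[i];            // follow the pointer left of key i
    }
    return nullptr;
}
```

Each loop iteration examines one node, so the number of nodes visited is at most the height of the tree.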
The condition that all leaves must be on the same level forces a characteristic behavior
of B-trees, namely that B-trees are not allowed to grow at their leaves; instead they
are forced to grow at the root.
When inserting into a B-tree, a value is always inserted directly into a leaf. This
leads to three common situations:
Case 1: The leaf has room. The value is simply placed in the leaf in its sorted
position.
Case 2: The leaf is full. In this case, the leaf node where the value should be inserted
is split in two, resulting in a new leaf node. Half of the keys are moved from the full
leaf to the new leaf. The new leaf is then incorporated into the B-tree by moving the
middle value to the parent, together with a pointer to the new leaf. This process
continues up the tree until all of the values have "found" a location.
Case 3: The root is full. The upward movement of values from case 2 means that a value
could move all the way up to the root of the B-tree. If the root is full, the same basic
process from case 2 is applied and a new root is created. This type of split results in
two new nodes being added to the B-tree.
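The split of a full node can be sketched as below. This is a minimal illustration, not the notes' own implementation: `SplitResult` and `splitFullNode` are hypothetical names, the keys are plain `int` values, and child pointers are left out to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Result of splitting a full node (hypothetical type for this sketch).
struct SplitResult {
    std::vector<int> left;   // keys that stay in the original node
    int middle;              // key that moves up to the parent
    std::vector<int> right;  // keys that move to the new node
};

// `keys` are the sorted keys of a full node; `newKey` is the key being inserted.
SplitResult splitFullNode(std::vector<int> keys, int newKey) {
    // Place the new key in sorted position, overfilling the node by one key.
    keys.insert(std::upper_bound(keys.begin(), keys.end(), newKey), newKey);
    const auto mid = keys.begin() + static_cast<std::ptrdiff_t>(keys.size() / 2);
    SplitResult r;
    r.left.assign(keys.begin(), mid);      // lower half stays
    r.middle = *mid;                       // middle key goes to the parent
    r.right.assign(mid + 1, keys.end());   // upper half goes to the new node
    return r;
}
```

If the parent is also full, the same function can be applied to the parent with `middle` as the new key. When the node being split is the root, `middle` becomes the only key of a new root.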
In the accompanying example (figures omitted), the 15 needs to be moved to the root
node, but the root is full, so the root must be divided. The 15 is then inserted into
the parent, which means that it becomes the new root node.
Deletion is, as usual, the hardest of the processes to apply. The deletion process is
basically a reversal of the insertion process: rather than splitting nodes, nodes may
have to be merged so that the B-tree properties, namely the requirement that a node
be at least half full, are maintained.
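Case 1: Deletion from a leaf
1a) If the leaf is at least half full after deleting the desired value, the remaining larger
values are moved to "fill the gap".
1b) If the leaf is less than half full after deleting the desired value (known as
underflow), two things could happen:
1b-1) If a sibling has more keys than the minimum, the keys of the leaf and that
sibling are redistributed between the two nodes through the separator key in the parent.
1b-2) Otherwise, the leaf is merged with a sibling, and the separator key from the
parent moves down into the merged node.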
Special Case for 1b-2: When merging nodes, if the parent is the root with only one
key, the keys from the node, the sibling, and the only key of the root are placed into a
node and this will become the new root for the B-tree. Both the sibling and the old
root will be discarded.
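The merge can be sketched as a simple concatenation. This is a sketch under assumptions: `mergeWithSibling` is a hypothetical helper working on plain key vectors, with the left node, the separator key from the parent, and the right node passed in that order.

```cpp
#include <vector>

// Builds the key list of the merged node: the keys of the left node, then the
// separator key taken from the parent, then the keys of the right node.
std::vector<int> mergeWithSibling(const std::vector<int>& left,
                                  int separator,
                                  const std::vector<int>& right) {
    std::vector<int> merged(left);
    merged.push_back(separator);
    merged.insert(merged.end(), right.begin(), right.end());
    return merged;
}
```

In the special case above, `separator` is the only key of the root, so the vector returned here holds the keys of the new root.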
Case 2: Deletion from a non-leaf
This case can lead to problems with tree reorganization but it will be solved in a
manner similar to deletion from a binary search tree.
The key to be deleted will be replaced by its immediate predecessor (or successor) and
then the predecessor (or successor) will be deleted since it can only be found in a leaf
node.
When keys are redistributed, the values in the left sibling are combined with the
separator key (18 in the example) and the remaining values of the underflowing node,
and are then divided between the two nodes.
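Redistribution between two siblings can be sketched in the same style. The names `Redistributed` and `redistribute` are hypothetical, and apart from the separator 18 mentioned above, the key values used with it are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Result of redistributing keys between two sibling nodes (hypothetical type).
struct Redistributed {
    std::vector<int> left;   // new contents of the left node
    int separator;           // new separator key stored in the parent
    std::vector<int> right;  // new contents of the right node
};

Redistributed redistribute(const std::vector<int>& left, int separator,
                           const std::vector<int>& right) {
    // Combine left keys, the old separator, and right keys in sorted order.
    std::vector<int> all(left);
    all.push_back(separator);
    all.insert(all.end(), right.begin(), right.end());
    // The middle key becomes the new separator; the rest is divided evenly.
    const auto mid = all.begin() + static_cast<std::ptrdiff_t>(all.size() / 2);
    Redistributed r;
    r.left.assign(all.begin(), mid);
    r.separator = *mid;
    r.right.assign(mid + 1, all.end());
    return r;
}
```

For instance, with a left sibling holding 10, 12, 14, 16, the separator 18, and an underflowing node holding only 20, the left node ends up with 10, 12, 14, the new separator is 16, and the right node holds 18, 20.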
Most queries can be executed more quickly if the values are stored in order. But it's
not practical to hope to store all the rows in the table one after another, in sorted order,
because this requires rewriting the entire table with each insertion or deletion of a row.
This leads us to instead imagine storing our rows in a tree structure. Our first instinct
would be a balanced binary search tree like a red-black tree, but this really doesn't
make much sense for a database since it is stored on disk. You see, disks work by
reading and writing whole blocks of data at once — typically 512 bytes or four
kilobytes. A node of a binary search tree uses a small fraction of that, so it makes
sense to look for a structure that fits more neatly into a disk block.
Hence the B+-tree, in which each node stores up to d references to children and up
to d − 1 keys. Each reference is considered “between” two of the node's keys; it
references the root of a subtree for which all values are between these two keys.
A B+-tree requires that each leaf be the same distance from the root, as in this picture,
where searching for any of the 11 values (all listed on the bottom level) will involve
loading three nodes from the disk (the root block, a second-level block, and a leaf).
In practice, d will be larger — as large, in fact, as it takes to fill a disk block. Suppose
a block is 4KB, our keys are 4-byte integers, and each reference is a 6-byte file offset.
Then we'd choose d to be the largest value so that 4 (d − 1) + 6 d ≤ 4096; solving this
inequality for d, we end up with d ≤ 410, so we'd use 410 for d. As you can see, d can
be large.
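The same calculation can be written as a small function. Rearranging k(d − 1) + r·d ≤ B for key size k, reference size r, and block size B gives d ≤ (B + k) / (k + r). The name `maxFanout` is made up for this sketch.

```cpp
// Largest d such that keySize * (d - 1) + refSize * d <= blockSize.
// Integer division rounds down, which is exactly what the inequality needs.
constexpr int maxFanout(int blockSize, int keySize, int refSize) {
    return (blockSize + keySize) / (keySize + refSize);
}

// The example from the text: 4 KB blocks, 4-byte keys, 6-byte references.
static_assert(maxFanout(4096, 4, 6) == 410, "d = 410 for the example above");
```

With 512-byte blocks and the same key and reference sizes, the function gives d = 51.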
Every key from the table appears in a leaf, in left-to-right sorted order.
In our examples, we'll continue to use 4 for d. Looking at our invariants, this requires
that each leaf have at least two keys, and that each internal node have at least two
children (and thus at least one key).
2. Insertion algorithm
1. If the node has an empty space, insert the key/reference pair into the node.
2. If the node is already full, split it into two nodes, distributing the keys evenly
between the two nodes. If the node is a leaf, take a copy of the minimum value
in the second of these two nodes and repeat this insertion algorithm to insert it
into the parent node. If the node is a non-leaf, exclude the middle value during
the split and repeat this insertion algorithm to insert this excluded value into the
parent node.
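The difference between the two kinds of split in step 2 can be sketched as follows. `BPlusSplit` and `splitOverfull` are hypothetical names, keys are plain `int` values, and references are left out. The function assumes the new key has already been placed in order, so the node holds one key more than it can store.

```cpp
#include <cstddef>
#include <vector>

// Result of splitting an overfull B+-tree node (hypothetical type).
struct BPlusSplit {
    std::vector<int> left;
    std::vector<int> right;
    int promoted;  // key to insert into the parent node
};

BPlusSplit splitOverfull(const std::vector<int>& keys, bool isLeaf) {
    const auto mid = keys.begin() + static_cast<std::ptrdiff_t>(keys.size() / 2);
    BPlusSplit s;
    s.left.assign(keys.begin(), mid);
    if (isLeaf) {
        // Leaf: every key stays in a leaf; the parent gets a COPY of the
        // minimum key of the second node.
        s.right.assign(mid, keys.end());
        s.promoted = s.right.front();
    } else {
        // Non-leaf: the middle key is excluded from both halves and moves up.
        s.promoted = *mid;
        s.right.assign(mid + 1, keys.end());
    }
    return s;
}
```

With d = 4, an overfull leaf holding 10, 11, 12, 13 splits into 10, 11 and 12, 13 with 12 copied to the parent. An overfull non-leaf node holding 10, 20, 30, 40 splits into 10, 20 and 40 with 30 moved to the parent.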
(Figures omitted: the initial tree, followed by the tree after inserting 20, 13, 15, 10,
11, and 12 in turn.)
3. Deletion algorithm
1. Remove the required key and its associated reference from the node.
2. If the node still has enough keys and references to satisfy the invariants, stop.
3. If the node has too few keys to satisfy the invariants, but its next oldest or next
youngest sibling at the same level has more than necessary, distribute the keys
between this node and the neighbor. Repair the keys in the level above to
represent that these nodes now have a different “split point” between them; this
involves simply changing a key in the levels above, without deletion or
insertion.
4. If the node has too few keys to satisfy the invariant, and the next oldest or next
youngest sibling is at the minimum for the invariant, then merge the node with
its sibling; if the node is a non-leaf, we will need to incorporate the “split key”
from the parent into our merging. In either case, we will need to repeat the
removal algorithm on the parent node to remove the “split key” that previously
separated these merged nodes — unless the parent is the root and we are
removing the final key from the root, in which case the merged node becomes
the new root (and the tree has become one level shorter than before).
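The choice between the repair steps can be sketched as a small decision function. The enum `Repair` and the function `chooseRepair` are hypothetical; `siblingKeys` is assumed to be the key count of whichever adjacent sibling is fuller, and `minKeys` is the minimum required by the invariants.

```cpp
// Which repair step applies after a key has been removed from a node.
enum class Repair { None, Redistribute, Merge };

Repair chooseRepair(int nodeKeys, int siblingKeys, int minKeys) {
    if (nodeKeys >= minKeys) return Repair::None;            // invariants still hold
    if (siblingKeys > minKeys) return Repair::Redistribute;  // sibling can spare a key
    return Repair::Merge;                                    // sibling is at the minimum
}
```

When the result is `Repair::Merge`, the removal algorithm is repeated on the parent node to remove the split key, as described in the last step.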
(Figures omitted: the initial tree, followed by the tree after deleting 13, 15, and 1 in
turn.)