Search Searching is similar to searching a binary search tree. Starting at the root, the tree is recursively traversed from top to bottom. At each level, the search reduces its field of view to the child pointer (subtree) whose range includes the search value. A subtree's range is defined by the values, or keys, contained in its parent node. These limiting values are also known as separation values.
Binary search is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.
Insertion All insertions start at a leaf node. To insert a new element, search the tree to find the leaf node where the new element should be added. Insert the new element into that node with the following steps: • If the node contains fewer than the maximum allowed number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered. • Otherwise, the node is full, evenly split it into two nodes, so: • A single median is chosen from among the leaf's elements and the new element that is being inserted. • Values less than the median are put in the new left node, and values greater than the median are put in the new right node, with the median acting as a separation value. • The separation value is inserted in the node's parent, which may cause it to be split, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node (increasing the height of the tree). If the splitting goes all the way up to the root, it creates a new root with a single separator value and two children, which is why the lower bound on the size of internal nodes does not apply to the root. The maximum number of elements per node is
U−1. When a node is split, one element moves to the parent, but one element is added. So, it must be possible to divide the maximum number
U−1 of elements into two legal nodes. If this number is odd, then
U=2
L and one of the new nodes contains (
U−2)/2 =
L−1 elements, and hence is a legal node, and the other contains one more element, and hence it is legal too. If
U−1 is even, then
U=2
L−1, so there are 2
L−2 elements in the node. Half of this number is
L−1, which is the minimum number of elements allowed per node. An alternative algorithm supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way pre-emptively. This prevents the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this algorithm, we must be able to send one element to the parent and split the remaining
U−2 elements into two legal nodes, without adding a new element. This requires
U = 2
L rather than
U = 2
L−1, which accounts for why some textbooks impose this requirement in defining B-trees.
Deletion There are two popular strategies for deletion from a B-tree. • Locate and delete the item, then restructure the tree to retain its invariants,
OR • Do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring. The algorithm below uses the former strategy. There are two special cases to consider when deleting an element: • The element in an internal node is a separator for its child nodes. • Deleting an element may put its node under the minimum number of elements and children. The procedures for these cases are listed below.
Deletion from a leaf node • Search for the value to delete. • If the value is in a leaf node, simply delete it from the node. • If underflow happens, rebalance the tree as described in the section "Rebalancing after deletion" below.
Deletion from an internal node Each element in an internal node acts as a separation value for two subtrees; we need to find a replacement for separation. Note that the largest element in the left subtree is still less than the separator. Likewise, the smallest element in the right subtree is still greater than the separator. Both of those elements are in leaf nodes, and either one can be the new separator for the two subtrees. Algorithmically described below: • Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator. • The previous step deleted an element (the new separator) from a leaf node. If that leaf node is now deficient (has fewer than the required number of nodes), then rebalance the tree starting from the leaf node.
Rebalancing after deletion Rebalancing starts from a leaf and proceeds toward the root until the tree is balanced. If deleting an element from a node has brought it under the minimum size, then some elements must be redistributed to bring all nodes up to the minimum. Usually, the redistribution involves moving an element from a sibling node that has more than the minimum number of nodes. That redistribution operation is called a
rotation. If no sibling can spare an element, then the deficient node must be
merged with a sibling. The merge causes the parent to lose a separator element, so the parent may become deficient and need rebalancing. The merging and rebalancing may continue all the way to the root. Since the minimum element count doesn't apply to the root, making the root be the only deficient node is not a problem. The algorithm to rebalance the tree is as follows: • If the deficient node's right sibling exists and has more than the minimum number of elements, then rotate left. • Copy the separator from the parent to the end of the deficient node (the separator moves down; the deficient node now has the minimum number of elements) • Replace the separator in the parent with the first element of the right sibling (right sibling loses one node but still has at least the minimum number of elements) • The tree is now balanced. • Otherwise, if the deficient node's left sibling exists and has more than the minimum number of elements, then rotate right. • Copy the separator from the parent to the start of the deficient node (the separator moves down; deficient node now has the minimum number of elements) • Replace the separator in the parent with the last element of the left sibling (left sibling loses one node but still has at least the minimum number of elements) • The tree is now balanced. • Otherwise, if both immediate siblings have only the minimum number of elements, then merge with a sibling sandwiching their separator taken off from their parent. • Copy the separator to the end of the left node (the left node may be the deficient node, or it may be the sibling with the minimum number of elements) • Move all elements from the right node to the left node (the left node now has the maximum number of elements, and the right node is empty) • Remove the separator from the parent along with its empty right child (the parent loses an element) • If the parent is the root and now has no elements, then free it and make the merged node the new root (tree becomes shallower) • Otherwise, if the parent has fewer than the required number of elements, then rebalance the parent :
Note: The rebalancing operations are different for B+ trees (e.g., rotation is different because parent has copy of the key) and B*-tree (e.g., three siblings are merged into two siblings).
Sequential access While freshly loaded databases tend to have good sequential behaviour, this behaviour becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges.
Initial construction A common special case is adding a large amount of
pre-sorted data into an initially empty B-tree. While it is quite possible to simply perform a series of successive inserts, inserting sorted data results in a tree composed almost entirely of half-full nodes. Instead, a special "bulk loading"
algorithm can be used to produce a more efficient tree with a higher branching factor. When the input is sorted, all insertions are at the rightmost edge of the tree, and in particular any time a node is split, we are guaranteed that no more insertions will take place in the left half. When bulk loading, we take advantage of this, and instead of splitting overfull nodes evenly, split them as
unevenly as possible: leave the left node completely full and create a right node with zero keys and one child (in violation of the usual B-tree rules). At the end of bulk loading, the tree is composed almost entirely of completely full nodes; only the rightmost node on each level may be less than full. Because those nodes may also be less than
half full, to re-establish the normal B-tree rules, combine such nodes with their (guaranteed full) left siblings and divide the keys to produce two nodes at least half full. The only node which lacks a full left sibling is the root, which is permitted to be less than half full. ==In filesystems==