
Red–black tree

In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red–black tree hold an extra "color" bit, often drawn as red and black, which helps ensure that the tree is always approximately balanced.

History
In 1972, Rudolf Bayer invented a data structure that was a special order-4 case of a B-tree. These trees maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees. However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper, and later they became popular as 2–3–4 trees or even 2–3 trees. In a 1978 paper, "A Dichromatic Framework for Balanced Trees", Leonidas J. Guibas and Robert Sedgewick derived the red–black tree from the symmetric binary B-tree. The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC. Another response from Guibas states that it was because of the red and black pens available to them to draw the trees.

In 1993, Arne Andersson introduced the idea of a right-leaning tree to simplify insert and delete operations. In 1999, Chris Okasaki showed how to make the insert operation purely functional. Its balance function needed to take care of only four unbalanced cases and one default balanced case; the original algorithm used eight unbalanced cases, which was later reduced to six.

In 2008, Sedgewick proposed the left-leaning red–black tree, leveraging Andersson's idea that simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red, making his trees more like 2–3–4 trees, but later the restriction was added that a node may not have two red children, making the new trees more like 2–3 trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46 lines of code.
Terminology
The black depth of a node is defined as the number of black nodes from the root to that node (i.e. the number of black ancestors). The black height of a red–black tree is the number of black nodes in any path from the root to the leaves, which, by requirement 4, is constant (alternatively, it could be defined as the black depth of any leaf node). The black height of a node is the black height of the subtree rooted by it. In this article, the black height of a null node shall be set to 0, because its subtree is empty as suggested by the example figure, and its tree height is also 0.
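As a hedged illustration in C, the black height can be computed recursively; black_height is a hypothetical helper (not part of the sample code below) that assumes the Node type from the Implementation section. Returning -1 on unequal subtree black heights makes the same function usable as a check of requirement 4:

// Hypothetical helper, not part of the sample code below: computes the
// black height of the subtree rooted at n, with the black height of a
// NULL subtree defined as 0; returns -1 if the subtree contains paths
// with different numbers of black nodes (a black-violation).
int black_height(const Node* n) {
    if (n == NULL) return 0;
    int left = black_height(n->left);
    int right = black_height(n->right);
    if (left < 0 || left != right) return -1; // propagate or detect violation
    return left + (n->color == BLACK ? 1 : 0);
}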
Properties
In addition to the requirements imposed on a binary search tree, the following must be satisfied by a red–black tree:
• Every node is either red or black.
• All null nodes are considered black.
• A red node does not have a red child.
• Every path from a given node to any of its leaf nodes (that is, to any descendant null node) goes through the same number of black nodes.
• (Conclusion) If a node N has exactly one child, the child must be red. If the child were black, its leaves would sit at a different black depth than N's null node (which is considered black by requirement 2), violating requirement 4.

Some authors, e.g. Cormen et al., add "the root is black" as a fifth requirement, but Mehlhorn & Sanders and Sedgewick & Wayne do not. Since the root can always be changed from red to black, this rule has little effect on analysis. This article also omits it, because it slightly disturbs the recursive algorithms and proofs. As an example, every perfect binary tree that consists only of black nodes is a red–black tree.

The read-only operations, such as search or tree traversal, do not affect any of the requirements. In contrast, the modifying operations insert and delete easily maintain requirements 1 and 2, but with respect to the other requirements some extra effort must be made to avoid introducing a violation of requirement 3, called a red-violation, or of requirement 4, called a black-violation.

The requirements enforce a critical property of red–black trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is height-balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height h of the tree, this upper bound on the height allows red–black trees to be efficient in the worst case, namely logarithmic in the number n of entries, i.e. h \in O(\log n) (a property which is shared by all self-balancing trees, e.g., AVL tree or B-tree, but not by ordinary binary search trees). For a mathematical proof see section Proof of bounds.

Red–black trees, like all binary search trees, allow quite efficient sequential access (e.g. in-order traversal, that is, in the order Left–Root–Right) of their elements. But they also support asymptotically optimal direct access via a traversal from root to leaf, resulting in O(\log n) search time.
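Under the same assumptions as the earlier sketch (the Node and Tree types from the Implementation section below, plus the hypothetical black_height helper sketched in the Terminology section), a minimal validity check for these requirements could look like:

// Hypothetical checker, not part of the sample code below: returns 1
// if no red node has a red child anywhere in the subtree (requirement 3).
int no_red_violation(const Node* n) {
    if (n == NULL) return 1; // null nodes are considered black (requirement 2)
    if (n->color == RED &&
        ((n->left != NULL && n->left->color == RED) ||
         (n->right != NULL && n->right->color == RED)))
        return 0; // red-violation
    return no_red_violation(n->left) && no_red_violation(n->right);
}

// A tree satisfies the requirements iff there is no red-violation and
// black_height (sketched in the Terminology section) reports a
// non-negative black height, i.e. no black-violation.
int is_red_black(const Tree* tree) {
    return no_red_violation(tree->root) && black_height(tree->root) >= 0;
}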
Analogy to 2–3–4 trees
Red–black trees are similar in structure to 2–3–4 trees, which are B-trees of order 4. In 2–3–4 trees, each node can contain between 1 and 3 values and have between 2 and 4 children. These 2–3–4 nodes correspond to black node – red children groups in red–black trees, as shown in figure 1. It is not a 1-to-1 correspondence, because 3-nodes have two equivalent representations: the red child may lie either to the left or right. The left-leaning red–black tree variant makes this relationship exactly 1-to-1 by only allowing the left-child representation. Since every 2–3–4 node has a corresponding black node, invariant 4 of red–black trees is equivalent to saying that the leaves of a 2–3–4 tree all lie at the same level. Despite structural similarities, operations on red–black trees are more economical than on B-trees. B-trees require management of vectors of variable length, whereas red–black trees are simply binary trees.
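As a hedged illustration of this grouping, a black node and its red children can be classified by the 2–3–4 node they represent; node_arity_234 is a hypothetical helper using the Node type from the Implementation section below:

// Hypothetical helper: counts the keys of the 2-3-4 node that a black
// node forms together with its red children.
// 1 key => 2-node, 2 keys => 3-node, 3 keys => 4-node.
int node_arity_234(const Node* black_node) {
    int keys = 1; // the black node itself holds one key
    if (black_node->left != NULL && black_node->left->color == RED) keys++;
    if (black_node->right != NULL && black_node->right->color == RED) keys++;
    return keys;
}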
Applications and related data structures
Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures that provide worst-case guarantees. For example, many data structures used in computational geometry are based on red–black trees, and the Completely Fair Scheduler and the epoll system call of the Linux kernel use red–black trees.

The AVL tree is another structure supporting O(\log n) search, insertion, and removal. AVL trees can be colored red–black, and thus are a subset of red–black trees. The worst-case height of an AVL tree is 0.720 times the worst-case height of a red–black tree, so AVL trees are more rigidly balanced. The performance measurements of Ben Pfaff with realistic test cases in 79 runs find AVL-to-RB ratios between 0.677 and 1.077, with a median of 0.947 and a geometric mean of 0.910. The performance of WAVL trees lies between AVL trees and red–black trees.

Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets that can retain previous versions after mutations. The persistent version of red–black trees requires O(\log n) space for each insertion or deletion, in addition to time.

For every 2–3–4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2–3–4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2–3–4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2–3–4 trees just before red–black trees, even though 2–3–4 trees are not often used in practice.

In 2008, Sedgewick introduced a simpler version of the red–black tree called the left-leaning red–black tree by eliminating a previously unspecified degree of freedom in the implementation. The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes. Red–black trees can be made isometric to either 2–3 trees or 2–3–4 trees.
Implementation
The read-only operations, such as search or tree traversal, on a red–black tree require no modification from those used for binary search trees, because every red–black tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a red–black tree, the restoration of which is called rebalancing so that red–black trees become self-balancing. Rebalancing (i.e. color changes and rotations) has a worst-case time complexity of O(\log n) and an average of O(1); at most two rotations are needed for an insertion.

This is an example implementation of insert and remove in C. Below are the data structures and the rotate_subtree helper function used in the insert and remove examples.

typedef enum Color : char { BLACK, RED } Color;
typedef enum Direction : char { LEFT, RIGHT } Direction;

// red-black tree node
typedef struct Node {
    struct Node* parent; // null for the root node
    union {
        // Union so we can use ->left/->right or ->child[0]/->child[1]
        struct {
            struct Node* left;
            struct Node* right;
        };
        struct Node* child[2];
    };
    Color color;
    int key;
} Node;

typedef struct {
    struct Node* root;
} Tree;

// side of N relative to its parent
static Direction direction(const Node* N) {
    return N == N->parent->right ? RIGHT : LEFT;
}

// rotate the subtree rooted at sub in direction dir; returns the new root
Node* rotate_subtree(Tree* tree, Node* sub, Direction dir) {
    Node* sub_parent = sub->parent;
    Node* new_root = sub->child[1 - dir]; // 1 - dir is the opposite direction
    Node* new_child = new_root->child[dir];

    sub->child[1 - dir] = new_child;
    if (new_child) {
        new_child->parent = sub;
    }

    new_root->child[dir] = sub;
    new_root->parent = sub_parent;
    sub->parent = new_root;

    if (sub_parent) {
        sub_parent->child[sub == sub_parent->right] = new_root;
    } else {
        tree->root = new_root;
    }
    return new_root;
}
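As a hedged illustration of how these helpers are used, the following sketch of an insertion rebalancing loop (insert_fixup is a hypothetical name, not part of the sample code) recolors or rotates until the red-violation introduced by a newly inserted red node N is resolved:

// Hypothetical sketch: rebalance after inserting the red node N.
void insert_fixup(Tree* tree, Node* N) {
    Node *P, *G, *U; // parent, grandparent, uncle
    while ((P = N->parent) != NULL && P->color == RED) {
        G = P->parent;
        if (G == NULL) {            // P is a red root: recolor it black
            P->color = BLACK;
            return;
        }
        Direction dir = direction(P); // side of P relative to G
        U = G->child[1 - dir];        // uncle of N
        if (U != NULL && U->color == RED) {
            // red uncle: recolor P, U, G and retry one black level higher
            P->color = BLACK;
            U->color = BLACK;
            G->color = RED;
            N = G;
            continue;
        }
        if (N == P->child[1 - dir]) {
            // N is an inner grandchild: rotate at P to make it outer
            rotate_subtree(tree, P, dir);
            N = P;
            P = G->child[dir];
        }
        // outer grandchild with black uncle: rotate at G and recolor
        rotate_subtree(tree, G, (Direction)(1 - dir));
        P->color = BLACK;
        G->color = RED;
        return;
    }
}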
Notes to the sample code and diagrams of insertion and removal

The proposal breaks down both insertion and removal (not mentioning some very simple cases) into six constellations of nodes, edges, and colors, which are called cases. The proposal contains, for both insertion and removal, exactly one case that advances one black level closer to the root and loops; the other five cases rebalance the tree on their own. The more complicated cases are pictured in a diagram.
• In the diagrams, one symbol stands for a red node, one for a (non-NULL) black node (of black height ≥ 1), and one for a non-NULL node of either red or black color, keeping the same color throughout the same diagram. NULL nodes are not represented in the diagrams.
• The variable N denotes the current node, which is labeled N in the diagrams.
• A diagram contains three columns and two to four actions. The left column shows the first iteration, the right column the higher iterations, and the middle column shows the segmentation of a case into its different actions.
• The action "entry" shows the constellation of nodes with their colors which defines a case and mostly violates some of the requirements. A blue border rings the current node N, and the other nodes are labeled according to their relation to N.
• If a rotation is considered useful, this is pictured in the next action, which is labeled "rotation".
• If some recoloring is considered useful, this is pictured in the next action, which is labeled "color".
• If there is still some need to repair, the cases make use of code of other cases, after a reassignment of the current node N, which then again carries a blue ring and relative to which other nodes may also have to be reassigned. This action is labeled "reassign". For both insert and delete there is (exactly) one case which iterates one black level closer to the root; then the reassigned constellation satisfies the respective loop invariant.
• A possibly numbered triangle with a black circle atop represents a red–black subtree (connected to its parent according to requirement 3) with a black height equal to the iteration level minus one, i.e. zero in the first iteration. Its root may be red or black. A possibly numbered triangle without the circle represents a red–black subtree with a black height one less, i.e. its parent has black height zero in the second iteration.

Remark: For simplicity, the sample code uses the disjunction
    U == NULL || U->color == BLACK // considered black
and the conjunction
    U != NULL && U->color == RED   // not considered black
It must be kept in mind that neither statement evaluates U->color if U == NULL (see short-circuit evaluation). (The comment "considered black" is in accordance with requirement 2.)

• Out of the body of the loop there are exiting branches to the cases D3, D6, D5, D4, and D1; the section "Delete case 3" has three different exiting branches of its own, to the cases D6, D5 and D4.
• Rotations occur in cases D6, D5 + D6, and D3 + D5 + D6, all outside the loop. Therefore, at most three rotations occur in total.

Delete case 1
The current node N is the new root. One black node has been removed from every path, so the RB-properties are preserved. The black height of the tree decreases by 1.

Delete case 2
P, S, and S's children are black. After painting S red, all paths passing through S, which are precisely those paths not passing through N, have one less black node. Now all paths in the subtree rooted by P have the same number of black nodes, but one fewer than the paths that do not pass through P, so requirement 4 may still be violated. After relabeling P to N, the loop invariant is fulfilled so that the rebalancing can be iterated on one black level (= 1 tree level) higher.

Delete case 3
The sibling S is red, so P and the nephews C and D have to be black. A rotation at P turns S into N's grandparent. Then, after reversing the colors of P and S, the path through N is still short one black node. But N now has a red parent P and, after the reassignment, a black sibling S, so the transformations in cases 4, 5, or 6 are able to restore the RB-shape.

Delete case 4
The sibling S and S's children are black, but P is red. Exchanging the colors of S and P does not affect the number of black nodes on paths going through S, but it does add one to the number of black nodes on paths going through N, making up for the deleted black node on those paths.

Delete case 5
The sibling S is black, S's close child C is red, and S's distant child D is black. After a rotation at S, the nephew C becomes S's parent and N's new sibling. The colors of S and C are exchanged. All paths still have the same number of black nodes, but now N has a black sibling whose distant child is red, so the constellation is fit for case D6. Neither N nor its parent P are affected by this transformation, and P may be red or black.

Delete case 6
The sibling S is black, S's distant child D is red. After a rotation at P, the sibling S becomes the parent of P and of S's distant child D. The colors of P and S are exchanged, and D is made black.
The whole subtree still has the same color at its root S, namely either red or black, the same color both before and after the transformation. This way requirement 3 is preserved. The paths in the subtree not passing through N (in other words, passing through D and node 3 in the diagram) pass through the same number of black nodes as before, but N now has one additional black ancestor: either P has become black, or it was black and S was added as a black grandparent. Thus, the paths passing through N pass through one additional black node, so that requirement 4 is restored and the total tree is in RB-shape.

Because the algorithm transforms the input without using an auxiliary data structure and uses only a small amount of extra storage space for auxiliary variables, it is in-place.
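As a hedged illustration, the six cases above can be assembled into a single loop. The following sketch (delete_fixup is a hypothetical name, not the article's sample code) uses the structures defined earlier and would be called after removing a black non-root leaf, with P the former parent and dir the side on which the removed node hung:

// Hypothetical sketch: rebalance after a black leaf was removed; the
// NULL that replaced it, addressed via parent P and side dir, is one
// black node short on its paths.
void delete_fixup(Tree* tree, Node* P, Direction dir) {
    Node* N;  // current node
    Node* S;  // sibling of N
    Node* C;  // close nephew (S's child on side dir)
    Node* D;  // distant nephew (S's child on side 1 - dir)
    do {
        S = P->child[1 - dir];
        D = S->child[1 - dir];
        C = S->child[dir];
        if (S->color == RED) {            // Delete case 3
            rotate_subtree(tree, P, dir); // S becomes N's grandparent
            P->color = RED;
            S->color = BLACK;
            S = C;                        // new sibling of N, known black
            D = S->child[1 - dir];
            C = S->child[dir];
        }
        if (D != NULL && D->color == RED) goto case_d6;
        if (C != NULL && C->color == RED) goto case_d5;
        if (P->color == RED) {            // Delete case 4
            S->color = RED;
            P->color = BLACK;
            return;
        }
        S->color = RED;                   // Delete case 2: recolor and
        N = P;                            // iterate one black level higher
        P = N->parent;
        if (P == NULL) return;            // Delete case 1: N is the root
        dir = direction(N);
    } while (1);
case_d5:                                  // Delete case 5
    rotate_subtree(tree, S, (Direction)(1 - dir));
    S->color = RED;
    C->color = BLACK;
    D = S;                                // old sibling is now the distant nephew
    S = C;                                // constellation is now fit for case 6
case_d6:                                  // Delete case 6
    rotate_subtree(tree, P, dir);
    S->color = P->color;
    P->color = BLACK;
    D->color = BLACK;
}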
Proof of bounds
For h \in \N there is a red–black tree of height h with

    n(h) = 2^{\lfloor (h+1)/2 \rfloor} + 2^{\lfloor h/2 \rfloor} - 2

nodes (\lfloor \, \rfloor is the floor function), and there is no red–black tree of this tree height with fewer nodes; therefore it is minimal. Its black height is \lceil h/2 \rceil (with black root) or, for odd h (then with a red root), also (h-1)/2.

Proof
For a red–black tree of a certain height to have a minimal number of nodes, it must have exactly one longest path with the maximal number of red nodes, to achieve a maximal tree height with a minimal black height. Besides this path, all other nodes have to be black.

Conclusion
A red–black tree with n nodes (keys) has tree height h \in O(\log n).
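For orientation, the logarithmic bound can be made explicit with the following short derivation, a standard argument sketched here for convenience (b denotes the black height of the root):

% requirement 4: every root-to-NULL path contains b black nodes;
% requirement 3: no two consecutive red nodes on a path, hence
h \le 2b + 1
% the black nodes alone contain a full binary tree of black height b:
n \ge 2^{b} - 1 \;\Longrightarrow\; b \le \log_2(n + 1)
% combining both inequalities:
h \le 2\log_2(n + 1) + 1 \in O(\log n)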
Set operations and bulk operations
In addition to the single-element insert, delete and lookup operations, several set operations have been defined on red–black trees: union, intersection and set difference. Fast bulk operations on insertions or deletions can then be implemented based on these set functions. These set operations rely on two helper operations, Split and Join. With the new operations, the implementation of red–black trees can be more efficient and highly parallelizable. In order to achieve its time complexities, this implementation requires that the root is allowed to be either red or black, and that every node stores its own black height.

• Join: The function Join operates on two red–black trees TL and TR and a key k, where TL < k < TR, i.e. all keys in TL are less than k, and all keys in TR are greater than k. It returns a tree containing all elements in TL and TR, as well as k.
  If the two trees have the same black height, Join simply creates a new node with left subtree TL, root k and right subtree TR. If both TL and TR have a black root, k is set to be red; otherwise k is set black.
  If the black heights are unequal, suppose that TL has the larger black height (the other case is symmetric). Join follows the right spine of TL until a black node c which is balanced with TR. At this point a new node with left child c, root k (set to be red) and right child TR is created to replace c. The new node may invalidate the red–black invariant because at most three red nodes can appear in a row. This can be fixed with a double rotation. If the double-red issue propagates to the root, the root is then set to be black, restoring the properties. The cost of this function is the difference of the black heights between the two input trees.

• Split: To split a red–black tree into two smaller trees, those smaller than key k and those larger than key k, first draw a path from the root by inserting k into the red–black tree. After this insertion, all values less than k will be found on the left of the path, and all values greater than k will be found on the right. By applying Join, all the subtrees on the left side are merged bottom-up using keys on the path as intermediate nodes from bottom to top to form the left tree, and the right part is symmetric.
  For some applications, Split also returns a Boolean value denoting if k appears in the tree. The cost of Split is O(\log n), the order of the height of the tree. This algorithm actually has nothing to do with any special properties of a red–black tree, and may be used on any tree with a join operation, such as an AVL tree.
The join algorithm is as follows:

function joinRightRB(TL, k, TR):
    if (TL.color = black) and (TL.blackHeight = TR.blackHeight):
        return Node(TL, ⟨k, red⟩, TR)
    T' = Node(TL.left, ⟨TL.key, TL.color⟩, joinRightRB(TL.right, k, TR))
    if (TL.color = black) and (T'.right.color = T'.right.right.color = red):
        T'.right.right.color = black
        return rotateLeft(T')
    return T'

function joinLeftRB(TL, k, TR):
    /* symmetric to joinRightRB */

function join(TL, k, TR):
    if TL.blackHeight > TR.blackHeight:
        T' = joinRightRB(TL, k, TR)
        if (T'.color = red) and (T'.right.color = red):
            T'.color = black
        return T'
    if TR.blackHeight > TL.blackHeight:
        /* symmetric */
    if (TL.color = black) and (TR.color = black):
        return Node(TL, ⟨k, red⟩, TR)
    return Node(TL, ⟨k, black⟩, TR)

The split algorithm is as follows:

function split(T, k):
    if (T = NULL):
        return (NULL, false, NULL)
    if (k = T.key):
        return (T.left, true, T.right)
    if (k < T.key):
        (L', b, R') = split(T.left, k)
        return (L', b, join(R', T.key, T.right))
    (L', b, R') = split(T.right, k)
    return (join(T.left, T.key, L'), b, R')

The union algorithm is as follows:

function union(t1, t2):
    if t1 = NULL:
        return t2
    if t2 = NULL:
        return t1
    (L1, b, R1) = split(t1, t2.key)
    proc1 = start: TL = union(L1, t2.left)
    proc2 = start: TR = union(R1, t2.right)
    wait all proc1, proc2
    return join(TL, t2.key, TR)

Here, split is presumed to return two trees: one holding the keys less than its input key, one holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists as well.)

The algorithm for intersection or difference is similar, but requires the Join2 helper routine, which is the same as Join but without the middle key. Based on the new functions for union, intersection or difference, either one key or multiple keys can be inserted to or deleted from the red–black tree. Since Split calls Join but does not deal with the balancing criteria of red–black trees directly, such an implementation is usually called the "join-based" implementation.

The complexity of each of union, intersection and difference is O\left(m \log \left(\frac{n}{m}+1\right)\right) for two red–black trees of sizes m and n (\ge m). This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth O(\log m \log n). When m = 1, the join-based implementation has the same computational directed acyclic graph (DAG) as single-element insertion and deletion if the root of the larger tree is used to split the smaller tree.
Parallel algorithms
Parallel algorithms for constructing red–black trees from sorted lists of items can run in constant time or O(\log \log n) time, depending on the computer model, if the number of processors available is asymptotically proportional to the number n of items where n \to \infty. Fast search, insertion, and deletion parallel algorithms are also known. The join-based algorithms for red–black trees are parallel for bulk operations, including union, intersection, construction, filter, map-reduce, and so on.

Parallel bulk operations

Basic operations like insertion, removal or update can be parallelized by defining operations that process bulks of multiple elements. It is also possible to process bulks with several basic operations; for example, bulks may contain elements to insert and also elements to remove from the tree. The algorithms for bulk operations aren't just applicable to the red–black tree, but can be adapted to other sorted sequence data structures as well, like the 2–3 tree, 2–3–4 tree and (a,b)-tree. In the following, different algorithms for bulk insert will be explained, but the same algorithms can also be applied to removal and update. Bulk insert is an operation that inserts each element of a sequence I into a tree T.

Join-based

This approach can be applied to every sorted sequence data structure that supports efficient join- and split-operations. The general idea is to split I and T in multiple parts and perform the insertions on these parts in parallel.
• First the bulk I of elements to insert must be sorted.
• After that, the algorithm splits I into k \in \mathbb{N}^+ parts \langle I_1, \cdots, I_k \rangle of about equal sizes.
• Next the tree T must be split into k parts \langle T_1, \cdots, T_k \rangle in a way so that the ranges of I_m and T_n overlap for corresponding parts only (m = n); in other words, thanks to the ordering, for every j \in \mathbb{N}^+ with 1 \leq j < k the following constraints hold:
  • \text{last}(I_j) < \text{first}(T_{j+1})
  • \text{last}(T_j) < \text{first}(I_{j+1})
• Now the algorithm inserts each element of I_j into T_j sequentially. This step must be performed for every j, which can be done by up to k processors in parallel.
• Finally, the resulting trees will be joined to form the final result of the entire operation.

Note that in Step 3 the constraints for splitting assure that in Step 5 the trees can be joined again and the resulting sequence is sorted.

(Figures: initial tree; split I and T; insert into the split T; join T.)

The pseudo code shows a simple divide-and-conquer implementation of the join-based algorithm for bulk insert. Both recursive calls can be executed in parallel. The join operation used here differs from the version explained in this article; instead, join2 is used, which misses the second parameter k.

bulkInsert(T, I, k):
    I.sort()
    bulkInsertRec(T, I, k)

bulkInsertRec(T, I, k):
    if k = 1:
        forall e in I: T.insert(e)
    else:
        m := ⌊size(I) / 2⌋
        (T1, _, T2) := split(T, I[m])
        bulkInsertRec(T1, I[0 .. m], ⌈k / 2⌉)
        || bulkInsertRec(T2, I[m + 1 .. size(I) - 1], ⌊k / 2⌋)
        T ← join2(T1, T2)

Execution time

Sorting I is not considered in this analysis. The sequential splits and joins along the recursion together with the k parallel insertion phases give an execution time in O\left(k \log |T| + \frac{|I|}{k} \log |T|\right). This can be improved by using parallel algorithms for splitting and joining; in this case the execution time is in O\left(\log |T| + \frac{|I|}{k} \log |T|\right).

Work

The work performed is in O\left(k \log |T| + |I| \log |T|\right).
Pipelining

Another method of parallelizing bulk operations is to use a pipelining approach. This can be done by breaking the task of processing a basic operation up into a sequence of subtasks. For multiple basic operations the subtasks can be processed in parallel by assigning each subtask to a separate processor. The steps are as follows (see the pseudocode sketch at the end of this subsection):
• First the bulk I of elements to insert must be sorted.
• For each element in I the algorithm locates the according insertion position in T. This can be done in parallel for each element \in I since T won't be mutated in this process. Now I must be divided into subsequences S according to the insertion position of each element. For example, s_{n, \mathit{left}} is the subsequence of I that contains the elements whose insertion position would be to the left of node n.
• The middle element m_{n, \mathit{dir}} of every subsequence s_{n, \mathit{dir}} will be inserted into T as a new node n'. This can be done in parallel for each m_{n, \mathit{dir}} since by definition the insertion position of each m_{n, \mathit{dir}} is unique. If s_{n, \mathit{dir}} contains elements to the left or to the right of m_{n, \mathit{dir}}, those will be contained in a new set of subsequences as s_{n', \mathit{left}} or s_{n', \mathit{right}}.
• Now T possibly contains up to two consecutive red nodes at the end of the paths from the root to the leaves, which needs to be repaired. Note that, while repairing, the insertion positions of elements \in S have to be updated if the corresponding nodes are affected by rotations.
  • If two nodes have different nearest black ancestors, they can be repaired in parallel. Since at most four nodes can have the same nearest black ancestor, the nodes at the lowest level can be repaired in a constant number of parallel steps.
  • This step will be applied successively to the black levels above until T is fully repaired.
• The steps 3 to 5 will be repeated on the new subsequences until S is empty. At this point every element \in I has been inserted. Each application of these steps is called a stage. Since the length of the subsequences in S is \in O(|I|) and in every stage the subsequences are being cut in half, the number of stages is \in O(\log |I|).
• Since all stages move up the black levels of the tree, they can be parallelized in a pipeline. Once a stage has finished processing one black level, the next stage is able to move up and continue at that level.

(Figures: initial tree; find insert positions; stages 1 to 3 each insert elements and then repair nodes.)

Execution time

Sorting I is not considered in this analysis. Also, |I| is assumed to be smaller than |T|; otherwise it would be more efficient to construct the resulting tree from scratch.
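The following pseudocode is only a hedged summary of the staged pipeline described above; the helper names locateInsertPositions, insertAt and repairBlackLevel are hypothetical placeholders for the steps, not part of the original proposal.

pipelinedBulkInsert(T, I):
    I.sort()
    // step 2: find positions in parallel; T is not mutated here
    S := locateInsertPositions(T, I)        // subsequences s[n, dir]
    while S is not empty:                   // each iteration is one stage
        S' := ∅
        forall s[n, dir] in S parallel:     // step 3: insert middle elements
            n' := insertAt(T, n, dir, middle(s[n, dir]))
            add the remaining halves of s[n, dir] to S' as s[n', left] and s[n', right]
        // steps 4 and 5: repair double-red violations bottom-up; nodes with
        // different nearest black ancestors are repaired in parallel, and
        // insertion positions in S' are updated when rotations move nodes
        for each black level from the lowest upward:
            repairBlackLevel(T, level, S')
        S := S'                             // stages pipeline up the black levels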