Construction Since there are many possible ways to choose axis-aligned splitting planes, there are many different ways to construct
k-d trees. The canonical method of
k-d tree construction has the following constraints: An algorithm that builds a balanced to sort points has a worst-case complexity of O(kn\log(n)). This algorithm presorts
n points in each of
k dimensions using an O(n\log(n)) sort such as
Heapsort or
Mergesort prior to building the tree. It then maintains the order of these
k presorts during tree construction and thereby avoids finding the median at each level of subdivision.
Adding elements One adds a new point to a
k-d tree in the same way as one adds an element to any other
search tree. First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you get to the node under which the child should be located, add the new point as either the left or right child of the leaf node, again depending on which side of the node's splitting plane contains the new node. Adding points in this manner can cause the tree to become unbalanced, leading to decreased tree performance. The rate of tree performance degradation is dependent upon the spatial distribution of tree points being added, and the number of points added in relation to the tree size. If a tree becomes too unbalanced, it may need to be re-balanced to restore the performance of queries that rely on the tree balancing, such as nearest neighbour searching.
Removing elements To remove a point from an existing
k-d tree without breaking the invariant, the easiest way is to form the set of all nodes and leaves from the children of the target node and recreate that part of the tree. First, find the node R that contains the point to be removed. For the base case, where R is a leaf node, no replacement is required. For the general case, find a replacement point, say p, from the subtree rooted at R. Replace the point stored at R with p. Then, recursively remove p. For finding a replacement point, if R discriminates on x (say) and R has a right child, find the point with the minimum x value from the subtree rooted at the right child. Otherwise, find the point with the maximum x value from the subtree rooted at the left child.
Balancing Balancing a
k-d tree requires care because
k-d trees are sorted in multiple dimensions, so the
tree-rotation technique cannot be used to balance them as this may break the invariant. Several variants of balanced
k-d trees exist. They include divided
k-d tree, pseudo
k-d tree,
K-D-B-tree, hB-tree and
Bkd-tree. Many of these variants are
adaptive k-d trees.
Nearest neighbour search The
nearest neighbour search (NN) algorithm aims to find the point in the tree that is nearest to a given input point. This search can be done efficiently by using the tree properties to quickly eliminate large portions of the search space. Searching for a nearest neighbour in a
k-d tree proceeds as follows: • Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes left or right depending on whether the point is lesser than or greater than the current node in the split dimension). • Once the algorithm reaches a leaf node, it checks the node point and if the distance is better than the "current best", that node point is saved as the "current best". • The algorithm unwinds the recursion of the tree, performing the following steps at each node: • If the current node is closer than the current best, then it becomes the current best. • The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting
hyperplane with a
hypersphere around the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the distance between the splitting coordinate of the search point and current node is less than the distance (overall coordinates) from the search point to the current best. • If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search. • If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated. • When the algorithm finishes this process for the root node, then the search is complete. Generally the algorithm uses squared distances for comparison to avoid computing square roots. Additionally, it can save computation by holding the squared current best distance in a variable for comparison. The algorithm can be extended in several ways by simple modifications. It can provide the
k nearest neighbours to a point by maintaining
k current bests instead of just one. A branch is only eliminated when
k points have been found and the branch cannot have points closer than any of the
k current bests. It can also be converted to an approximation algorithm to run faster. For example, approximate nearest neighbour searching can be achieved by simply setting an upper bound on the number of points to examine in the tree or by interrupting the search process based upon a real time clock (which may be more appropriate in hardware implementations). The nearest neighbour for points that are already in the tree can be achieved by not updating the refinement for nodes that give zero distance. As a result, this has the downside of discarding points that are not unique but are co-located with the original search point. Approximate nearest neighbour is useful in real-time applications such as robotics due to the significant speed increase gained by not searching for the best point exhaustively. One of its implementations is
best-bin-first search.
Range search A range search searches for ranges of parameters. For example, if a tree is storing values corresponding to income and age, then a range search might be something like looking for all members of the tree which have an age between 20 and 50 years and an income between 50,000 and 80,000. Since k-d trees divide the range of a domain in half at each level of the tree, they are useful for performing range searches. Analyses of binary search trees has found that the worst case time for range search in a
k-dimensional
k-d tree containing
n nodes is given by the following equation. :t_\text{worst} = O\left(k \cdot n^{1-\frac{1}{k}}\right) ==Degradation in performance with high-dimensional data==