Binary search

Binary search works on sorted arrays. It begins by comparing an element in the middle of the array with the target value. If the target value matches the element, its position in the array is returned. If the target value is less than the element, the search continues in the lower half of the array. If the target value is greater than the element, the search continues in the upper half of the array. In each iteration, the algorithm thereby eliminates the half in which the target value cannot lie.

=== Procedure ===
Given an array A of n elements with values or records A_0, A_1, A_2, \ldots, A_{n-1} sorted such that A_0 \leq A_1 \leq A_2 \leq \cdots \leq A_{n-1}, and target value T, the following subroutine uses binary search to find the index of T in A.
1. Set L to 0 and R to n-1.
2. If L > R, the search terminates as unsuccessful.
3. Set m (the position of the middle element) to L plus the floor of \frac{R-L}{2}, which is the greatest integer less than or equal to \frac{R-L}{2}.
4. If A_m < T, set L to m+1 and go to step 2.
5. If A_m > T, set R to m-1 and go to step 2.
6. Now A_m = T, the search is done; return m.
This iterative procedure keeps track of the search boundaries with the two variables L and R. The procedure may be expressed in pseudocode as follows, where the variable names and types remain the same as above, floor is the floor function, and unsuccessful refers to a specific value that conveys the failure of the search.

    function binary_search(A, n, T) is
        L := 0
        R := n - 1
        while L ≤ R do
            m := L + floor((R - L) / 2)
            if A[m] < T then
                L := m + 1
            else if A[m] > T then
                R := m - 1
            else:
                return m
        return unsuccessful

Alternatively, the algorithm may take the ceiling of \frac{R-L}{2}. This may change the result if the target value appears more than once in the array.

==== Alternative procedure ====
In the above procedure, the algorithm checks whether the middle element (A_m) is equal to the target (T) in every iteration. Some implementations leave out this check during each iteration.
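To make the per-iteration equality check concrete, the basic procedure described so far can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the names `binary_search`, `left`, `right`, and `mid`, and the choice of `None` as the "unsuccessful" value, are assumptions of this example.

```python
from typing import Optional, Sequence


def binary_search(a: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence a, or None if absent."""
    left, right = 0, len(a) - 1
    while left <= right:
        # Floor of the midpoint, written to mirror L + floor((R - L) / 2).
        mid = left + (right - left) // 2
        if a[mid] < target:
            left = mid + 1      # target can only lie in the upper half
        elif a[mid] > target:
            right = mid - 1     # target can only lie in the lower half
        else:
            return mid          # the equality case, reached in any iteration
    return None                 # assumed stand-in for "unsuccessful"
```

The final `else` branch is the equality check that is evaluated in every iteration. On the array [1, 2, 3, 4, 4, 5, 6, 7] with target 4, this sketch returns index 3.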
An implementation that leaves out the per-iteration check performs it only when one element is left (when L = R). This results in a faster comparison loop, as one comparison is eliminated per iteration, while it requires only one more iteration on average. Hermann Bottenbruch published the first implementation to leave out this check in 1962.
• Set L to 0 and R to n-1.
• While L \neq R,
  • Set m (the position of the middle element) to L plus the ceiling of \frac{R-L}{2}, which is the least integer greater than or equal to \frac{R-L}{2}.
  • If A_m > T, set R to m-1.
  • Else, A_m \leq T; set L to m.
• Now L = R, the search is done. If A_L = T, return L. Otherwise, the search terminates as unsuccessful.
Where ceil is the ceiling function, the pseudocode for this version is:

    function binary_search_alternative(A, n, T) is
        L := 0
        R := n - 1
        while L != R do
            m := L + ceil((R - L) / 2)
            if A[m] > T then
                R := m - 1
            else:
                L := m
        if A[L] = T then
            return L
        return unsuccessful

=== Duplicate elements ===
The procedure may return any index whose element is equal to the target value, even if there are duplicate elements in the array. For example, if the array to be searched is [1,2,3,4,4,5,6,7] and the target is 4, then it is correct for the algorithm to return either the 4th (index 3) or the 5th (index 4) element. The regular procedure returns the 4th element (index 3) in this case. It does not always return the first duplicate: for [1,2,4,4,4,5,6,7] it still returns the 4th element (index 3), although the first 4 is at index 2. However, it is sometimes necessary to find the leftmost element or the rightmost element for a target value that is duplicated in the array. In the first example, the 4th element is the leftmost element of the value 4, while the 5th element is the rightmost element of the value 4. The alternative procedure above always returns the index of the rightmost element if such an element exists.

==== Procedure for finding the leftmost element ====
To find the leftmost element, the following procedure can be used:
• Set L to 0 and R to n.
• While L < R,
  • Set m (the position of the middle element) to L plus the floor of \frac{R-L}{2}, which is the greatest integer less than or equal to \frac{R-L}{2}.
  • If A_m < T, set L to m+1.
  • Else, A_m \geq T; set R to m.
• Return L.
If L < n and A_L = T, then A_L is the leftmost element that equals T. Even if T is not in the array, L is the rank of T in the array, or the number of elements in the array that are less than T. Where floor is the floor function, the pseudocode for this version is:

    function binary_search_leftmost(A, n, T):
        L := 0
        R := n
        while L < R:
            m := L + floor((R - L) / 2)
            if A[m] < T:
                L := m + 1
            else:
                R := m
        return L

==== Procedure for finding the rightmost element ====
To find the rightmost element, the following procedure can be used:
• Set L to 0 and R to n.
• While L < R,
  • Set m (the position of the middle element) to L plus the floor of \frac{R-L}{2}, which is the greatest integer less than or equal to \frac{R-L}{2}.
  • If A_m > T, set R to m.
  • Else, A_m \leq T; set L to m+1.
• Return R - 1.
If R > 0 and A_{R-1} = T, then A_{R-1} is the rightmost element that equals T. Even if T is not in the array, n-R is the number of elements in the array that are greater than T. Where floor is the floor function, the pseudocode for this version is:

    function binary_search_rightmost(A, n, T):
        L := 0
        R := n
        while L < R:
            m := L + floor((R - L) / 2)
            if A[m] > T:
                R := m
            else:
                L := m + 1
        return R - 1

=== Approximate matches ===
The above procedure only performs exact matches, finding the position of a target value. However, because binary search operates on sorted arrays, it is straightforward to extend it to perform approximate matches. For example, binary search can be used to compute, for a given value, its rank (the number of smaller elements), predecessor (next-smallest element), successor (next-largest element), and nearest neighbor. Range queries seeking the number of elements between two values can be performed with two rank queries.
• Rank queries can be performed with the procedure for finding the leftmost element. The number of elements less than the target value is returned by the procedure.
• Predecessor queries can be performed with rank queries. If the rank of the target value is r, its predecessor is the element at index r-1.
• For successor queries, the procedure for finding the rightmost element can be used. If the result of running the procedure for the target value is r, then the successor of the target value is the element at index r+1.
• The nearest neighbor of the target value is either its predecessor or successor, whichever is closer.
• Range queries are also straightforward. Once the ranks of the two values are known, the number of elements greater than or equal to the first value and less than the second is the difference of the two ranks. This count can be adjusted up or down by one according to whether the endpoints of the range should be considered to be part of the range and whether the array contains entries matching those endpoints.

== Performance ==