As a simple example of fractional cascading, consider the following problem. We are given as input a collection of k ordered lists L_i of numbers, such that the total length \sum_i |L_i| of all lists is n, and must process them so that we can perform binary searches for a query value q in each of the k lists. For instance, with k=4 and n=17,
:L_1 = 24, 64, 65, 80, 93
:L_2 = 23, 25, 26
:L_3 = 13, 44, 62, 66
:L_4 = 11, 35, 46, 79, 81
The simplest solution to this searching problem is just to store each list separately. If we do so, the space requirement is O(n), but the time to perform a query is O\bigl(k\log(n/k)\bigr), as we must perform a separate binary search in each of k lists. The worst case for querying this structure occurs when each of the k lists has equal size n/k, so each of the k binary searches involved in a query takes time O\bigl(\log(n/k)\bigr).

A second solution allows faster queries at the expense of more space: we may merge all the k lists into a single big list L, and associate with each item x of L a list of the results of searching for x in each of the smaller lists L_i. If we describe an element of this merged list as x[a,b,c,d], where x is the numerical value and a, b, c, and d are the positions (the first number has position 0) of the next element at least as large as x in each of the original input lists (or the position after the end of the list if no such element exists), then we would have
:L =
11[0,0,0,0],
13[0,0,0,1],
23[0,0,1,1],
24[0,1,1,1],
25[1,1,1,1],
26[1,2,1,1],
35[1,3,1,1],
44[1,3,1,2],
46[1,3,2,2],
62[1,3,2,3],
64[1,3,3,3],
65[2,3,3,3],
66[3,3,3,3],
79[3,3,4,3],
80[3,3,4,4],
81[4,3,4,4],
93[4,3,4,5] This merged solution allows a query in time O(k+\log n): simply search for q in L and then report the results stored at the item x found by this search. For instance, if q=50, searching for q in L finds the item 62[1,3,2,3], from which we return the results L_1[1]=64, L_2[3] (a flag value indicating that q is past the end of L_2), L_3[2]=62, and L_4[3]=79. However, this solution pays a high penalty in
space complexity: it uses space O(kn), as each of the n items in L must store a list of k search results.

Fractional cascading allows this same searching problem to be solved with time and space bounds meeting the best of both worlds: query time O(k+\log n), and space O(n). The fractional cascading solution is to store a new sequence of lists M_i. The final list in this sequence, M_k, is equal to L_k; each earlier list M_i is formed by merging L_i with every second item from M_{i+1}. With each item x in this merged list, we store two numbers: the position resulting from searching for x in L_i and the position resulting from searching for x in M_{i+1}. For the data above, this would give us the following lists:
:M_1 =
24[0, 1],
25[1, 1],
35[1, 3],
64[1, 5],
65[2, 5],
79[3, 5],
80[3, 6],
93[4, 6] :M_2 =
23[0, 1],
25[1, 1],
26[2, 1],
35[3, 1],
62[3, 3],
79[3, 5] :M_3 =
13[0, 1],
35[1, 1],
44[1, 2],
62[2, 3],
66[3, 3],
79[4, 3] :M_4 =
11[0, 0],
35[1, 0],
46[2, 0],
79[3, 0],
81[4, 0] Suppose we wish to perform a query in this structure, for q=50. We first do a standard binary search for q in M_1, finding the value
64[1,5]. The "1" in 64[1,5] tells us that the search for q in L_1 should return L_1[1]=64. The "5" in
64[1,5] tells us that the approximate location of q in M_2 is position 5. More precisely, binary searching for q in M_2 would return either the value 79[3,5] at position 5, or the value 62[3,3] one place earlier. By comparing q to 62, and observing that it is smaller, we determine that the correct search result in M_2 is 62[3,3]. The first "3" in 62[3,3] tells us that the search for q in L_2 should return L_2[3], a flag value meaning that q is past the end of list L_2. The second "3" in 62[3,3] tells us that the approximate location of q in M_3 is position 3. More precisely, binary searching for q in M_3 would return either the value 62[2,3] at position 3, or the value 44[1,2] one place earlier. A comparison of q with the smaller value 44 shows us that the correct search result in M_3 is 62[2,3]. The "2" in 62[2,3] tells us that the search for q in L_3 should return L_3[2]=62, and the "3" in 62[2,3] tells us that the result of searching for q in M_4 is either 79[3,0] at position 3 or 46[2,0] at position 2; comparing q with 46 shows that the correct result is 79[3,0] and that the result of searching for q in L_4 is L_4[3]=79. Thus, we have found q in each of our four lists, by doing a binary search in the single list M_1 followed by a single comparison in each of the successive lists.

More generally, for any data structure of this type, we perform a query by doing a binary search for q in M_1, and determining from the resulting value the position of q in L_1. Then, for each i<k, we use the known position of q in M_i to find its position in M_{i+1}. The value associated with the position of q in M_i points to a position in M_{i+1} that is either the correct result of the binary search for q in M_{i+1} or is a single step away from that correct result, so stepping from i to i+1 requires only a single comparison.
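The construction and query procedure just described can be sketched in Python; this is an illustrative sketch, not a standard library facility, and the function names build and query, along with the tuple representation of the annotated items, are our own choices:

```python
from bisect import bisect_left

def build(lists):
    """Build the cascaded lists M_i (i = 0 .. k-1) from sorted input lists.

    Each entry of M_i is a tuple (value, pos_in_L_i, pos_in_M_next): the
    index of the first element >= value in L_i, and in M_{i+1}.
    """
    k = len(lists)
    M = [None] * k
    nxt = []                               # values of M_{i+1}; empty for i = k-1
    for i in range(k - 1, -1, -1):
        # Merge L_i with every second item of M_{i+1} (indices 1, 3, 5, ...).
        values = sorted(lists[i] + nxt[1::2])
        M[i] = [(v, bisect_left(lists[i], v), bisect_left(nxt, v))
                for v in values]
        nxt = values
    return M

def query(lists, M, q):
    """Return, for each L_i, the index of the first element >= q
    (or len(L_i) if q is larger than every element of L_i)."""
    results = []
    pos = bisect_left([e[0] for e in M[0]], q)   # the only full binary search
    for i in range(len(lists)):
        # The cascaded position is either exact or one step too far right;
        # a single comparison corrects it.
        if pos > 0 and M[i][pos - 1][0] >= q:
            pos -= 1
        if pos == len(M[i]):                     # q is beyond every value in M_i
            results.append(len(lists[i]))
            pos = len(M[i + 1]) if i + 1 < len(M) else 0
        else:
            _, in_L, in_next = M[i][pos]
            results.append(in_L)
            pos = in_next                        # cascade into M_{i+1}
    return results

L = [[24, 64, 65, 80, 93],
     [23, 25, 26],
     [13, 44, 62, 66],
     [11, 35, 46, 79, 81]]
M = build(L)
```

On the example data, build(L) reproduces the lists M_1 through M_4 shown above, and query(L, M, 50) returns the positions [1, 3, 2, 3] found in the walkthrough.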
Thus, the total time for a query is O(k+\log n).

In our example, the fractionally cascaded lists have a total of 25 elements, less than twice that of the original input. In general, the size of M_i in this data structure is at most |L_i|+\frac12|L_{i+1}|+\frac14|L_{i+2}|+\cdots+\frac1{2^j}|L_{i+j}|+\cdots, as may easily be proven by induction. Therefore, the total size of the data structure is at most \sum_i|M_i| \le \sum_i |L_i| \left(1+\frac12+\frac14+\cdots\right) \leq 2n=O(n), as may be seen by regrouping the contributions to the total size coming from the same input list L_i together with each other.

==The general problem==