No free lunch in search and optimization

In computational complexity and optimization, the no free lunch theorem states that, for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "no such thing as a free lunch", that is, no method offers a "shortcut". The result holds under the assumption that the search space is a probability density function. It does not apply when the search space has underlying structure that can be exploited more efficiently than random search, or when closed-form solutions can be determined without any search at all. Under such probabilistic assumptions, the outputs of all procedures solving a particular type of problem are statistically identical. A colourful way of describing this circumstance, introduced by David Wolpert and William G. Macready in connection with the problems of search and optimization, is to say that there is no free lunch.

Wolpert had previously derived no free lunch theorems for machine learning. Before Wolpert's article was published, Cullen Schaffer independently proved a restricted version of one of Wolpert's theorems and used it to critique the then-current state of machine learning research on the problem of induction.

Overview

Some computational problems are solved by searching for good solutions in a space of candidate solutions. A description of how to repeatedly select candidate solutions for evaluation is called a search algorithm. On a particular problem, different search algorithms may obtain different results, but over all problems, they are indistinguishable. It follows that if an algorithm achieves superior results on some problems, it must pay with inferiority on other problems. In this sense there is no free lunch in search. The "no free lunch" results indicate that matching algorithms to problems gives higher average performance than does applying a fixed algorithm to all. Igel and Toussaint and English have established a general condition under which there is no free lunch. Although this condition is physically possible, it does not hold precisely. Droste, Jansen, and Wegener have proved a theorem they interpret as indicating that there is "(almost) no free lunch" in practice.

To make matters more concrete, consider an optimization practitioner confronted with a problem. Given some knowledge of how the problem arose, the practitioner may be able to exploit that knowledge in selecting an algorithm that will perform well on the problem. If the practitioner does not understand how to exploit the knowledge, or simply has no knowledge, then he or she faces the question of whether some algorithm generally outperforms others on real-world problems. The authors of the "(almost) no free lunch" theorem say that the answer is essentially no, but admit some reservations as to whether the theorem addresses practice.
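The trade-off described in this overview can be checked by brute force on a toy problem. The sketch below is illustrative only: the four-point search space, the three goodness values, the two search rules (`sweep` and `adaptive`) and the cost measure (number of evaluations until the best value of the function is first seen) are all assumptions made for the example, not part of the published results.

```python
from itertools import product

X = (0, 1, 2, 3)      # toy search space (assumption for this sketch)
Y = (0, 1, 2)         # goodness values, larger is better (assumption)

def run(choose, f):
    """Run a deterministic, non-revisiting search over all of X; return the values seen."""
    history = []                      # (candidate, value) pairs in evaluation order
    while len(history) < len(X):
        x = choose(history)
        history.append((x, f[x]))
    return [y for _, y in history]

def unvisited(history):
    visited = {x for x, _ in history}
    return [x for x in X if x not in visited]

def sweep(history):
    """Hypothetical rule: always evaluate the smallest unvisited candidate."""
    return unvisited(history)[0]

def adaptive(history):
    """Hypothetical rule: jump to the far end after seeing a nonzero value."""
    rest = unvisited(history)
    if history and history[-1][1] > 0:
        return rest[-1]
    return rest[0]

def cost(choose, f):
    """Number of evaluations until the maximum of f is first observed."""
    trace = run(choose, f)
    return trace.index(max(trace)) + 1

functions = list(product(Y, repeat=len(X)))   # every objective function f: X -> Y

total_sweep = sum(cost(sweep, f) for f in functions)
total_adaptive = sum(cost(adaptive, f) for f in functions)
sweep_wins = sum(cost(sweep, f) < cost(adaptive, f) for f in functions)
adaptive_wins = sum(cost(adaptive, f) < cost(sweep, f) for f in functions)

print(total_sweep, total_adaptive)   # summed cost over all functions
print(sweep_wins, adaptive_wins)     # functions on which each rule is strictly faster
```

Summed over every objective function the two costs agree, yet each rule is strictly faster on some functions, so any advantage on one subset is paid for on another.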

Theorems

A "problem" is, more formally, an objective function that associates candidate solutions with goodness values. A search algorithm takes an objective function as input and evaluates candidate solutions one by one. The output of the algorithm is the sequence of observed goodness values. Wolpert and Macready stipulate that an algorithm never reevaluates a candidate solution, and that algorithm performance is measured on outputs.

Most objective functions, however, are too complex to be represented physically. For instance, if each candidate solution is encoded as a sequence of 300 0's and 1's, and the goodness values are 0 and 1, then most objective functions have Kolmogorov complexity of at least 2^300 bits, and this is greater than Lloyd's bound of 10^90 ≈ 2^299 bits. It follows that the original "no free lunch" theorem does not apply to what can be stored in a physical computer; instead the so-called "tightened" no free lunch theorems need to be applied. It has also been shown that NFL results apply to incomputable functions.
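The setup above (objective function in, sequence of observed goodness values out, no candidate evaluated twice) can be made concrete with a small enumeration. This is a minimal sketch under stated assumptions: the three-point space, the binary goodness values and the two rules `in_order` and `jump_on_one` are hypothetical, the algorithms are deterministic, and each run evaluates every candidate. Under those assumptions each algorithm, run on every possible objective function, should produce every possible output sequence exactly once.

```python
from collections import Counter
from itertools import product

X = (0, 1, 2)   # candidate solutions (toy assumption)
Y = (0, 1)      # goodness values (toy assumption)

def run(choose, f):
    """Evaluate candidates one by one without revisiting; return the observed values."""
    history = []                      # (candidate, value) pairs seen so far
    while len(history) < len(X):
        x = choose(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)

def in_order(history):
    """Hypothetical rule: take the first unvisited candidate."""
    visited = {x for x, _ in history}
    return next(x for x in X if x not in visited)

def jump_on_one(history):
    """Hypothetical adaptive rule: after observing a 1, take the last unvisited candidate."""
    visited = {x for x, _ in history}
    rest = [x for x in X if x not in visited]
    if history and history[-1][1] == 1:
        return rest[-1]
    return rest[0]

all_functions = list(product(Y, repeat=len(X)))   # all 2**3 objective functions
all_outputs = sorted(product(Y, repeat=len(X)))   # all possible output sequences

outputs_a = sorted(run(in_order, f) for f in all_functions)
outputs_b = sorted(run(jump_on_one, f) for f in all_functions)

# Any performance measure computed from outputs then has the same histogram,
# for example the best value found within the first two evaluations.
perf_a = Counter(max(out[:2]) for out in outputs_a)
perf_b = Counter(max(out[:2]) for out in outputs_b)

print(outputs_a == all_outputs, outputs_b == all_outputs)
print(perf_a == perf_b)
```

Because performance is measured on outputs only, identical collections of outputs mean the two rules cannot be told apart by how often they reach any given level of performance.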

Formal synopsis

Y^X is the set of all objective functions f:X→Y, where X is a finite solution space and Y is a finite poset. The set of all permutations of X is J. A random variable F is distributed on Y^X. For all j in J, F ∘ j is a random variable distributed on Y^X, with P(F ∘ j = f) = P(F = f ∘ j^{-1}) for all f in Y^X. Let a(f) denote the output of search algorithm a on input f. If a(F) and b(F) are identically distributed for all search algorithms a and b, then F has an NFL distribution. This condition holds if and only if F and F ∘ j are identically distributed for all j in J. Set-theoretic NFL theorems have recently been generalized to arbitrary cardinality X and Y.
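The permutation condition can be illustrated on a tiny example. The sketch below assumes F is uniform on a support set of functions, in which case "F and F ∘ j are identically distributed for all j" reduces to the support being closed under permutation. The two supports (`needle`, `lopsided`) and the two fixed-order searches are hypothetical choices for illustration, and comparing two algorithms only illustrates the statement; it does not verify it for all algorithms.

```python
from collections import Counter
from itertools import permutations

X = (0, 1, 2)   # finite solution space (toy assumption)

def compose(f, j):
    """Return f ∘ j as a tuple, i.e. (f ∘ j)(x) = f(j(x))."""
    return tuple(f[j[x]] for x in X)

def closed_under_permutation(support):
    """True if f ∘ j stays in the support for every f in it and every permutation j of X."""
    return all(compose(f, j) in support
               for f in support
               for j in permutations(X))

def output_distribution(order, support):
    """Counts of outputs of a fixed-order search when F is uniform on `support`."""
    return Counter(tuple(f[x] for x in order) for f in support)

needle = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}   # a single 1 placed anywhere
lopsided = {(1, 0, 0), (0, 1, 0)}            # the 1 never sits at position 2

forward, backward = (0, 1, 2), (2, 1, 0)     # two non-revisiting search orders

for name, support in (("needle", needle), ("lopsided", lopsided)):
    same = (output_distribution(forward, support)
            == output_distribution(backward, support))
    print(name, closed_under_permutation(support), same)
```

On the permutation-closed support the two searches have the same output distribution, while on the lopsided support the forward search sees the 1 earlier, so that distribution is not an NFL distribution.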

Origin

Wolpert and Macready give two principal NFL theorems, the first regarding objective functions that do not change while search is in progress, and the second regarding objective functions that may change.

Theorem 1: For any pair of algorithms a_1 and a_2,

\sum_f P(d_m^y | f, m, a_1) = \sum_f P(d_m^y | f, m, a_2),

where d_m^y denotes the ordered set of size m of the cost values y \in Y associated to input values x \in X, f:X \rightarrow Y is the function being optimized, and P(d_m^y | f, m, a) is the conditional probability of obtaining a given sequence of cost values from algorithm a run m times on function f.

In essence, this says that when all functions f are equally likely, the probability of observing an arbitrary sequence of m values in the course of search does not depend upon the search algorithm. The second theorem establishes a "more subtle" NFL result for time-varying objective functions.
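A short counting argument shows why the sum in Theorem 1 cannot depend on the algorithm. It is a sketch under simplifying assumptions that are not spelled out in the statement above: the algorithm a is deterministic, it never revisits a point, m ≤ |X|, and d_m^y is any sequence in Y^m.

```latex
% For deterministic a, P(d_m^y | f, m, a) is either 0 or 1, so the sum counts
% the functions on which a produces exactly the sequence d_m^y:
\sum_f P(d_m^y \mid f, m, a)
  = \bigl|\{\, f \in Y^X : a \text{ outputs } d_m^y \text{ on } f \,\}\bigr| .

% The sequence d_m^y = (y_1, \dots, y_m) determines the visited points:
% x_1 is fixed by a, and x_{k+1} is determined by (x_1, y_1), \dots, (x_k, y_k).
% Hence f is pinned to f(x_k) = y_k at m distinct points and is unconstrained
% on the remaining |X| - m points, giving
\sum_f P(d_m^y \mid f, m, a) = |Y|^{\,|X| - m} ,

% which contains no reference to a, so the sums for a_1 and a_2 coincide.
```

For example, with |X| = 3, |Y| = 2 and m = 2, exactly 2 of the 8 objective functions yield any given pair of observed values, whichever such algorithm is used.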

Interpretations of results

A conventional, but not entirely accurate, interpretation of the NFL results is that "a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration". Several comments are in order:

• A general-purpose almost-universal optimizer exists theoretically. Each search algorithm performs well on almost all objective functions.

Coevolution

Wolpert and Macready have proved that there are free lunches in coevolutionary optimization. Their analysis "covers 'self-play' problems. In these problems, the set of players work together to produce a champion, who then engages one or more antagonists in a subsequent multiplayer game." That is, the objective is to obtain a good player, but without an objective function. The goodness of each player (candidate solution) is assessed by observing how well it plays against others. An algorithm attempts to use players and their quality of play to obtain better players. The player deemed best of all by the algorithm is the champion. Wolpert and Macready have demonstrated that some coevolutionary algorithms are generally superior to other algorithms in quality of champions obtained. Generating a champion through self-play is of interest in evolutionary computation and game theory. The results are inapplicable to coevolution of biological species, which does not yield champions.

Source: Wikipedia
