Consider the task of inferring which of two objects, A or B, has the higher value on a numerical criterion. As an example, imagine having to judge whether the German city of Cologne has a larger population than the German city of Stuttgart. This judgment, or inference, has to be based on the information provided by binary cues, such as "Is the city a state capital?". From a formal point of view, the task is a categorization: a pair \left( A, B\right) is to be categorized as X_A > X_B or X_B > X_A (where X denotes the criterion), based on cue information. Cues are binary: they assume two values and can be modeled, for instance, as having the values 0 and 1 (for ''no'' and ''yes''). They are ranked according to their
cue validity, defined as the proportion of correct inferences among all pairs \left( A, B\right) for which the cue has different values, i.e., for which it discriminates between A and B. Take-the-best examines the cues one after the other, in order of decreasing validity, stops the first time a cue discriminates between the items, and concludes that the item with the larger cue value also has the larger value on the criterion. The matrix of all objects of the reference class from which A and B are drawn, together with the cue values describing these objects, constitutes a so-called environment. Gigerenzer and Goldstein, who introduced take-the-best, used as a walk-through example precisely such pairs of German cities, restricted to those with more than 100,000 inhabitants. The comparison task for a given pair \left( A, B\right) of German cities in the reference class consisted in establishing which one has the larger population, based on nine binary cues, such as whether the city is a state capital or whether it has a soccer team in the national league. The cue values can be modeled by 1s (for ''yes'') and 0s (for ''no''), so that each city can be identified with its "cue profile", i.e., a vector of 1s and 0s ordered according to the ranking of cues. The question is: how can one infer which of two objects, for example, city A with cue profile \left(100101010\right) and city B with cue profile \left(100010101\right), scores higher on the established criterion, i.e., population size? The take-the-best heuristic simply compares the profiles lexicographically, just as numbers written in base two are compared: the first cue values are both 1, so the first cue does not discriminate between A and B; the second cue values are both 0, again with no discrimination; the same happens for the third cue, while the fourth cue value is 1 for A and 0 for B, implying that A is judged to have the higher value on the criterion.
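The lexicographic comparison just described, together with the definition of cue validity given above, can be sketched in Python. This is a minimal illustration, not code from the original work; the data-structure conventions (objects as dictionaries with a "cues" vector and a criterion field) are assumptions made for the sketch, and the cues are assumed to be already ordered by decreasing validity:

```python
from itertools import combinations

def cue_validity(objects, cue_index, criterion):
    """Proportion of correct inferences among the pairs of objects
    that the cue discriminates (the definition given above)."""
    correct = discriminating = 0
    for obj_a, obj_b in combinations(objects, 2):
        if obj_a["cues"][cue_index] != obj_b["cues"][cue_index]:
            discriminating += 1
            # predict that the object with cue value 1 is the larger one
            predicted = obj_a if obj_a["cues"][cue_index] > obj_b["cues"][cue_index] else obj_b
            actual = obj_a if obj_a[criterion] > obj_b[criterion] else obj_b
            if predicted is actual:
                correct += 1
    return correct / discriminating if discriminating else 0.0

def take_the_best(profile_a, profile_b):
    """Compare two cue profiles lexicographically; cues are assumed
    to be ordered by decreasing validity."""
    for cue_a, cue_b in zip(profile_a, profile_b):
        if cue_a != cue_b:              # first discriminating cue decides
            return "A" if cue_a > cue_b else "B"
    return None                         # no cue discriminates

# The two cue profiles from the example: the fourth cue is the first to differ.
city_a = (1, 0, 0, 1, 0, 1, 0, 1, 0)
city_b = (1, 0, 0, 0, 1, 0, 1, 0, 1)
print(take_the_best(city_a, city_b))   # prints "A"
```

Note that the comparison never combines cues: the first discriminating cue alone determines the inference, which is what makes the heuristic lexicographic.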
In other words, X_A > X_B if and only if \left(100101010\right) > \left(100010101\right). Mathematically, this means that the cues available for the comparison induce a quasi-order isomorphism between the objects compared on the criterion, in this case cities with their populations, and their corresponding binary vectors. Here ''quasi'' means that the isomorphism is, in general, not perfect, because the set of cues is not perfect. What is surprising is how well this simple heuristic performs compared with other strategies. An obvious measure of the performance of an inference mechanism is the percentage of correct inferences. Furthermore, what matters most is not the performance of the heuristic when fitting known data, but its performance when generalizing from a known training set to new items. Czerlinski, Goldstein, and Gigerenzer compared several strategies with take-the-best: a simple tallying, or unit-weight, model (also called ''Dawes' rule'' in that literature), a weighted linear model with the cues weighted by their validities (also called ''Franklin's rule'' in that literature),
linear regression, and the minimalist heuristic. Their results show the robustness of take-the-best in generalization. For example, consider the task of selecting the bigger of two cities when:
* models are fit to a data set of 83 German cities, and
* models select the bigger city of each of the 83\times82/2 pairs of cities.
The percentage correct was roughly 74% for linear regression, take-the-best, and the unit-weight linear model. More specifically, the scores were 74.3%, 74.2%, and 74.1%, so regression won by a small margin. However, the paper also considered generalization (also known as out-of-sample prediction):
* models are fit to a randomly selected half of the 83 German cities, and
* models select the bigger city of each pair drawn from the other half of the cities.
Averaged over 10,000 different random splits, regression scored 71.9% correct, take-the-best 72.2%, and the unit-weight linear model 71.4%. The take-the-best heuristic was thus more accurate than regression in this case.

==See also==