If two players play the prisoner's dilemma more than once in succession, remember their opponent's previous actions, and are allowed to change their strategy accordingly, the game is called the iterated prisoner's dilemma. In addition to the general form above, the iterative version also requires that 2R > T + S, to prevent alternating cooperation and defection from giving a greater reward than mutual cooperation: with the common payoff values T = 5, R = 3, P = 1, S = 0, alternating yields (T + S)/2 = 2.5 per round, less than the R = 3 from mutual cooperation. The iterated prisoner's dilemma is fundamental to some theories of human cooperation and trust. Assuming that the game effectively models transactions between two people that require trust, cooperative behavior in populations can be modeled by a multi-player iterated version of the game. In 1975,
Grofman and
Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoner's dilemma is also called the "
peace-war game". Several variants of the iterated game have been studied, including a version in which players have an exit option after each round.
===General strategy===
If the iterated prisoner's dilemma is played a finite number of times and both players know this, then the dominant strategy and Nash equilibrium is to defect in all rounds. The proof is inductive: one might as well defect on the last turn, since the opponent will not have a chance to retaliate afterward. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last turn no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.
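The induction can be made concrete with a short sketch. The following Python fragment is illustrative only; the payoff values T = 5, R = 3, P = 1, S = 0 are the common textbook choices assumed here, not values fixed by the argument itself.
<syntaxhighlight lang="python">
# Stage-game payoffs to "me" for (my_move, their_move), using the
# assumed example values T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Base case: in the last round nothing follows, so a move is optimal
# iff it maximizes the stage payoff. Defection does, whatever the
# opponent plays:
assert all(PAYOFF[("D", m)] > PAYOFF[("C", m)] for m in ("C", "D"))

def equilibrium_payoff(rounds: int) -> int:
    """Inductive step: once both sides are sure to defect from round
    k+1 onward, play in round k cannot influence the future, so the
    same stage-game dominance forces defection in round k as well.
    The equilibrium path is therefore mutual defection every round."""
    return rounds * PAYOFF[("D", "D")]

print(equilibrium_payoff(10))  # 10, versus 30 for ten rounds of mutual cooperation
</syntaxhighlight>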
For cooperation to emerge between rational players, the number of rounds must be unknown or infinite. In that case, "always defect" may no longer be a dominant strategy. As shown by
Robert Aumann in a 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain cooperation. Specifically, a player may become less willing to cooperate if their counterpart has failed to cooperate many times, since repeated defection breeds disappointment. Conversely, as time elapses, the likelihood of cooperation tends to rise, owing to the establishment of a "tacit agreement" between the players. In experimental situations, cooperation can occur even when both participants know how many iterations will be played. According to a 2019 experimental study in the
American Economic Review that tested what strategies real-life subjects used in iterated prisoner's dilemma situations with perfect monitoring, the majority of chosen strategies were always defect,
tit-for-tat, and
grim trigger. Which strategy the subjects chose depended on the parameters of the game.
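The three strategies reported in that study can be stated precisely as rules over the opponent's history of moves. The following Python sketch is a minimal illustration; the encoding of moves as "C" and "D" and the function names are choices made here, not the study's notation.
<syntaxhighlight lang="python">
def always_defect(opponent_history):
    """Defect unconditionally."""
    return "D"

def tit_for_tat(opponent_history):
    """Cooperate on the first move, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]

def grim_trigger(opponent_history):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in opponent_history else "C"
</syntaxhighlight>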
===Axelrod's tournament and successful strategy conditions===
Interest in the iterated prisoner's dilemma was kindled by
Robert Axelrod in his 1984 book
The Evolution of Cooperation, in which he reports on a tournament that he organized of the
N-step prisoner's dilemma (with
N fixed) in which participants have to choose their strategy repeatedly and remember their previous encounters. Axelrod invited academic colleagues from around the world to devise computer strategies to compete in an iterated prisoner's dilemma tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth. Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more
altruistic strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behavior from mechanisms that are initially purely selfish, by
natural selection. The winning
deterministic strategy was
tit for tat, developed and entered into the tournament by
Anatol Rapoport. It was the simplest of any program entered, containing only four lines of
BASIC. The strategy is simply to cooperate on the first iteration of the game; after that, the player does what his or her opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness": when the opponent defects, on the next move the player sometimes cooperates anyway, with a small probability (around 1–5%, depending on the lineup of opponents). This allows for occasional recovery from getting trapped in a cycle of defections. After analyzing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to succeed:
• Nice: The strategy will not be the first to defect (this is sometimes referred to as an "optimistic" algorithm); that is, it will not be the first to "cheat" on its opponent for purely self-interested reasons. Almost all the top-scoring strategies were nice.
• Retaliating: The strategy must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate, a very bad choice that will frequently be exploited by "nasty" strategies.
• Forgiving: Successful strategies must be forgiving. Though players will retaliate, they will cooperate again if the opponent does not continue to defect. This can stop long runs of revenge and counter-revenge, maximizing points.
• Non-envious: The strategy must not strive to score more than the opponent.
In contrast to the one-shot prisoner's dilemma game, the optimal strategy in the iterated prisoner's dilemma depends upon the strategies of likely opponents, how they will react to defections and cooperation, and the discounting of future payoffs. For example, if a population consists entirely of players who always defect, except for one who follows the tit-for-tat strategy, that person is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy is to defect every time. More generally, given a population with a certain percentage of always-defectors and the rest tit-for-tat players, the optimal strategy depends on the percentage and the number of iterations played.
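The tit-for-tat rule and its forgiving variant described above can be stated compactly in code. The following Python sketch is illustrative; the 5% forgiveness probability is one value from the 1–5% range mentioned, chosen here for the example, and setting it to zero recovers plain tit-for-tat.
<syntaxhighlight lang="python">
import random

def tit_for_tat_with_forgiveness(opponent_history, forgiveness=0.05):
    """Copy the opponent's last move, except that after a defection the
    player cooperates anyway with a small probability."""
    if not opponent_history:
        return "C"                       # cooperate on the first move
    if opponent_history[-1] == "D":
        return "C" if random.random() < forgiveness else "D"
    return "C"
</syntaxhighlight>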
===Other strategies===
Deriving the optimal strategy is generally done in two ways:
• Bayesian Nash equilibrium: If the statistical distribution of opposing strategies can be determined, an optimal counter-strategy can be derived analytically.
• Monte Carlo simulations of populations have been made, where individuals with low scores die off and those with high scores reproduce (a genetic algorithm for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit-for-tat players, but no analytic proof exists that this will always occur.
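The following Python sketch shows what such a population simulation might look like under simplifying assumptions made here: three fixed strategies rather than an open-ended genome, fitness estimated from a few random pairings, and the common payoff values T = 5, R = 3, P = 1, S = 0.
<syntaxhighlight lang="python">
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
STRATEGIES = ("always_defect", "always_cooperate", "tit_for_tat")

def move(strategy, opponent_last):
    if strategy == "always_defect":
        return "D"
    if strategy == "always_cooperate":
        return "C"
    return opponent_last or "C"          # tit-for-tat: copy, cooperate first

def match_score(s1, s2, rounds=30):
    """Total score of s1 over an iterated game against s2."""
    last1 = last2 = None
    total = 0
    for _ in range(rounds):
        m1, m2 = move(s1, last2), move(s2, last1)
        total += PAYOFF[(m1, m2)][0]
        last1, last2 = m1, m2
    return total

def next_generation(population, mutation_rate=0.01):
    # Fitness: total score against a few randomly drawn co-players.
    fitness = [sum(match_score(s, random.choice(population)) for _ in range(3))
               for s in population]
    # High scorers reproduce; mutation occasionally replaces a strategy.
    offspring = random.choices(population, weights=fitness, k=len(population))
    return [random.choice(STRATEGIES) if random.random() < mutation_rate else s
            for s in offspring]

population = [random.choice(STRATEGIES) for _ in range(100)]
for _ in range(100):
    population = next_generation(population)
print({s: population.count(s) for s in STRATEGIES})
</syntaxhighlight>
In runs of this kind, tit-for-tat players typically come to dominate the final population, in line with the empirical tendency noted above, though the outcome of any single run varies with the initial mix and the mutation rate.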
In the strategy called win-stay, lose-switch (also known as Pavlov), faced with a failure to cooperate, the player switches strategy the next turn. In certain circumstances, Pavlov beats all other strategies by giving preferential treatment to co-players using a similar strategy. Although tit-for-tat is considered the most
robust basic strategy, a team from
Southampton University in England introduced a more successful strategy at the 20th-anniversary iterated prisoner's dilemma competition. It relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start. Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the competing program's score. As a result, the 2004 Prisoners' Dilemma Tournament results show
the University of Southampton's strategies in the first three places (and a number of positions towards the bottom), despite having fewer wins and many more losses than the GRIM strategy. The Southampton strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that a team's performance was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of
minmaxing). Because of this new rule, this competition also has little theoretical significance when analyzing single-agent strategies as compared to Axelrod's seminal tournament. But it provided a basis for analyzing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. Long before this new-rules tournament was played,
Richard Dawkins, in his book
The Selfish Gene, pointed out the possibility of such strategies winning if multiple entries were allowed, but wrote that Axelrod would most likely not have allowed them if they had been submitted. Such strategies also circumvent the rule against communication between players: the Southampton programs' "ten-move dance" allowed them to recognize one another, reinforcing how valuable communication can be in shifting the balance of the game. Even without implicit collusion between
software strategies, tit-for-tat is not always the absolute winner of any given tournament; more precisely, its long-run results over a series of tournaments outperform its rivals, but this does not mean it is the most successful in the short term. The same applies to tit-for-tat with forgiveness and other optimal strategies. This can also be illustrated using the Darwinian
ESS simulation. In such a simulation, tit-for-tat will almost always come to dominate, though nasty strategies will drift in and out of the population because a tit-for-tat population is penetrable by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. Dawkins showed that here, no static mix of strategies forms a stable equilibrium, and the system will always oscillate between bounds.
===Stochastic iterated prisoner's dilemma===
In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities". In an encounter between player
X and player
Y,
X's strategy is specified by a set of probabilities
P of cooperating with
Y.
P is a function of the outcomes of their previous encounters or some subset thereof. If
P is a function of only their most recent
n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}, where
P_{cd} is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by X cooperating and Y defecting. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit-for-tat strategy written as P=\{1,0,1,0\}, in which X responds as Y did in the previous encounter. Another is the win-stay, lose-switch strategy written as P=\{1,0,0,1\}. It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy that gives the same statistical results, so that only memory-1 strategies need be considered.
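As an illustration of this notation, a memory-1 strategy can be implemented as a table mapping the previous round's outcome to a cooperation probability. The Python sketch below is a minimal reading of the definitions above; the choice to cooperate on the very first round is an assumption made here, since the notation itself only covers rounds with a previous encounter.
<syntaxhighlight lang="python">
import random

# Memory-1 strategies as (P_cc, P_cd, P_dc, P_dd): the probability that
# X cooperates, indexed by (X's previous move, Y's previous move).
TIT_FOR_TAT = (1, 0, 1, 0)
WIN_STAY_LOSE_SWITCH = (1, 0, 0, 1)

OUTCOMES = (("C", "C"), ("C", "D"), ("D", "C"), ("D", "D"))

def memory_one_move(strategy, my_last, their_last):
    """Cooperate with the probability the strategy assigns to the
    previous round's outcome (assumed: cooperate on the first round)."""
    if my_last is None:
        return "C"
    p = strategy[OUTCOMES.index((my_last, their_last))]
    return "C" if random.random() < p else "D"
</syntaxhighlight>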
Theory and simulations confirm that beyond a critical population size, extortion by zero-determinant (ZD) strategies loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases with population size. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents. While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. When the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for the iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the
donation game by Alexander Stewart and Joshua Plotkin in 2013. Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Ethan Akin to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if he receives at least the cooperative expected payoff. Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.

In the continuous iterated prisoner's dilemma, cooperation is a matter of degree rather than a binary choice. Studies of this variant have found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. In a continuous prisoner's dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from assorting with one another. By contrast, in a discrete prisoner's dilemma, tit-for-tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation rather than a strict dichotomy of cooperation or defection, the continuous prisoner's dilemma may help explain why real-life examples of tit-for-tat-like cooperation are extremely rare even though tit-for-tat seems robust in theoretical models.

==Real-life examples==