The ingredients of a stochastic game are:
* a finite set of players I;
* a state space S (either a finite set or a measurable space (S,{\mathcal S}));
* for each player i\in I, an action set A^i (either a finite set or a measurable space (A^i,{\mathcal A}^i));
* a transition probability P from S\times A to S, where A=\times_{i\in I}A^i is the set of action profiles and P(\cdot\mid s,a) is the probability distribution of the next state given the current state s and the current action profile a;
* a payoff function g from S\times A to \mathbb{R}^I, whose i-th coordinate, g^i, is the payoff to player i as a function of the state s and the action profile a.

The game starts at some initial state s_1. At stage t, players first observe s_t, then simultaneously choose actions a^i_t\in A^i, then observe the action profile a_t=(a^i_t)_i, and then nature selects s_{t+1} according to the probability P(\cdot\mid s_t,a_t). A play of the stochastic game, s_1,a_1,\ldots,s_t,a_t,\ldots, defines a stream of payoffs g_1,g_2,\ldots, where g_t=g(s_t,a_t).

The discounted game \Gamma_\lambda with discount factor \lambda (0<\lambda\le 1) is the game where the payoff to player i is \lambda\sum_{t=1}^{\infty}(1-\lambda)^{t-1}g^i_t. The n-stage game \Gamma_n is the game where the payoff to player i is \bar{g}^i_n:=\frac1n\sum_{t=1}^n g^i_t. The value v_n(s_1), respectively v_{\lambda}(s_1), of a two-person zero-sum stochastic game \Gamma_n, respectively \Gamma_{\lambda}, with finitely many states and actions exists, and
Truman Bewley and
Elon Kohlberg (1976) proved that v_n(s_1) converges to a limit as n goes to infinity and that v_{\lambda}(s_1) converges to the same limit as \lambda goes to 0. The "undiscounted" game \Gamma_\infty is the game where the payoff to player i is the "limit" of the averages of the stage payoffs. Some precautions are needed in defining the value of a two-person zero-sum game \Gamma_{\infty} and in defining the equilibrium payoffs of a non-zero-sum \Gamma_{\infty}. The uniform value v_{\infty} of a two-person zero-sum stochastic game \Gamma_\infty exists if for every \varepsilon>0 there are a positive integer N and a pair of strategies, \sigma_{\varepsilon} of player 1 and \tau_{\varepsilon} of player 2, such that for every \sigma and \tau and every n\geq N the expectation of \bar{g}^1_n (player 1's average payoff) with respect to the probability on plays defined by \sigma_{\varepsilon} and \tau is at least v_{\infty}-\varepsilon, and the expectation of \bar{g}^1_n with respect to the probability on plays defined by \sigma and \tau_{\varepsilon} is at most v_{\infty}+\varepsilon.
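The discounted value v_\lambda defined above can be computed by iterating Shapley's operator, v \leftarrow \mathrm{val}\big(\lambda g(s,\cdot) + (1-\lambda)\,\mathbb{E}[v(s')]\big), where val denotes the value of a one-shot matrix game. The sketch below (ours, not from the text) does this in Python for a toy two-state game with 2x2 action sets; the function names, states, payoffs, and transitions are all made-up illustrative data.

```python
# Minimal sketch of Shapley's value iteration for the discounted value
# v_lambda of a two-person zero-sum stochastic game with 2x2 action sets.
# Toy data; not a standard API.

def matrix_game_value(M):
    """Value of a 2x2 zero-sum matrix game M (row player maximizes)."""
    (a, b), (c, d) = M
    # A pure saddle point is maximal in its column and minimal in its row.
    for i in range(2):
        for j in range(2):
            if M[i][j] == max(M[0][j], M[1][j]) and M[i][j] == min(M[i]):
                return M[i][j]
    # Otherwise both optimal strategies are completely mixed (2x2 formula).
    return (a * d - b * c) / (a + d - b - c)

def shapley_value(payoff, trans, lam, iters=500):
    """Iterate v <- val(lam*g(s,a) + (1-lam)*E[v(next state)]) to a fixed point.

    payoff[s][i][j] is the stage payoff, trans[s][i][j][s2] the transition
    probability; lam matches the normalized discounted payoff in the text.
    """
    v = {s: 0.0 for s in payoff}
    for _ in range(iters):
        v = {s: matrix_game_value(
                 [[lam * payoff[s][i][j]
                   + (1 - lam) * sum(p * v[s2] for s2, p in trans[s][i][j].items())
                   for j in range(2)] for i in range(2)])
             for s in payoff}
    return v

# Toy game: state 0 plays matching pennies once and then moves to the
# absorbing state 1, which pays player 1 one unit at every later stage.
payoff = {0: [[1, -1], [-1, 1]], 1: [[1, 1], [1, 1]]}
trans = {0: [[{1: 1.0}, {1: 1.0}], [{1: 1.0}, {1: 1.0}]],
         1: [[{1: 1.0}, {1: 1.0}], [{1: 1.0}, {1: 1.0}]]}
v = shapley_value(payoff, trans, lam=0.5)  # v[0] ≈ 0.5, v[1] ≈ 1.0
```

Because the operator is a contraction with modulus 1-\lambda, the iteration converges for any \lambda in (0,1]; computing the limit as \lambda \to 0 is exactly the harder question that Bewley and Kohlberg addressed.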
Jean-François Mertens and
Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value. If the set of players, the action sets, and the set of states are all finite, then a stochastic game with a finite number of stages always has a
Nash equilibrium. The same is true for a game with infinitely many stages if the total payoff is the discounted sum. The non-zero-sum stochastic game \Gamma_\infty has a uniform equilibrium payoff v_{\infty} if for every \varepsilon>0 there is a positive integer N and a strategy profile \sigma such that for every unilateral deviation by a player i, i.e., a strategy profile \tau with \sigma^j=\tau^j for all j\neq i, and every n\geq N the expectation of \bar{g}^i_n with respect to the probability on plays defined by \sigma is at least v^i_{\infty} -\varepsilon , and the expectation of \bar{g}^i_n with respect to the probability on plays defined by \tau is at most v^i_{\infty} +\varepsilon .
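The quantifier structure of this definition can be restated compactly. In the notation above, writing \mathbb{E}_\sigma for expectation with respect to the probability on plays defined by \sigma, the condition reads (our restatement, not a formula from the text):

```latex
% Uniform equilibrium payoff: restatement of the verbal condition above.
\forall \varepsilon>0\ \exists N \in \mathbb{N},\ \exists \sigma \ \text{such that}\
\forall i,\ \forall \tau \text{ with } \tau^j=\sigma^j \ (j\neq i),\ \forall n\ge N:
\qquad
\mathbb{E}_{\sigma}\,\bar{g}^i_n \;\ge\; v^i_\infty-\varepsilon
\quad\text{and}\quad
\mathbb{E}_{\tau}\,\bar{g}^i_n \;\le\; v^i_\infty+\varepsilon .
```

Note that the same N and \sigma must work simultaneously for every player i and every sufficiently long horizon n, which is what makes the equilibrium "uniform".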
Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a uniform equilibrium payoff. The non-zero-sum stochastic game \Gamma_\infty has a limiting-average equilibrium payoff v_{\infty} if for every \varepsilon>0 there is a strategy profile \sigma such that for every unilateral deviation by a player i, i.e., a strategy profile \tau with \sigma^j=\tau^j for all j\neq i, the expectation of the limit inferior of the averages of the stage payoffs with respect to the probability on plays defined by \sigma is at least v^i_{\infty}-\varepsilon, and the expectation of the limit superior of the averages of the stage payoffs with respect to the probability on plays defined by \tau is at most v^i_{\infty}+\varepsilon.
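The limiting-average condition differs from the uniform one in that the horizon n disappears from the quantifiers and the liminf/limsup asymmetry carries the load; in the same notation as before (our restatement, not a formula from the text):

```latex
% Limiting-average equilibrium payoff: restatement of the verbal condition above.
\forall \varepsilon>0\ \exists \sigma \ \text{such that}\
\forall i,\ \forall \tau \text{ with } \tau^j=\sigma^j \ (j\neq i):
\qquad
\mathbb{E}_{\sigma}\!\left[\liminf_{n\to\infty}\bar{g}^i_n\right] \;\ge\; v^i_\infty-\varepsilon
\quad\text{and}\quad
\mathbb{E}_{\tau}\!\left[\limsup_{n\to\infty}\bar{g}^i_n\right] \;\le\; v^i_\infty+\varepsilon .
```

Using the liminf on the equilibrium path and the limsup for the deviator guards against payoff streams whose running averages oscillate without converging.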
Jean-François Mertens and
Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a limiting-average value. Stochastic games have been combined with Bayesian games to model uncertainty over player strategies; the resulting
stochastic Bayesian game model is solved via a recursive combination of the
Bayesian Nash equilibrium equation and the
Bellman optimality equation.

== Stopping games ==