Pairwise comparisons form the basis of the Elo rating methodology. Elo made reference to the papers of Good, David, Trawinski and David, and Buhlmann and Huber.
== Mathematical details ==

Performance is not measured absolutely; it is inferred from wins, losses, and draws against other players. Players' ratings depend on the ratings of their opponents and the results scored against them. The difference in rating between two players determines an estimate of the expected score between them. Both the average and the spread of ratings can be chosen arbitrarily. The USCF initially aimed for an average club player to have a rating of 1500, and Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an expected score of approximately 0.75.

A player's
expected score is their probability of winning plus half their probability of drawing. Thus, an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. At the other extreme, it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system; instead, a draw is counted as half a win and half a loss. In practice, since the true strength of each player is unknown, the expected scores are calculated from the players' current ratings as follows.

If player \mathsf{A} has a rating of R_\mathsf{A} and player \mathsf{B} a rating of R_\mathsf{B}, the exact formula (using the logistic curve with base 10 and scale factor 400) for the expected score of player \mathsf{A} is

: E_\mathsf{A} = \frac 1 {1 + 10^{(R_\mathsf{B} - R_\mathsf{A})/400}} ~.

Similarly, the expected score for player \mathsf{B} is

: E_\mathsf{B} = \frac 1 {1 + 10^{(R_\mathsf{A} - R_\mathsf{B})/400}} ~.

This can also be expressed as

: E_\mathsf{A} = \frac{ Q_\mathsf{A} }{ Q_\mathsf{A} + Q_\mathsf{B} } \quad \text{and} \quad E_\mathsf{B} = \frac{ Q_\mathsf{B} }{Q_\mathsf{A} + Q_\mathsf{B} } ~,

where Q_\mathsf{A} = 10^{R_\mathsf{A}/400} and Q_\mathsf{B} = 10^{R_\mathsf{B}/400}. Note that in the latter form the same denominator applies to both expressions, so it is plain that E_\mathsf{A} + E_\mathsf{B} = 1. Comparing only the numerators shows that the expected score for player \mathsf{A} is Q_\mathsf{A}/Q_\mathsf{B} times the expected score for player \mathsf{B}; algebraically, E_\mathsf{A}/E_\mathsf{B} = Q_\mathsf{A}/Q_\mathsf{B}. It follows that for each 400 rating points of advantage over the opponent, the expected score is magnified ten times relative to the opponent's expected score.

When a player's actual tournament scores exceed their expected scores, the Elo system takes this as evidence that the player's rating is too low and adjusts it upward. Similarly, when a player's actual tournament scores fall short of their expected scores, the rating is adjusted downward. Elo's original suggestion, which is still widely used, is a simple linear adjustment proportional to the amount by which a player over-performed or under-performed their expected score. The maximum possible adjustment per game, called the K-factor, was set at K = 16 for masters and K = 32 for weaker players.

Suppose player \mathsf{A} (again with rating R_\mathsf{A}) was expected to score E_\mathsf{A} points but actually scored S_\mathsf{A} points.
The formula for updating that player's rating is

: R_\mathsf{A}' = R_\mathsf{A} + K \cdot (S_\mathsf{A} - E_\mathsf{A}) ~.

FIDE also uses an approximation to the logistic distribution.
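The expected-score and update formulas above can be sketched in a few lines of Python (function names are illustrative, not part of any rating standard):

```python
def expected_score(r_a, r_b):
    """Expected score of the player rated r_a against the player rated r_b
    (logistic curve, base 10, scale factor 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_rating(r_a, score, expected, k=32):
    """Linear Elo update: R_A' = R_A + K * (S_A - E_A)."""
    return r_a + k * (score - expected)

# A player rated 200 points above the opponent expects roughly 0.76:
e_a = expected_score(1700, 1500)       # ~0.7597
# If that favourite nevertheless loses (score 0), with K = 32:
new_r = update_rating(1700, 0.0, e_a)  # ~1675.7
```

Note that because `expected_score(r_a, r_b) + expected_score(r_b, r_a) = 1`, the points gained by one player equal the points lost by the other when both use the same K-factor.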
=== Most accurate K-factor ===

The second major concern is the correct K-factor to use. The chess statistician Jeff Sonas believes that the original value K = 10 (for players rated above 2400) from Elo's work is inaccurate. If the K-factor is set too high, the rating becomes too sensitive to just a few recent events, with a large number of points exchanged in each game. If the K-value is too low, the sensitivity is minimal, and the system does not respond quickly enough to changes in a player's actual level of performance. Elo's original K-factor estimation was made without the benefit of huge databases and statistical evidence. Sonas indicates that a K-factor of 24 (for players rated above 2400) may be both more accurate as a predictive tool of future performance and more sensitive to performance. Certain Internet chess sites seem to avoid a three-level K-factor staggering based on rating range; for example, the ICC seems to adopt a single global K-factor except when playing against provisionally rated players.

The USCF (which uses a logistic distribution, as opposed to a normal distribution) formerly staggered the K-factor according to three main rating ranges. Currently, the USCF uses a formula that calculates the K-factor based on factors including the number of games played and the player's rating; the K-factor is also reduced for high-rated players if the event has shorter time controls. FIDE likewise staggers the K-factor by rating range and experience, and used different ranges before July 2014.

The gradation of the K-factor reduces rating change at the top end of the rating range, reducing the possibility of a rapid rise or fall of rating for those with a rating high enough to reach a low K-factor. In theory, this might apply equally to online chess players and over-the-board players, since it is more difficult for all players to raise their rating once it has become high and the K-factor has consequently been reduced. When playing online, however, 2800+ players can more easily raise their rating by simply selecting opponents with high ratings: on the ICC playing site, a grandmaster may play a string of different opponents who are all rated over 2700. In over-the-board events, only very high-level all-play-all events would let a player engage that number of 2700+ opponents; in a normal, open, Swiss-paired chess tournament, there would frequently be many opponents rated below 2500, reducing the rating gains possible from a single contest for a high-rated player.
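As an illustration only, a staggered K-factor schedule can be expressed as a simple piecewise function. The thresholds below are invented for the example and do not reproduce any federation's actual table:

```python
def k_factor(rating, rated_games):
    """Select a K-factor from a staggered schedule.

    The thresholds here are purely illustrative, not any
    federation's actual rule.
    """
    if rated_games < 30:
        return 40   # new players: let the rating move quickly
    if rating < 2400:
        return 20   # established players
    return 10       # top players: small, stable adjustments
```

The design intent is the one described above: new players get a large K so their rating converges quickly, while high-rated players get a small K so their rating is stable.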
=== Formal derivation for win/loss games ===

The above expressions can now be formally derived by exploiting the link between the Elo rating and the stochastic gradient update in logistic regression. If we assume that the game results are binary, that is, only a win or a loss can be observed, the problem can be addressed via logistic regression, where the game results are dependent variables, the players' ratings are independent variables, and the model relating the two is probabilistic: the probability of player \mathsf{A} winning the game is modeled as

: \Pr\{\mathsf{A}~\textrm{wins}\} = \sigma(r_{\mathsf{A,B}}), \quad \sigma(r)=\frac 1 {1 + 10^{-r/s}},

where

: r_{\mathsf{A,B}} = R_\mathsf{A} - R_\mathsf{B}

denotes the difference of the players' ratings, and we use a scaling factor s = 400. By the law of total probability,

: \Pr\{\mathsf{B}~\textrm{wins}\} = 1-\sigma(r_{\mathsf{A,B}})=\sigma(-r_{\mathsf{A,B}}).

The
log loss is then calculated as

: \ell = \begin{cases} -\log \sigma(r_\mathsf{A,B}) & \textrm{if}~ \mathsf{A}~\textrm{wins},\\ -\log \sigma(-r_\mathsf{A,B}) & \textrm{if}~ \mathsf{B}~\textrm{wins}, \end{cases}

and, using stochastic gradient descent, the log loss is minimized via

: R_{\mathsf{A}}\leftarrow R_{\mathsf{A}} - \eta \frac{\textrm{d}\ell}{\textrm{d} R_{\mathsf{A}}},
: R_{\mathsf{B}}\leftarrow R_{\mathsf{B}} - \eta \frac{\textrm{d}\ell}{\textrm{d} R_{\mathsf{B}}},

where \eta is the adaptation step. Since \frac{\textrm{d}}{\textrm{d} r}\log\sigma(r)=\frac{\log 10}{s}\sigma(-r), \frac{\textrm{d} r_{\mathsf{A,B}}}{\textrm{d} R_{\mathsf{A}}}=1, and \frac{\textrm{d} r_{\mathsf{A,B}}}{\textrm{d} R_{\mathsf{B}}}=-1, the adaptation can be written as

: R_{\mathsf{A}}\leftarrow \begin{cases} R_{\mathsf{A}} + K \sigma(-r_{\mathsf{A,B}}) & \textrm{if}~\mathsf{A}~\textrm{wins},\\ R_{\mathsf{A}} - K \sigma(r_{\mathsf{A,B}}) & \textrm{if}~\mathsf{B}~\textrm{wins}, \end{cases}

which may be compactly written as

: R_{\mathsf{A}}\leftarrow R_{\mathsf{A}} + K (S_{\mathsf{A}}-E_{\mathsf{A}}),

where K=\eta\log10/s is the new adaptation step which absorbs \eta and s, S_{\mathsf{A}}=1 if \mathsf{A} wins and S_{\mathsf{A}}=0 if \mathsf{B} wins, and the expected score is given by E_{\mathsf{A}}=\sigma(r_{\mathsf{A,B}}). Analogously, the update for the rating R_{\mathsf{B}} is

: R_{\mathsf{B}}\leftarrow R_{\mathsf{B}} + K (S_{\mathsf{B}}-E_{\mathsf{B}}).
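The equivalence between the stochastic-gradient step and the Elo update can be checked numerically. The following Python sketch (names are illustrative) applies one gradient step with \eta chosen so that K = \eta\log 10/s = 32:

```python
import math

S = 400  # rating scale factor s

def sigma(r):
    """Base-10 logistic: expected score for a rating difference r."""
    return 1.0 / (1.0 + 10 ** (-r / S))

def sgd_step(r_a, r_b, s_a, eta):
    """One stochastic-gradient step on the log loss.

    s_a = 1 if A wins, 0 if B wins.  Returns the new (R_A, R_B).
    """
    e_a = sigma(r_a - r_b)
    grad = (math.log(10) / S) * (s_a - e_a)  # equals -dl/dR_A and +(-dl/dR_B)
    return r_a + eta * grad, r_b - eta * grad

# Choosing eta so that K = eta * log(10) / S = 32 reproduces the Elo update:
eta = 32 * S / math.log(10)
ra, rb = sgd_step(1700, 1500, 1, eta)  # same as R_A + 32 * (1 - E_A)
```

Because the two gradients differ only in sign, the rating points gained by one player are exactly the points lost by the other.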
=== Formal derivation for win/draw/loss games ===

Since the very beginning, the Elo rating has also been used in chess, where wins, losses, and draws are observed; to handle draws, a fractional score value S_{\mathsf{A}}=0.5 is introduced. Note, however, that the scores S_{\mathsf{A}}=1 and S_{\mathsf{A}}=0 are merely indicators of the events that player \mathsf{A} wins or loses the game; it is therefore not immediately clear what the meaning of the fractional score is. Moreover, since we do not explicitly specify the model relating the rating values R_{\mathsf{A}} and R_{\mathsf{B}} to the probability of the game outcome, we cannot say what the probability of a win, a loss, or a draw is.

To address these difficulties, and to derive the Elo rating for ternary games, we define an explicit probabilistic model of the outcomes and then minimize the log loss via stochastic gradient descent. Since the loss, the draw, and the win are ordinal variables, we should adopt a model which takes their ordinal nature into account, and we use the so-called adjacent categories model, which may be traced to Davidson's work:

: \Pr\{\mathsf{A}~\textrm{wins}\} = \sigma(r_{\mathsf{A,B}}; \kappa),
: \Pr\{\mathsf{B}~\textrm{wins}\} = \sigma(-r_{\mathsf{A,B}}; \kappa),
: \Pr\{\mathsf{A}~\textrm{draws}\} = \kappa\sqrt{\sigma(r_{\mathsf{A,B}}; \kappa)\sigma(-r_{\mathsf{A,B}}; \kappa)},

where

: \sigma(r; \kappa) = \frac{10^{r/s}}{10^{-r/s}+\kappa + 10^{r/s}}

and \kappa\ge 0 is a parameter. The introduction of a free parameter should not be surprising: with three possible outcomes, an additional degree of freedom should appear in the model. In particular, with \kappa=0 we recover the model underlying logistic regression:

: \Pr\{\mathsf{A}~\textrm{wins}\} = \sigma(r_{\mathsf{A,B}};0)=\frac{10^{r_{\mathsf{A,B}}/s}}{10^{-r_{\mathsf{A,B}}/s}+ 10^{r_{\mathsf{A,B}}/s}}=\frac{1}{1+ 10^{-r_{\mathsf{A,B}}/s'}},

where s' = s/2. Using the ordinal model defined above, the
log loss is now calculated as

: \ell = \begin{cases} -\log \sigma(r_{\mathsf{A,B}};\kappa) & \textrm{if}~ \mathsf{A}~\textrm{wins},\\ -\log \sigma(-r_{\mathsf{A,B}};\kappa) & \textrm{if}~ \mathsf{B}~\textrm{wins},\\ -\log \kappa -\frac{1}{2}\log\sigma(r_{\mathsf{A,B}};\kappa) - \frac{1}{2}\log\sigma(-r_{\mathsf{A,B}};\kappa) & \textrm{if}~ \mathsf{A}~\textrm{draws}, \end{cases}

which may be compactly written as

: \ell = -\left(S_{\mathsf{A}} +\tfrac{1}{2}D\right)\log \sigma(r_{\mathsf{A,B}};\kappa) -\left(S_{\mathsf{B}} +\tfrac{1}{2}D\right) \log \sigma(-r_{\mathsf{A,B}};\kappa) -D\log \kappa,

where S_{\mathsf{A}}=1 iff \mathsf{A} wins, S_{\mathsf{B}}=1 iff \mathsf{B} wins, and D=1 iff the game is drawn. As before, we need the derivative of \log\sigma(r;\kappa), which is given by

: \frac{\textrm{d}}{\textrm{d} r}\log\sigma(r; \kappa) =\frac{2\log 10}{s} [1-g(r;\kappa)],

where

: g(r;\kappa)= \frac{10^{r/s}+\kappa/2 } {10^{-r/s}+\kappa + 10^{r/s}}.

Thus, the derivative of the log loss with respect to the rating R_{\mathsf{A}} is given by

: \begin{align} \frac{\textrm{d}}{\textrm{d} R_{\mathsf{A}}}\ell &= -\frac{2\log 10}{s} \left( (S_{\mathsf{A}} +0.5D)[1-g(r_{\mathsf{A,B}};\kappa)] -(S_{\mathsf{B}} +0.5D)g(r_{\mathsf{A,B}};\kappa) \right)\\ &= -\frac{2\log 10}{s} \left(S_{\mathsf{A}} + 0.5D-g(r_{\mathsf{A,B}};\kappa)\right), \end{align}

where we used the relationships S_{\mathsf{A}} + S_{\mathsf{B}} + D=1 and g(-r;\kappa)=1-g(r;\kappa). Then, stochastic gradient descent applied to minimize the log loss yields the following update for the rating R_{\mathsf{A}}:

: R_{\mathsf{A}}\leftarrow R_{\mathsf{A}} + K (\hat{S}_{\mathsf{A}}- g(r_{\mathsf{A,B}};\kappa)),

where K=2\eta\log10/s and \hat{S}_{\mathsf{A}}= S_{\mathsf{A}} + 0.5D. Of course, \hat{S}_{\mathsf{A}}= 1 if \mathsf{A} wins, \hat{S}_{\mathsf{A}}= 0.5 if \mathsf{A} draws, and \hat{S}_{\mathsf{A}}= 0 if \mathsf{A} loses. To recognize its origin in the model proposed by Davidson, this update is called the Elo-Davidson rating. The update for R_{\mathsf{B}} is derived in the same manner:

: R_{\mathsf{B}}\leftarrow R_{\mathsf{B}} + K (\hat{S}_{\mathsf{B}}- g(r_{\mathsf{B,A}};\kappa)),

where r_{\mathsf{B,A}}=R_{\mathsf{B}}-R_{\mathsf{A}}=-r_{\mathsf{A,B}}.
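A minimal Python sketch of the Elo-Davidson update (function names are illustrative; s = 400 as above):

```python
S = 400  # rating scale factor s

def g(r, kappa):
    """Expected score E_A = g(r; kappa) in the Elo-Davidson model."""
    u = 10 ** (r / S)
    return (u + kappa / 2) / (1 / u + kappa + u)

def elo_davidson_step(r_a, r_b, s_hat_a, k, kappa):
    """Update for player A; s_hat_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    return r_a + k * (s_hat_a - g(r_a - r_b, kappa))
```

Since g(-r;\kappa)=1-g(r;\kappa), equally rated opponents have g(0;\kappa)=0.5 for every \kappa, so a draw between them leaves both ratings unchanged.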
We note that

: \begin{align} \operatorname{E}[\hat{S}_{\mathsf{A}}] &=\Pr\{\mathsf{A}~\text{wins}\}+0.5\Pr\{\mathsf{A}~\text{draws}\}\\ &=\sigma(r_{\mathsf{A,B}};\kappa)+0.5\kappa\sqrt{\sigma(r_{\mathsf{A,B}};\kappa)\sigma(-r_{\mathsf{A,B}};\kappa)}\\ &=g(r_{\mathsf{A,B}};\kappa), \end{align}

and thus the rating update may be written as

: R_{\mathsf{A}}\leftarrow R_{\mathsf{A}} + K (\hat{S}_{\mathsf{A}}- E_{\mathsf{A}}),

where E_{\mathsf{A}}=\operatorname{E}[\hat{S}_\mathsf{A}]. This is practically the same equation as in the Elo rating, except that the expected score is given by E_{\mathsf{A}}=g(r_{\mathsf{A,B}};\kappa) instead of E_{\mathsf{A}}=\sigma(r_{\mathsf{A,B}}). As noted above, for \kappa=0 we have g(r;0) = \sigma(r;0), so the update takes the same form as the Elo rating (with the halved scale factor s'=s/2 noted above); however, this is of no help in understanding the case when draws are observed, since \kappa=0 would mean that the probability of a draw is zero. On the other hand, if we use \kappa=2, we have

: g(r;2)= \frac{10^{r/s}+1 } {10^{-r/s}+2 + 10^{r/s}}=\frac{1} {1+10^{-r/s}}=\sigma(r),

which means that, using \kappa=2, the Elo-Davidson rating is exactly the same as the Elo rating.

== Practical issues ==