
Bradley–Terry model

The Bradley–Terry model is a probability model for the outcome of pairwise comparisons between items, teams, or objects. Given a pair of items i and j drawn from some population, it estimates the probability that the pairwise comparison i > j turns out true, as

\Pr(i > j) = \frac{p_i}{p_i + p_j},

where p_i is a positive real-valued score assigned to individual i. The comparison i > j can be read as "i is preferred to j", "i ranks higher than j", or "i beats j", depending on the application.

History and applications
The model is named after Ralph A. Bradley and Milton E. Terry, who presented it in 1952, although it had already been studied by Ernst Zermelo in the 1920s. Applications of the model include the ranking of competitors in sports, chess, and other competitions; the ranking of products in paired-comparison surveys of consumer choice; the analysis of dominance hierarchies within animal and human communities; the ranking of journals; and the ranking of AI models. It is foundational to the training of reward models in reinforcement learning from human feedback, and it also plays a role in estimating the relevance of documents in machine-learned search engines.
Definition
The Bradley–Terry model can be parametrized in various ways. The parametrization in terms of the scores p_i is perhaps the most common, but there are a number of others. Bradley and Terry themselves defined exponential score functions p_i = e^{\beta_i}, so that

\Pr(i > j) = \frac{e^{\beta_i}}{e^{\beta_i} + e^{\beta_j}} = \frac{1}{1 + e^{\beta_j - \beta_i}}.

Alternatively, one can use a logit, such that

\operatorname{logit} \Pr(i > j) = \log \frac{\Pr(i > j)}{1 - \Pr(i > j)} = \log \frac{\Pr(i > j)}{\Pr(j > i)} = \beta_i - \beta_j,

where \operatorname{logit} p = \log \frac{p}{1-p} for 0 < p < 1. This formulation highlights the similarity between the Bradley–Terry model and logistic regression. Both employ essentially the same model, but in different ways: in logistic regression one typically knows the parameters \beta_i and attempts to infer the functional form of \Pr(i > j), while in ranking under the Bradley–Terry model one knows the functional form and attempts to infer the parameters. With a scale factor of 400 and a base of 10, the model is equivalent to the Elo rating system: for players with Elo ratings R_i and R_j,

\Pr(i > j) = \frac{10^{R_i/400}}{10^{R_i/400} + 10^{R_j/400}} = \frac{1}{1 + 10^{(R_j - R_i)/400}}.
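The equivalence of these parametrizations can be checked numerically. The Python sketch below (with hypothetical strengths chosen for illustration) converts a pair of scores p_i into log-strengths and Elo-style ratings and verifies that all three forms yield the same win probability.

```python
import math

def bt_prob(p_i, p_j):
    """Bradley-Terry win probability from positive strength scores."""
    return p_i / (p_i + p_j)

def bt_prob_logistic(beta_i, beta_j):
    """The same probability from log-strengths beta = log p."""
    return 1.0 / (1.0 + math.exp(beta_j - beta_i))

def elo_prob(r_i, r_j):
    """The same probability in Elo form: base 10, scale factor 400."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / 400))

# Hypothetical strengths: item i is four times as strong as item j.
p_i, p_j = 2.0, 0.5
beta_i, beta_j = math.log(p_i), math.log(p_j)      # beta = log p
r_i, r_j = 400 * math.log10(p_i), 400 * math.log10(p_j)  # R = 400 log10 p

assert abs(bt_prob(p_i, p_j) - bt_prob_logistic(beta_i, beta_j)) < 1e-12
assert abs(bt_prob(p_i, p_j) - elo_prob(r_i, r_j)) < 1e-12
print(bt_prob(p_i, p_j))  # 0.8
```

With p_i/p_j = 4, all three forms give \Pr(i > j) = 4/5 = 0.8.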
Plackett–Luce model
A standard generalization of the Bradley–Terry model is the Plackett–Luce model, which models a ranking of N items. In the same notation as the Bradley–Terry model,

\Pr(y_1 > \cdots > y_N) = \prod_{i=1}^N \frac{p_{y_i}}{\sum_{k=i}^N p_{y_k}} = \frac{p_{y_1}}{p_{y_1} + \cdots + p_{y_N}} \frac{p_{y_2}}{p_{y_2} + \cdots + p_{y_N}} \cdots \frac{p_{y_N}}{p_{y_N}}.

The factor with i = N is always just unity, so for N = 2 this reduces to \Pr(y_1 > y_2) = p_{y_1}/(p_{y_1} + p_{y_2}), the Bradley–Terry model.

This can be imagined as drawing from an urn. The urn contains balls colored in proportion to p_1, p_2, \dots, p_N, and one draws from the urn with replacement. If a drawn ball has a new color, that color is placed next in the ranking; if its color has already been drawn, the ball is discarded.

Given the proportions p_1, p_2, \dots, p_N, the Plackett–Luce model can also be sampled by the "exponential race" method. One samples "radioactive decay times" from N "exponential clocks", that is, t_1 \sim \mathrm{Exp}(p_1), \dots, t_N \sim \mathrm{Exp}(p_N), and then ranks the items according to the order in which they decayed. In this interpretation it is immediately clear that the Plackett–Luce model satisfies Luce's choice axiom (due to the same Luce). Therefore, for any two items y and z, \Pr(y > z) = \frac{p_y}{p_y + p_z} reduces to the Bradley–Terry model, and in general, for any subset y_1, \dots, y_M of the choices,

\Pr(y_1 > \cdots > y_M) = \frac{p_{y_1}}{p_{y_1} + \cdots + p_{y_M}} \frac{p_{y_2}}{p_{y_2} + \cdots + p_{y_M}} \cdots \frac{p_{y_M}}{p_{y_M}}

reduces to a smaller Plackett–Luce model with the same parameters.
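The exponential-race sampler described above takes only a few lines of Python; a minimal sketch (function names are illustrative) draws one exponential decay time per item, with rate p_i, and ranks items by increasing decay time. As a sanity check, with two items the empirical probability that item 0 comes first should approach p_0/(p_0 + p_1).

```python
import random

def sample_pl_ranking(p, rng=random):
    """Sample a ranking from the Plackett-Luce model with strengths p
    by the exponential race: t_i ~ Exp(rate=p_i), rank by increasing t."""
    times = [rng.expovariate(p_i) for p_i in p]
    return sorted(range(len(p)), key=lambda i: times[i])

# Sanity check: with p = (2, 1), item 0 should win first place
# with probability 2/(2+1) = 2/3.
rng = random.Random(0)
p = [2.0, 1.0]
n = 100_000
wins = sum(sample_pl_ranking(p, rng)[0] == 0 for _ in range(n))
print(wins / n)  # should be close to 2/3
```

The same trick underlies the Gumbel-max view of the model: exponential clocks with rate p_i select the first-place item with probability proportional to p_i, and the race among the remaining items is an independent, smaller race of the same form.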
Inference
The most common application of the Bradley–Terry model is to infer the values of the parameters p_i given an observed set of outcomes i > j, such as wins and losses in a competition. The simplest way to estimate the parameters is by maximum likelihood estimation, i.e., by maximizing the likelihood of the observed outcomes given the model and parameter values.

Suppose we know the outcomes of a set of pairwise competitions between a certain group of individuals, and let w_{ij} be the number of times individual i beats individual j. Then the likelihood of this set of outcomes within the Bradley–Terry model is \prod_{ij} [\Pr(i>j)]^{w_{ij}}, and the log-likelihood of the parameter vector is

\mathcal{L} = \sum_{ij} w_{ij} \ln \frac{p_i}{p_i + p_j} = \sum_{ij} \left[ w_{ij} \ln p_i - w_{ij} \ln (p_i + p_j) \right].

Setting the derivatives of \mathcal{L} to zero gives the maximum-likelihood condition, which can be solved by a simple fixed-point iteration going back to Zermelo. That iteration is, however, slow to converge. More recently it has been pointed out that the maximum-likelihood condition can also be rearranged as

p_i = \frac{\sum_{j} w_{ij}\, p_j/(p_i+p_j)}{\sum_{j} w_{ji}/(p_i+p_j)},

which can be solved by iterating

p_i' = \frac{\sum_{j} w_{ij}\, p_j/(p_i+p_j)}{\sum_j w_{ji}/(p_i+p_j)},

again normalizing the parameters after every round of updates. This iteration gives identical results to Zermelo's but converges much faster and hence is normally preferred.

Worked example of solution procedure

Consider a sporting competition between four teams, who play a total of 22 games among themselves. Each team's wins are given in the rows of the table below and the opponents are given as the columns:

            A    B    C    D
  A wins    –    2    0    1
  B wins    3    –    5    0
  C wins    0    3    –    1
  D wins    4    0    3    –

For example, Team A has beaten Team B twice and lost to Team B three times; it has not played Team C at all; and it has won once and lost four times against Team D. We would like to estimate the relative strengths of the teams, which we do by calculating the parameters p_i, with higher parameters indicating greater prowess. To do this, we initialize the four entries in the parameter vector arbitrarily, for example assigning the value 1 to each team: p = (1, 1, 1, 1).
Then we apply the iteration to update p_1, which gives

p_1 = \frac{\sum_{j(\ne 1)} w_{1j}\, p_j/(p_1+p_j)}{\sum_{j(\ne 1)} w_{j1}/(p_1+p_j)} = \frac{2\frac{1}{1+1} + 0\frac{1}{1+1} + 1\frac{1}{1+1}}{3\frac{1}{1+1}+0\frac{1}{1+1} + 4\frac{1}{1+1}} = 0.429.

Now we apply the iteration again to update p_2, making sure to use the new value of p_1 that we just calculated:

p_2 = \frac{\sum_{j(\ne 2)} w_{2j}\, p_j/(p_2+p_j)}{\sum_{j(\ne 2)} w_{j2}/(p_2+p_j)} = \frac{3\frac{0.429}{1+0.429} + 5\frac{1}{1+1} + 0\frac{1}{1+1}}{2\frac{1}{1+0.429}+3\frac{1}{1+1} + 0\frac{1}{1+1}} = 1.172.

Similarly, for p_3 and p_4 we get

p_3 = \frac{\sum_{j(\ne 3)} w_{3j}\, p_j/(p_3+p_j)}{\sum_{j(\ne 3)} w_{j3}/(p_3+p_j)} = \frac{0\frac{0.429}{1+0.429} + 3\frac{1.172}{1+1.172} + 1\frac{1}{1+1}}{0\frac{1}{1+0.429}+5\frac{1}{1+1.172} + 3\frac{1}{1+1}} = 0.557,

p_4 = \frac{\sum_{j(\ne 4)} w_{4j}\, p_j/(p_4+p_j)}{\sum_{j(\ne 4)} w_{j4}/(p_4+p_j)} = \frac{4\frac{0.429}{1+0.429} + 0\frac{1.172}{1+1.172} + 3\frac{0.557}{1+0.557}}{1\frac{1}{1+0.429}+0\frac{1}{1+1.172} + 1\frac{1}{1+0.557}} = 1.694.

Then we normalize all the parameters by dividing by their geometric mean (0.429\times1.172\times0.557\times1.694)^{1/4} = 0.830 to get the estimated parameters p = (0.516, 1.413, 0.672, 2.041).

To improve the estimates further, we repeat the process using the new values. For example,

p_1 = \frac{2\cdot\frac{1.413}{0.516+1.413} + 0\cdot\frac{0.672}{0.516+0.672} + 1\cdot\frac{2.041}{0.516+2.041}}{3\cdot\frac{1}{0.516+1.413}+0\cdot\frac{1}{0.516+0.672} + 4\cdot\frac{1}{0.516+2.041}} = 0.725.

Repeating this process for the remaining parameters and normalizing, we get p = (0.677, 1.034, 0.625, 2.287). Repeating a further 10 times gives rapid convergence toward a final solution of approximately p = (0.640, 1.043, 0.660, 2.270). This indicates that Team D is the strongest and Team B the second strongest, while Teams A and C are nearly equal in strength but below Teams B and D. In this way the Bradley–Terry model lets us infer the relationship between all four teams, even though not all teams have played each other.
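The iteration above is easy to automate. The Python sketch below (function name is illustrative) runs the same update on the worked example's win matrix, updating each parameter in turn with the latest values and normalizing by the geometric mean after each full round, as in the hand calculation.

```python
import math

def bt_iterate(w, iters=30):
    """Fit Bradley-Terry strengths from a win matrix w, where w[i][j]
    is the number of times i beat j, via the fixed-point update
    p_i' = sum_j w_ij p_j/(p_i+p_j) / sum_j w_ji/(p_i+p_j),
    normalizing by the geometric mean after every round."""
    n = len(w)
    p = [1.0] * n  # arbitrary initialization, as in the worked example
    for _ in range(iters):
        for i in range(n):  # use freshly updated values within a round
            num = sum(w[i][j] * p[j] / (p[i] + p[j]) for j in range(n) if j != i)
            den = sum(w[j][i] / (p[i] + p[j]) for j in range(n) if j != i)
            p[i] = num / den
        gm = math.prod(p) ** (1 / n)
        p = [x / gm for x in p]
    return p

# Win matrix for the four-team example (rows: wins of A, B, C, D).
w = [[0, 2, 0, 1],
     [3, 0, 5, 0],
     [0, 3, 0, 1],
     [4, 0, 3, 0]]
p = bt_iterate(w)
print([round(x, 3) for x in p])  # Team D strongest, A and C nearly equal
```

Because the parameters are renormalized each round, only their ratios are meaningful; any common rescaling leaves every predicted probability unchanged.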
Variations
Crowd-BT

The Crowd-BT model, developed in 2013 by Chen et al., extends the standard Bradley–Terry model to crowdsourced settings, reducing the number of comparisons needed by taking into account the reliability of each judge. In particular, it identifies and excludes judges presumed to be spammers (selecting choices at random) or malicious (always selecting the wrong choice). In a crowdsourced task of ranking documents by reading difficulty, with 624 judges contributing up to 40 pairwise comparisons each, Crowd-BT was shown to outperform both the standard Bradley–Terry model and the TrueSkill ranking system. It has been recommended for use when quality of results is valued over efficiency and the number of comparisons is high.