A binary choice model assumes a
latent variable Un, the utility (or net benefit) that person
n obtains from taking an action (as opposed to not taking the action). This utility depends on the characteristics of the person, some of which are observed by the researcher and some of which are not: :U_n = \boldsymbol\beta \cdot \mathbf{s_n} + \varepsilon_n where \boldsymbol\beta is a set of
regression coefficients and \mathbf{s_n} is a set of
independent variables (also known as "features") describing person
n, which may be either discrete "
dummy variables" or regular continuous variables. \varepsilon_n is a
random variable specifying "noise" or "error" in the prediction, assumed to be distributed according to some distribution. Normally, if there is a mean or variance parameter in the distribution, it cannot be identified, so the parameters are set to convenient values: by convention, mean 0 and variance 1. The person takes the action, Y_n = 1, if
Un > 0. The unobserved term,
εn, is assumed to have a
logistic distribution. The specification is written succinctly as:
• Y_n = \begin{cases} 1, & \text{if } U_n > 0, \\ 0, & \text{if } U_n \le 0 \end{cases}
• U_n = \boldsymbol\beta \cdot \mathbf{s_n} + \varepsilon_n
• \varepsilon_n \sim logistic, standard normal, etc.
Let us write it slightly differently:
• Y_n = \begin{cases} 1, & \text{if } U_n > 0, \\ 0, & \text{if } U_n \le 0 \end{cases}
• U_n = \boldsymbol\beta \cdot \mathbf{s_n} - e_n
• e_n \sim logistic, standard normal, etc.
Here we have made the substitution e_n = -\varepsilon_n. This changes a random variable into a slightly different one, defined over a negated domain. As it happens, the error distributions we usually consider (e.g.
logistic distribution, standard
normal distribution, standard
Student's t-distribution, etc.) are symmetric about 0, and hence the distribution of e_n is identical to the distribution of \varepsilon_n. Denote the
cumulative distribution function (CDF) of e as F_e, and the
quantile function (inverse CDF) of e as F^{-1}_e. Note that :\begin{align} \Pr(Y_n=1) &= \Pr(U_n > 0) \\[6pt] &= \Pr(\boldsymbol\beta \cdot \mathbf{s_n} - e_n > 0) \\[6pt] &= \Pr(-e_n > -\boldsymbol\beta \cdot \mathbf{s_n}) \\[6pt] &= \Pr(e_n \le \boldsymbol\beta \cdot \mathbf{s_n}) \\[6pt] &= F_e(\boldsymbol\beta \cdot \mathbf{s_n}) \end{align} Since Y_n is a
Bernoulli trial, where \mathbb{E}[Y_n] = \Pr(Y_n = 1), we have :\mathbb{E}[Y_n] = F_e(\boldsymbol\beta \cdot \mathbf{s_n}) or equivalently :F^{-1}_e(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n} . Note that this is exactly equivalent to the binomial regression model expressed in the formalism of the
generalized linear model. If e_n \sim \mathcal{N}(0,1), i.e. distributed as a
standard normal distribution, then :\Phi^{-1}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n} which is exactly a
probit model. If e_n \sim \operatorname{Logistic}(0,1), i.e. distributed as a standard
logistic distribution with mean 0 and
scale parameter 1, then the corresponding
quantile function is the
logit function, and :\operatorname{logit}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n} which is exactly a
logit model. Note that the two different formalisms, generalized linear models (GLM's) and discrete choice models, are equivalent in the case of simple binary choice models, but can be extended in differing ways:
• GLM's can easily handle arbitrarily distributed response variables (dependent variables), not just categorical variables or ordinal variables, which discrete choice models are limited to by their nature. GLM's are also not limited to link functions that are quantile functions of some distribution, unlike the use of an error variable, which must by assumption have a probability distribution.
• On the other hand, because discrete choice models are described as types of generative models, it is conceptually easier to extend them to complicated situations with multiple, possibly correlated, choices for each person, or other variations.

== Latent variable interpretation / derivation ==
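The probit and logit relationships derived above can be checked with a short simulation. The following is a minimal sketch, in which the coefficients, the single Gaussian feature, and the sample size are hypothetical choices made only for illustration: drawing the error e_n from a standard normal reproduces \Pr(Y_n=1) = \Phi(\boldsymbol\beta \cdot \mathbf{s_n}) (probit), and drawing it from a standard logistic reproduces the logistic-CDF version (logit).

```python
import random
from math import erf, sqrt, log, exp

def normal_cdf(x):
    """Standard normal CDF: F_e for probit-style errors."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def logistic_cdf(x):
    """Standard logistic CDF: F_e for logit-style errors."""
    return 1.0 / (1.0 + exp(-x))

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_logistic(rng):
    # Inverse-CDF sampling: if u ~ Uniform(0,1), then log(u/(1-u)) ~ Logistic(0,1).
    u = rng.random()
    return log(u / (1.0 - u))

def simulate(cdf, draw_error, n=200_000, beta=(0.25, -1.0), seed=0):
    """Simulate Y_n = 1 iff U_n = beta.s_n - e_n > 0, and compare the empirical
    rate of Y_n = 1 with the average of F_e(beta.s_n) implied by the derivation."""
    rng = random.Random(seed)
    taken = 0
    implied = 0.0
    for _ in range(n):
        s = (1.0, rng.gauss(0.0, 1.0))       # intercept plus one Gaussian feature
        xb = beta[0] * s[0] + beta[1] * s[1]  # beta . s_n
        taken += 1 if xb - draw_error(rng) > 0 else 0  # Y_n = 1 iff U_n > 0
        implied += cdf(xb)                    # Pr(Y_n = 1) per the derivation
    return taken / n, implied / n

p_emp, p_imp = simulate(normal_cdf, draw_normal)      # probit case
l_emp, l_imp = simulate(logistic_cdf, draw_logistic)  # logit case
print(f"probit: empirical={p_emp:.3f} implied={p_imp:.3f}")
print(f"logit:  empirical={l_emp:.3f} implied={l_imp:.3f}")
```

In each case the empirical frequency of Y_n = 1 should match the rate implied by F_e(\boldsymbol\beta \cdot \mathbf{s_n}) to within Monte Carlo error (a few thousandths at this sample size).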