The unit deviance d(y,\mu) is a bivariate function that satisfies the following conditions:
• d(y,y) = 0
• d(y,\mu) > 0 \quad\forall y \neq \mu

The total deviance D(\mathbf{y},\hat{\boldsymbol{\mu}}) of a model with predictions \hat{\boldsymbol{\mu}} of the observations \mathbf{y} is the sum of its unit deviances: D(\mathbf{y},\hat{\boldsymbol{\mu}}) = \sum_i d(y_i, \hat{\mu}_i).

The (total) deviance for a model
M0 with estimates \hat{\mu} = E[Y|\hat{\theta}_0], based on a dataset y, may be constructed from its likelihood as:

D(y,\hat{\mu}) = 2 \left(\log \left[p(y\mid\hat \theta_s)\right] - \log \left[ p(y\mid\hat \theta_0)\right]\right).

Here \hat \theta_0 denotes the fitted values of the parameters in the model M0, while \hat \theta_s denotes the fitted parameters for the saturated model; both sets of fitted values are implicitly functions of the observations y. The saturated model is a model with a parameter for every observation, so that the data are fitted exactly. This expression is simply 2 times the
log-likelihood ratio of the full model compared to the reduced model. The deviance is used to compare two models – in particular in the case of generalized linear models (GLMs), where it has a role similar to that of the residual sum of squares (RSS) from ANOVA in linear models. Suppose that, in the framework of the GLM, we have two
nested models, M1 and M2. In particular, suppose that M1 contains the parameters in M2 plus k additional parameters. Then, under the null hypothesis that M2 is the true model, the difference between the deviances for the two models follows, based on Wilks' theorem, an approximate chi-squared distribution with k degrees of freedom. One caution noted in the literature: "the quantity -2 \log \big[ p(y\mid\hat \theta_0)\big] is sometimes referred to as a
deviance. This is [...] inappropriate, since unlike the deviance used in the context of generalized linear modelling, -2 \log \big[ p(y\mid\hat \theta_0)\big] does not measure deviation from a model that is a perfect fit to the data." However, since the principal use of the deviance is as a difference between the deviances of two models, this confusion in definition is usually unimportant.

==Examples==
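As a concrete illustration of the definitions above, the following minimal Python sketch (using made-up data, not taken from the text) computes the deviance of a Poisson model in two ways: from the likelihood-ratio formula, where the saturated model fits each observation exactly (\mu_i = y_i), and as a sum of the closed-form Poisson unit deviances d(y,\mu) = 2\,(y \log(y/\mu) - (y - \mu)). The two computations agree.

```python
import math

def poisson_log_lik(y, mu):
    # Log-likelihood of independent Poisson observations y with means mu.
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu_hat):
    # D(y, mu_hat) = 2 * (loglik(saturated) - loglik(model)).
    # The saturated model fits each observation exactly: mu_i = y_i.
    ll_saturated = poisson_log_lik(y, y)
    ll_model = poisson_log_lik(y, mu_hat)
    return 2.0 * (ll_saturated - ll_model)

def poisson_unit_deviance(y, mu):
    # Closed form for the Poisson family: d(y, y) = 0 and d(y, mu) > 0 otherwise.
    return 2.0 * (y * math.log(y / mu) - (y - mu))

# Illustrative observations and model predictions (all positive, so logs are safe).
y = [2.0, 5.0, 1.0, 7.0]
mu_hat = [2.5, 4.0, 1.5, 6.0]

D = poisson_deviance(y, mu_hat)
D_units = sum(poisson_unit_deviance(yi, mi) for yi, mi in zip(y, mu_hat))
print(D, D_units)  # the likelihood-based total equals the sum of unit deviances
```

Note that the gamma-function terms in the log-likelihood cancel in the difference, which is why the unit-deviance form contains no factorials.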
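To illustrate the nested-model comparison, the sketch below (again with invented data) fits two nested Poisson models by maximum likelihood: an intercept-only model M2, whose fitted mean is the overall sample mean, and a model M1 that adds one group parameter (k = 1), whose fitted means are the per-group sample means. The deviance difference is then compared against the 95th percentile of the chi-squared distribution with 1 degree of freedom, which is about 3.841.

```python
import math

def poisson_deviance(y, mu):
    # Sum of Poisson unit deviances 2*(y*log(y/mu) - (y - mu)), with 0*log(0) = 0.
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total

# Hypothetical counts in two groups (illustrative only).
group = [0, 0, 0, 0, 1, 1, 1, 1]
y     = [2, 3, 2, 1, 6, 5, 7, 6]

# M2: intercept only; the Poisson MLE is the overall mean for every observation.
mean_all = sum(y) / len(y)
mu2 = [mean_all] * len(y)

# M1: M2 plus a group effect (k = 1 extra parameter); MLE is one mean per group.
means = {g: sum(yi for gi, yi in zip(group, y) if gi == g) /
            sum(1 for gi in group if gi == g)
         for g in set(group)}
mu1 = [means[g] for g in group]

# Under H0 (M2 is the true model), diff is approximately chi-squared with 1 df;
# the 5% critical value of chi2(1) is about 3.841.
diff = poisson_deviance(y, mu2) - poisson_deviance(y, mu1)
print(f"deviance difference = {diff:.3f}; evidence against M2 if > 3.841")
```

Because the models are nested, the difference of deviances equals 2 times the log-likelihood ratio of M1 to M2, so this is simply a likelihood-ratio test expressed in deviance form.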