The goal of log-linear analysis is to determine which model components must be retained in order to best account for the data. The model components are the main effects and interactions in the model. For example, if we examine the relationship between three variables (A, B, and C), there are seven model components in the saturated model: the three main effects (A, B, C), the three two-way interactions (AB, AC, BC), and the one three-way interaction (ABC). Log-linear models can be thought of as lying on a continuum whose two extremes are the simplest model and the
saturated model. The simplest model is the model where all the expected frequencies are equal. This holds when the variables are not related and every category is equally likely. The saturated model is the model that includes all the model components. This model will always explain the data the best, but it is the least parsimonious, as everything is included. In this model, observed frequencies equal expected frequencies; therefore, in the likelihood ratio chi-square statistic, the ratio \frac{O_{ij}}{E_{ij}}=1 and \ln(1)=0. This results in the likelihood ratio chi-square statistic being equal to 0, which is the best model fit. Other possible models are the conditional equiprobability model and the mutual dependence model. Each log-linear model can be represented as a log-linear equation. For example, with the three variables (A, B, C) the saturated model has the following log-linear equation:
:\ln(F_{ijk})=\lambda + \lambda_i^A + \lambda_j^B +\lambda_k^C + \lambda_{ij}^{AB} + \lambda_{ik}^{AC}+ \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC}
where
:F_{ijk} = expected frequency in cell ijk;
:\lambda = the constant term, and each superscripted \lambda = the relative weight of the corresponding variable or interaction.
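As a quick check on the count of seven components, the terms of the saturated model can be enumerated directly. This is only an illustrative sketch; the function name `model_components` is hypothetical and not part of any statistics library.

```python
from itertools import combinations

def model_components(variables):
    """List every main effect and interaction of the saturated model."""
    terms = []
    for order in range(1, len(variables) + 1):
        # order 1 = main effects, order 2 = two-way interactions, and so on
        for combo in combinations(variables, order):
            terms.append("".join(combo))
    return terms

components = model_components(["A", "B", "C"])
print(components)       # ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
print(len(components))  # 7
```

Each listed term corresponds to one superscripted \lambda in the saturated equation; the constant \lambda is not counted as a component.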
=== Hierarchical model ===
Log-linear analysis models can be hierarchical or nonhierarchical. Hierarchical models are the most common. A hierarchical model that includes an interaction also contains all the lower-order interactions and main effects of the variables involved in that interaction.
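The hierarchy condition can be checked mechanically: every nonempty proper subset of each term must itself be a term. The sketch below assumes terms are written as strings of single-letter variable names (for example "AB"); the helper `is_hierarchical` is a hypothetical name used for illustration.

```python
from itertools import combinations

def is_hierarchical(terms):
    """Return True if every lower-order relative of each term is present."""
    term_set = {frozenset(t) for t in terms}
    for term in term_set:
        for size in range(1, len(term)):
            for sub in combinations(sorted(term), size):
                if frozenset(sub) not in term_set:
                    return False
    return True

print(is_hierarchical(["A", "B", "C", "AB"]))  # True
print(is_hierarchical(["A", "AB"]))            # False: main effect B is missing
```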
=== Graphical model ===
A log-linear model is graphical if, whenever the model contains all two-factor terms generated by a higher-order interaction, the model also contains the higher-order interaction. As a direct consequence, graphical models are hierarchical. Moreover, being completely determined by its two-factor terms, a graphical model can be represented by an undirected graph, where the vertices represent the variables and the edges represent the two-factor terms included in the model.
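The definition translates into a direct check: for every set of three or more variables whose pairs are all two-factor terms of the model, the corresponding higher-order interaction must also be a term. The sketch below assumes the input model is already hierarchical and uses the same string notation for terms ("AB" for the A-by-B interaction); `is_graphical` is a hypothetical helper name.

```python
from itertools import combinations

def is_graphical(terms):
    """Check the graphical condition for a hierarchical model."""
    term_set = {frozenset(t) for t in terms}
    variables = sorted(set().union(*term_set))
    # Edges of the undirected graph are the two-factor terms.
    edges = {t for t in term_set if len(t) == 2}
    for size in range(3, len(variables) + 1):
        for combo in combinations(variables, size):
            all_pairs = all(frozenset(p) in edges
                            for p in combinations(combo, 2))
            if all_pairs and frozenset(combo) not in term_set:
                return False
    return True

# All two-way interactions but no three-way interaction: not graphical.
print(is_graphical(["A", "B", "C", "AB", "AC", "BC"]))         # False
# The saturated model is graphical.
print(is_graphical(["A", "B", "C", "AB", "AC", "BC", "ABC"]))  # True
```

The first example is the standard case of a hierarchical model that is not graphical: its graph is a triangle, yet the ABC term is absent.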
=== Decomposable model ===
A log-linear model is decomposable if it is graphical and if the corresponding graph is chordal.
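Chordality can be tested by repeatedly deleting a simplicial vertex, that is, a vertex whose remaining neighbours are all adjacent to one another; the graph is chordal exactly when this process removes every vertex. The following is a minimal sketch of that check, with hypothetical names and a brute-force search that is only suitable for the small graphs typical of log-linear models.

```python
from itertools import combinations

def is_chordal(vertices, edges):
    """Return True if the undirected graph has no chordless cycle of length > 3."""
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    remaining = set(vertices)
    while remaining:
        simplicial = None
        for v in sorted(remaining):
            neighbours = adjacent[v] & remaining
            # v is simplicial if its remaining neighbours form a clique.
            if all(b in adjacent[a]
                   for a, b in combinations(sorted(neighbours), 2)):
                simplicial = v
                break
        if simplicial is None:
            return False  # no simplicial vertex left: a chordless cycle exists
        remaining.remove(simplicial)
    return True

# Model [AB][BC][CD][AD]: graphical, but its graph is a 4-cycle.
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
print(is_chordal("ABCD", cycle))                 # False
print(is_chordal("ABCD", cycle + [("A", "C")]))  # True
```

Under this check, the four-variable model with two-factor terms AB, BC, CD, and AD is graphical but not decomposable, while adding the AC edge makes the graph chordal.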
== Model fit ==
The model fits well when the residuals (i.e., observed − expected) are close to 0; that is, the closer the observed frequencies are to the expected frequencies, the better the model fit. If the likelihood ratio chi-square statistic is non-significant, then the model fits well (i.e., calculated expected frequencies are close to observed frequencies). If the likelihood ratio chi-square statistic is significant, then the model does not fit well (i.e., calculated expected frequencies are not close to observed frequencies).
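The likelihood ratio chi-square statistic is commonly written as G² = 2 Σ O ln(O/E). The sketch below computes it for a hypothetical 2×2 table under the independence model (main effects only) and compares it with the 0.05 critical value for one degree of freedom, 3.841. The counts and function names are made up for illustration.

```python
from math import log

def g_squared(observed, expected):
    """Likelihood ratio chi-square: 2 * sum(O * ln(O / E)) over nonzero cells."""
    return 2.0 * sum(o * log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def independence_expected(table):
    """Expected counts for a two-way table when rows and columns are unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

table = [[30, 10], [10, 30]]             # hypothetical observed counts
expected = independence_expected(table)  # every cell is 20.0 here

flat_observed = [o for row in table for o in row]
flat_expected = [e for row in expected for e in row]

g2 = g_squared(flat_observed, flat_expected)
CRITICAL_05_DF1 = 3.841                  # chi-square critical value, df = 1
fits = g2 <= CRITICAL_05_DF1

print(round(g2, 2))  # 20.93
print(fits)          # False: the independence model does not fit
```

Passing the observed counts as their own expected counts, as the saturated model would, gives a statistic of exactly 0.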
Backward elimination is used to determine which of the model components are necessary to retain in order to best account for the data. Log-linear analysis starts with the saturated model, and the highest-order interactions are removed until the model no longer accurately fits the data. Specifically, at each stage, after the removal of the highest-order interaction, the likelihood ratio chi-square statistic is computed to measure how well the model is fitting the data. Removal stops when the likelihood ratio chi-square statistic becomes significant, and the last model that still fit the data is retained.
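The procedure can be sketched for a 2×2×2 table as follows. Several simplifying assumptions are made here and are not taken from the text: all terms of the highest remaining order are removed together at each stage, expected frequencies are obtained by iterative proportional fitting, the significance level is 0.05 with hard-coded critical values for up to 7 degrees of freedom, and the counts are invented so that the three variables are exactly independent. All function names are hypothetical.

```python
from itertools import combinations, product
from math import log, prod

# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
# Only df 1..7 are listed, which is enough for a 2x2x2 table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
               5: 11.070, 6: 12.592, 7: 14.067}

def margin(table, axes):
    """Sum a {cell: count} table over every variable not listed in axes."""
    totals = {}
    for cell, count in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0.0) + count
    return totals

def fit_ipf(observed, generators, sweeps=25):
    """Expected frequencies by iterative proportional fitting."""
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(sweeps):
        for axes in generators:
            target = margin(observed, axes)
            current = margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                if current[key] > 0:
                    fitted[cell] *= target[key] / current[key]
    return fitted

def g_squared(observed, fitted):
    """Likelihood ratio chi-square statistic."""
    return 2.0 * sum(o * log(o / fitted[cell])
                     for cell, o in observed.items() if o > 0)

def degrees_of_freedom(levels, order):
    """Cells minus parameters for the model with all terms up to `order`."""
    n_params = sum(prod(levels[a] - 1 for a in term)
                   for size in range(order + 1)
                   for term in combinations(range(len(levels)), size))
    return prod(levels) - n_params

def backward_eliminate(observed, levels):
    """Return the highest interaction order that must be retained."""
    n_vars = len(levels)
    retained = n_vars  # start from the saturated model
    for order in range(n_vars - 1, -1, -1):
        generators = list(combinations(range(n_vars), order))
        g2 = g_squared(observed, fit_ipf(observed, generators))
        if g2 > CRITICAL_05[degrees_of_freedom(levels, order)]:
            break      # significant: the reduced model no longer fits
        retained = order
    return retained

# Hypothetical counts built as a product of margins (exact independence).
a, b, c = (10, 30), (1, 2), (1, 1)
observed = {(i, j, k): a[i] * b[j] * c[k]
            for i, j, k in product(range(2), repeat=3)}

print(backward_eliminate(observed, levels=(2, 2, 2)))  # 1: main effects only
```

For these counts the three-way and two-way interactions can be dropped without loss of fit, but dropping the main effects makes the statistic significant, so the main-effects model is retained.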
== Comparing models ==
When two models are nested, they can also be compared using a chi-square difference test. The chi-square difference test is computed by subtracting the likelihood ratio chi-square statistics for the two models being compared. This value is then compared to the chi-square critical value at their difference in degrees of freedom. If the chi-square difference is smaller than the chi-square critical value, the more parsimonious model does not fit the data significantly worse and is the preferred model. Otherwise, if the chi-square difference is larger than the critical value, the less parsimonious model is preferred.
== Follow-up tests ==