The analysis of variance has been studied from several approaches, the most common of which uses a
linear model that relates the response to the treatments and blocks. Note that the model is linear in parameters but may be nonlinear across factor levels. Interpretation is straightforward when data are balanced across factors, but much deeper understanding is needed for unbalanced data.
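As an illustration, one common parameterization of such a linear model for a randomized block design is the following (the notation here is a standard textbook convention, not taken from this article):

```latex
% Response of treatment i in block j: a grand mean, a treatment
% effect, a block effect, and a random error term.
y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},
\qquad i = 1,\dots,k, \quad j = 1,\dots,b,
% with the usual identifiability constraints
\sum_{i=1}^{k} \tau_i = 0, \qquad \sum_{j=1}^{b} \beta_j = 0.
```

The model is linear in the parameters \mu, \tau_i, and \beta_j, even though the fitted means can differ arbitrarily across factor levels.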
===Textbook analysis using a normal distribution===
The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses:
* Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
* Normality – the distributions of the residuals are normal.
* Equality (or "homogeneity") of variances, called homoscedasticity – the variance of data in groups should be the same.

Taken together, the assumptions of the textbook model imply that, for fixed-effects models, the errors are independently, identically, and normally distributed; that is, the errors \varepsilon are independent and \varepsilon \sim N(0, \sigma^2).
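As a sketch of the mechanics under these assumptions, the one-way ANOVA F statistic can be computed directly from between-group and within-group sums of squares. The data below are simulated to satisfy the textbook model (independent normal errors with a common variance); the group means and sizes are illustrative choices, not values from this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three treatment groups simulated under the textbook assumptions:
# independent observations, normal errors, common variance (sigma = 2).
groups = [rng.normal(loc=mu, scale=2.0, size=30) for mu in (5.0, 6.0, 5.5)]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k = len(groups)          # number of treatments
n = all_obs.size         # total number of observations

# Between-treatment and within-treatment sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# The one-way ANOVA F statistic is the ratio of the mean squares;
# under H0 it follows an F distribution with (k-1, n-k) degrees of freedom.
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```

Under the null hypothesis of equal treatment means, `f_stat` would be compared against the F(k−1, n−k) distribution to obtain a p-value.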
===Randomization-based analysis===
In a
randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random-assignment is used to test the significance of the
null hypothesis, following the ideas of
C. S. Peirce and
Ronald Fisher. This design-based analysis was discussed and developed by
Francis J. Anscombe at
Rothamsted Experimental Station and by
Oscar Kempthorne at
Iowa State University. Kempthorne and his students make an assumption of
unit treatment additivity, which is discussed in the books of Kempthorne and
David R. Cox.
===Unit-treatment additivity===
In its simplest form, the assumption of unit-treatment additivity states that the observed response y_{i,j} from experimental unit i when receiving treatment j can be written as the sum of the unit's response y_i and the treatment effect t_j; that is, y_{i,j} = y_i + t_j. This assumption implies that, for every treatment j, the jth treatment has exactly the same effect t_j on every experimental unit. The assumption of unit-treatment additivity usually cannot be directly
falsified, according to Cox and Kempthorne. However, many
consequences of unit-treatment additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity
implies that the variance is constant for all treatments. Therefore, by
contraposition, a necessary condition for unit-treatment additivity is that the variance is constant. The use of unit treatment additivity and randomization is similar to the design-based inference that is standard in finite-population
survey sampling.
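The constant-variance consequence mentioned above can be illustrated numerically. In this sketch the baseline unit responses and the treatment effects are hypothetical values chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical baseline responses y_i for 8 experimental units.
unit_response = rng.normal(10.0, 3.0, size=8)

# Assumed additive treatment effects t_j for two treatments.
treatment_effect = {"A": 0.0, "B": 2.5}

# Under unit-treatment additivity, y_{i,j} = y_i + t_j: every unit
# is shifted by exactly the same amount under a given treatment.
y_A = unit_response + treatment_effect["A"]
y_B = unit_response + treatment_effect["B"]

# A falsifiable consequence: the within-treatment variances are equal,
# because adding a constant to every observation leaves the variance unchanged.
var_A, var_B = np.var(y_A), np.var(y_B)
```

Observing unequal within-treatment variances in a randomized experiment would therefore count as evidence against unit-treatment additivity, by contraposition.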
===Derived linear model===
Kempthorne uses the randomization-distribution and the assumption of
unit treatment additivity to produce a
derived linear model, very similar to the textbook model discussed previously. The test statistics of this derived linear model are closely approximated by the test statistics of an appropriate normal linear model, according to approximation theorems and simulation studies. However, there are differences. For example, the randomization-based analysis results in a small but (strictly) negative correlation between the observations. In the randomization-based analysis, there is
no assumption of a
normal distribution and certainly
no assumption of
independence. On the contrary,
the observations are dependent! The randomization-based analysis has the disadvantage that its exposition involves tedious algebra and takes considerable time. Since the randomization-based analysis is complicated and is closely approximated by the approach using a normal linear model, most teachers emphasize the normal linear model approach. Few statisticians object to model-based analysis of balanced randomized experiments.
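A minimal sketch of the randomization-based idea, for a two-treatment experiment, re-randomizes the treatment labels many times and compares the observed statistic against the resulting randomization distribution. The responses below are hypothetical data invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical responses from a randomized experiment:
# 6 units assigned treatment 0, 6 units assigned treatment 1.
responses = np.array([4.9, 5.8, 5.1, 6.2, 5.5, 4.7,
                      7.1, 6.4, 7.8, 6.9, 7.3, 6.6])
labels = np.array([0] * 6 + [1] * 6)

def mean_diff(y, g):
    """Difference in treatment-group means for label vector g."""
    return y[g == 1].mean() - y[g == 0].mean()

observed = mean_diff(responses, labels)

# Randomization distribution: re-randomize the labels, mirroring the
# random assignment declared in the experimental protocol.
n_perm = 10_000
perm_stats = np.empty(n_perm)
for i in range(n_perm):
    perm_stats[i] = mean_diff(responses, rng.permutation(labels))

# Two-sided p-value from the randomization distribution.
p_value = np.mean(np.abs(perm_stats) >= abs(observed))
```

No normality or independence assumptions are invoked here; the inference rests only on the random assignment itself (and, for extending conclusions beyond these units, on unit-treatment additivity).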
===Statistical models for observational data===
However, when applied to data from non-randomized experiments or
observational studies, model-based analysis lacks the warrant of randomization. For observational data, the derivation of confidence intervals must use
subjective models, as emphasized by
Ronald Fisher and his followers. In practice, the estimates of treatment effects from observational studies are often inconsistent. Nevertheless, "statistical models" and observational data are useful for suggesting hypotheses, which should be treated very cautiously by the public.
===Summary of assumptions===
The normal-model based ANOVA analysis assumes the independence, normality, and homogeneity of variances of the residuals. The randomization-based analysis assumes only the homogeneity of the variances of the residuals (as a consequence of unit-treatment additivity) and uses the randomization procedure of the experiment. Both these analyses require
homoscedasticity, as an assumption for the normal-model analysis and as a consequence of randomization and additivity for the randomization-based analysis. However, studies of processes that change variances rather than means (called dispersion effects) have been successfully conducted using ANOVA. There are
no necessary assumptions for ANOVA in its full generality, but the
F-test used for ANOVA hypothesis testing has assumptions and practical limitations which are of continuing interest. Problems which do not satisfy the assumptions of ANOVA can often be transformed to satisfy them. The property of unit-treatment additivity is not invariant under a "change of scale", so statisticians often use transformations to achieve unit-treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance. Also, a statistician may specify that logarithmic transforms be applied to responses that are believed to follow a multiplicative model. According to Cauchy's functional equation theorem, the logarithm is the only continuous transformation that transforms multiplication of positive reals into addition.

==Characteristics==