Paired difference tests for reducing variance are a specific type of blocking. To illustrate the idea, suppose we are assessing the performance of a drug for treating high cholesterol. Under the design of our study, we enroll 100 subjects, and measure each subject's cholesterol level. Then all the subjects are treated with the drug for six months, after which their cholesterol levels are measured again. Our interest is in whether the drug has any effect on mean cholesterol levels, which can be inferred through a comparison of the post-treatment to pre-treatment measurements.

The key issue that motivates the paired difference test is that, unless the study has very strict entry criteria, the subjects are likely to differ substantially from each other before the treatment begins. Important baseline differences among the subjects may be due to their gender, age, smoking status, activity level, and diet.

There are two natural approaches to analyzing these data:

• In an "unpaired analysis", the data are treated as if the study design had actually been to enroll 200 subjects, followed by random assignment of 100 subjects to each of the treatment and control groups. The treatment group in the unpaired design would be viewed as analogous to the post-treatment measurements in the paired design, and the control group would be viewed as analogous to the pre-treatment measurements. We could then calculate the sample means within the treated and untreated groups of subjects, and compare these means to each other.
• In a "paired difference analysis", we would first subtract the pre-treatment value from the post-treatment value for each subject, then compare these differences to zero.

If we only consider the means, the paired and unpaired approaches give the same result. To see this, let (Y_{i1}, Y_{i2}) be the observed data for the i-th pair, and let D_i = Y_{i2} - Y_{i1}. Also let \bar{Y}_1, \bar{Y}_2, and \bar{D} denote, respectively, the sample means of the Y_{i1}, the Y_{i2}, and the D_i. By rearranging terms we can see that

: \bar{D} = \frac{1}{n}\sum_i (Y_{i2}-Y_{i1}) = \frac{1}{n}\sum_iY_{i2} - \frac{1}{n}\sum_iY_{i1} = \bar{Y}_2 - \bar{Y}_1,

where n is the number of pairs. Thus the mean difference between the groups does not depend on whether we organize the data as pairs.

Although the mean difference is the same for the paired and unpaired statistics, their statistical significance levels can be very different, because it is easy to overstate the variance of the unpaired statistic. Through
Bienaymé's identity, the variance of \bar{D} is

: \begin{align} {\rm var}(\bar{D}) &= \operatorname{var}(\bar{Y}_2-\bar{Y}_1)\\ &= \operatorname{var}(\bar{Y}_2) + \operatorname{var}(\bar{Y}_1) - 2\operatorname{cov}(\bar{Y}_1,\bar{Y}_2)\\ &= \sigma_1^2/n + \sigma_2^2/n - 2\sigma_1\sigma_2\operatorname{corr}(Y_{i1}, Y_{i2})/n, \end{align}

where \sigma_1 and \sigma_2 are the population standard deviations of the Y_{i1} and Y_{i2} data, respectively. Thus the variance of \bar{D} is lower if there is positive correlation within each pair. Such correlation is very common in the repeated measures setting, since many factors influencing the value being compared are unaffected by the treatment. For example, if cholesterol levels are associated with age, the effect of age will lead to positive correlations between the cholesterol levels measured within subjects, as long as the duration of the study is small relative to the variation in ages in the sample.
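This effect can be illustrated with a small simulation. The sketch below is not from the study described above; the data-generating values (a shared per-subject component with standard deviation 20, noise with standard deviation 10) are assumptions chosen only to induce positive within-pair correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical paired cholesterol data: a stable per-subject effect is
# shared by both measurements, creating positive within-pair correlation.
alpha = rng.normal(0.0, 20.0, size=n)              # stable subject component
y1 = 200 + alpha + rng.normal(0.0, 10.0, size=n)   # pre-treatment
y2 = 190 + alpha + rng.normal(0.0, 10.0, size=n)   # post-treatment
d = y2 - y1

# The mean difference is identical under both analyses ...
assert np.isclose(d.mean(), y2.mean() - y1.mean())

# ... but when the within-pair correlation is positive, the paired
# analysis yields a much smaller variance for the mean difference than
# the independence-based unpaired formula.
var_unpaired = y1.var(ddof=1) / n + y2.var(ddof=1) / n
var_paired = d.var(ddof=1) / n
print(var_unpaired, var_paired)
```

With these values the paired variance is several times smaller than the unpaired one, because the shared component alpha cancels in the differences.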
==Power of the paired Z-test==

Suppose we are using a Z-test to analyze the data, where the variances of the pre-treatment and post-treatment data \sigma_1^2 and \sigma_2^2 are known (the situation with a t-test is similar). The unpaired Z-test statistic is

: \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}}.

The power of the unpaired, one-sided test carried out at level \alpha = 0.05 can be calculated as follows:

: \begin{align} P\left(\frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} > 1.645\right) &= P\left(\frac{\bar{Y}_2 - \bar{Y}_1}{S} > 1.645\sqrt{\sigma_1^2/n + \sigma_2^2/n}/S\right)\\ &= P\left(\frac{\bar{Y}_2 - \bar{Y}_1-\delta+\delta}{S} > 1.645\sqrt{\sigma_1^2/n + \sigma_2^2/n}/S\right)\\ &= P\left(\frac{\bar{Y}_2 - \bar{Y}_1-\delta}{S} > 1.645\sqrt{\sigma_1^2/n + \sigma_2^2/n}/S - \delta/S\right)\\ &= 1 - \Phi(1.645\sqrt{\sigma_1^2/n + \sigma_2^2/n}/S - \delta/S), \end{align}

where S is the standard deviation of \bar{D}, \Phi is the standard normal cumulative distribution function, and \delta = EY_2 - EY_1 is the true effect of the treatment. The constant 1.645 is the 95th percentile of the standard normal distribution, which defines the rejection region of the test. By a similar calculation, the power of the paired Z-test is

: 1 - \Phi(1.645 - \delta/S).

By comparing the expressions for power of the paired and unpaired tests, one can see that the paired test has more power as long as

: \sqrt{\sigma_1^2/n + \sigma_2^2/n}/S = \sqrt{\frac{\sigma_1^2+\sigma_2^2}{\sigma_1^2+\sigma_2^2-2\sigma_1\sigma_2\rho}} > 1, \text{ where } \rho := \operatorname{corr}(Y_{i1},Y_{i2}).

This condition is met whenever \rho, the within-pairs correlation, is positive.
==A random effects model for paired testing==

The following statistical model is useful for understanding the paired difference test:

: Y_{ij} = \mu_j + \alpha_i + \varepsilon_{ij},

where \alpha_i is a random effect that is shared between the two values in the pair, and \varepsilon_{ij} is a random noise term that is independent across all data points. The constant values \mu_1 and \mu_2 are the expected values of the two measurements being compared, and our interest is in \delta = \mu_2 - \mu_1.

In this model, the \alpha_i capture "stable confounders" that have the same effect on the pre-treatment and post-treatment measurements. When we subtract to form the D_i, the \alpha_i cancel out, so they do not contribute to the variance. The within-pairs covariance is

: \operatorname{cov}(Y_{i1}, Y_{i2}) = \operatorname{var}(\alpha_i).

This is non-negative, so it leads to better performance for the paired difference test compared to the unpaired test, unless the \alpha_i are constant over i, in which case the paired and unpaired tests are equivalent.

In less mathematical terms, the unpaired test assumes that the data in the two groups being compared are independent. This assumption determines the form of the variance of \bar{D}. However, when two measurements are made for each subject, it is unlikely that the two measurements are independent. If the two measurements within a subject are positively correlated, the unpaired test overstates the variance of \bar{D}, making it a conservative test in the sense that its actual type I error probability will be lower than the nominal level, with a corresponding loss of statistical power. In rare cases, the data may be negatively correlated within subjects, in which case the unpaired test becomes anti-conservative. The paired test is generally used when repeated measurements are made on the same subjects, since it has the correct level regardless of the correlation of the measurements within pairs.

==Use in reducing confounding==